From a259d5320537576c0744238f01ca6e75ad776674 Mon Sep 17 00:00:00 2001 From: Michael Schmitz Date: Sat, 1 Feb 2014 13:48:13 +1300 Subject: [PATCH 001/455] m68k/atari - ide: do not register interrupt if host->get_lock is set On m68k, host->get_lock is used to both lock and register the interrupt that the IDE host shares with other device drivers. Registering the IDE interrupt handler in ide-probe.c results in duplicating the interrupt registered (once via host->get lock, and also via init_irq()), and may result in IDE accepting interrupts even when another driver has locked the interrupt hardware. This opens the whole locking scheme up to races. host->get_lock is set on m68k only, so other drivers' behaviour is not changed. Signed-off-by: Michael Schmitz Cc: Geert Uytterhoeven Cc: David S. Miller Cc: linux-ide@vger.kernel.org Signed-off-by: David S. Miller --- drivers/ide/ide-probe.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/drivers/ide/ide-probe.c b/drivers/ide/ide-probe.c index 2a744a91370e..a3d3b1733c49 100644 --- a/drivers/ide/ide-probe.c +++ b/drivers/ide/ide-probe.c @@ -853,8 +853,9 @@ static int init_irq (ide_hwif_t *hwif) if (irq_handler == NULL) irq_handler = ide_intr; - if (request_irq(hwif->irq, irq_handler, sa, hwif->name, hwif)) - goto out_up; + if (!host->get_lock) + if (request_irq(hwif->irq, irq_handler, sa, hwif->name, hwif)) + goto out_up; #if !defined(__mc68000__) printk(KERN_INFO "%s at 0x%03lx-0x%03lx,0x%03lx on irq %d", hwif->name, @@ -1533,7 +1534,8 @@ static void ide_unregister(ide_hwif_t *hwif) ide_proc_unregister_port(hwif); - free_irq(hwif->irq, hwif); + if (!hwif->host->get_lock) + free_irq(hwif->irq, hwif); device_unregister(hwif->portdev); device_unregister(&hwif->gendev); From 2f2d4dd63d4e0db6d3a9a246624a7ea335957e98 Mon Sep 17 00:00:00 2001 From: Jean Delvare Date: Tue, 11 Mar 2014 12:44:54 +0100 Subject: [PATCH 002/455] ide: Fix CS5520 and CS5530 dependencies As far as I know, the CS5520 and CS5530 chipsets were only used with 32-bit x86 Geode processors, so I think their drivers are only needed on this architecture, except for build testing purpose. While we're here, simplify the dependencies for the CS5535 driver. Signed-off-by: Jean Delvare Cc: "David S. Miller" Signed-off-by: David S. Miller --- drivers/ide/Kconfig | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/ide/Kconfig b/drivers/ide/Kconfig index 8fb46aab2d87..1bbf48ea73ff 100644 --- a/drivers/ide/Kconfig +++ b/drivers/ide/Kconfig @@ -416,6 +416,7 @@ config BLK_DEV_CY82C693 config BLK_DEV_CS5520 tristate "Cyrix CS5510/20 MediaGX chipset support (VERY EXPERIMENTAL)" + depends on X86_32 || COMPILE_TEST select BLK_DEV_IDEDMA_PCI help Include support for PIO tuning and virtual DMA on the Cyrix MediaGX @@ -426,6 +427,7 @@ config BLK_DEV_CS5520 config BLK_DEV_CS5530 tristate "Cyrix/National Semiconductor CS5530 MediaGX chipset support" + depends on X86_32 || COMPILE_TEST select BLK_DEV_IDEDMA_PCI help Include support for UDMA on the Cyrix MediaGX 5530 chipset. This @@ -435,7 +437,7 @@ config BLK_DEV_CS5530 config BLK_DEV_CS5535 tristate "AMD CS5535 chipset support" - depends on X86 && !X86_64 + depends on X86_32 select BLK_DEV_IDEDMA_PCI help Include support for UDMA on the NSC/AMD CS5535 companion chipset. From 5b40dd30bbfaa7fcba0cd945a4852a146c552ea7 Mon Sep 17 00:00:00 2001 From: Jean Delvare Date: Fri, 14 Mar 2014 17:54:31 +0100 Subject: [PATCH 003/455] ide: Fix SC1200 dependencies The SC1200 is a SoC based on the Geode GX1 32-bit x86 processor, so its drivers are only needed on this architecture, except for build testing purpose. Signed-off-by: Jean Delvare Cc: "David S. Miller" Signed-off-by: David S. Miller --- drivers/ide/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/ide/Kconfig b/drivers/ide/Kconfig index 1bbf48ea73ff..a04c49f2a011 100644 --- a/drivers/ide/Kconfig +++ b/drivers/ide/Kconfig @@ -488,6 +488,7 @@ config BLK_DEV_JMICRON config BLK_DEV_SC1200 tristate "National SCx200 chipset support" + depends on X86_32 || COMPILE_TEST select BLK_DEV_IDEDMA_PCI help This driver adds support for the on-board IDE controller on the From 694617474e33b8603fc76e090ed7d09376514b1a Mon Sep 17 00:00:00 2001 From: Mikulas Patocka Date: Tue, 4 Mar 2014 17:13:47 -0500 Subject: [PATCH 004/455] slab_common: fix the check for duplicate slab names The patch 3e374919b314f20e2a04f641ebc1093d758f66a4 is supposed to fix the problem where kmem_cache_create incorrectly reports duplicate cache name and fails. The problem is described in the header of that patch. However, the patch doesn't really fix the problem because of these reasons: * the logic to test for debugging is reversed. It was intended to perform the check only if slub debugging is enabled (which implies that caches with the same parameters are not merged). Therefore, there should be #if !defined(CONFIG_SLUB) || defined(CONFIG_SLUB_DEBUG_ON) The current code has the condition reversed and performs the test if debugging is disabled. * slub debugging may be enabled or disabled based on kernel command line, CONFIG_SLUB_DEBUG_ON is just the default settings. Therefore the test based on definition of CONFIG_SLUB_DEBUG_ON is unreliable. This patch fixes the problem by removing the test "!defined(CONFIG_SLUB_DEBUG_ON)". Therefore, duplicate names are never checked if the SLUB allocator is used. Note to stable kernel maintainers: when backporint this patch, please backport also the patch 3e374919b314f20e2a04f641ebc1093d758f66a4. Acked-by: David Rientjes Acked-by: Christoph Lameter Signed-off-by: Mikulas Patocka Cc: stable@vger.kernel.org # 3.6+ Signed-off-by: Pekka Enberg --- mm/slab_common.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/slab_common.c b/mm/slab_common.c index 102cc6fca3d3..b810fba0095d 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -55,7 +55,7 @@ static int kmem_cache_sanity_check(const char *name, size_t size) continue; } -#if !defined(CONFIG_SLUB) || !defined(CONFIG_SLUB_DEBUG_ON) +#if !defined(CONFIG_SLUB) if (!strcmp(s->name, name)) { pr_err("%s (%s): Cache name already exists.\n", __func__, name); From f5a9f0ca40c74547e28e0cb30973df5577dfbaec Mon Sep 17 00:00:00 2001 From: Michal Kazior Date: Fri, 30 May 2014 23:49:57 +0300 Subject: [PATCH 005/455] ath10k: remove unnecessary htt rx corruption check While fixing a bug reported by Avery I went ahead and added a warning suspecting there might be something more to the bug. This ended up with people reporting they see warnings during heavy traffic. This bought me some time and helped me understand the problem better - apparently fw/hw can report a chained msdus as follows: 1 msdu, 1 chained, 1 msdu (0 length). The patch removes the extra check but leaves the other change that fixed the original skb_push panic bug (msdu_chaining was overwritten in an unfortunate way which made the above example to be treated as non-chained case). Reported-by: Yeoh Chun-Yeow Reported-by: Tim Harvey Signed-off-by: Michal Kazior Signed-off-by: Kalle Valo --- drivers/net/wireless/ath/ath10k/htt_rx.c | 18 ------------------ 1 file changed, 18 deletions(-) diff --git a/drivers/net/wireless/ath/ath10k/htt_rx.c b/drivers/net/wireless/ath/ath10k/htt_rx.c index 6c102b1312ff..eebc860c3655 100644 --- a/drivers/net/wireless/ath/ath10k/htt_rx.c +++ b/drivers/net/wireless/ath/ath10k/htt_rx.c @@ -312,7 +312,6 @@ static int ath10k_htt_rx_amsdu_pop(struct ath10k_htt *htt, int msdu_len, msdu_chaining = 0; struct sk_buff *msdu; struct htt_rx_desc *rx_desc; - bool corrupted = false; lockdep_assert_held(&htt->rx_ring.lock); @@ -439,9 +438,6 @@ static int ath10k_htt_rx_amsdu_pop(struct ath10k_htt *htt, last_msdu = __le32_to_cpu(rx_desc->msdu_end.info0) & RX_MSDU_END_INFO0_LAST_MSDU; - if (msdu_chaining && !last_msdu) - corrupted = true; - if (last_msdu) { msdu->next = NULL; break; @@ -456,20 +452,6 @@ static int ath10k_htt_rx_amsdu_pop(struct ath10k_htt *htt, if (*head_msdu == NULL) msdu_chaining = -1; - /* - * Apparently FW sometimes reports weird chained MSDU sequences with - * more than one rx descriptor. This seems like a bug but needs more - * analyzing. For the time being fix it by dropping such sequences to - * avoid blowing up the host system. - */ - if (corrupted) { - ath10k_warn("failed to pop chained msdus, dropping\n"); - ath10k_htt_rx_free_msdu_chain(*head_msdu); - *head_msdu = NULL; - *tail_msdu = NULL; - msdu_chaining = -EINVAL; - } - /* * Don't refill the ring yet. * From dfa413de1e4388818f7dcdce0a90d6212e74895b Mon Sep 17 00:00:00 2001 From: Bartosz Markowski Date: Mon, 2 Jun 2014 21:19:45 +0300 Subject: [PATCH 006/455] ath10k: fix 8th virtual AP interface with DFS Firmware 10.x supports up to 8 virtual AP interfaces, but in a DFS channel it was possible to create only 7 interfaces as ath10k internal creates a monitor interface for DFS. Previous vdev map initialization was missing enough space for 8 + 1 vdevs due to wrong define used and that's why there was no space for 8th interface. Use the correct define TARGET_10X_NUM_VDEVS with 10.x firmware to make it possible to create the 8th virtual interface. Signed-off-by: Bartosz Markowski Signed-off-by: Kalle Valo --- drivers/net/wireless/ath/ath10k/core.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/net/wireless/ath/ath10k/core.c b/drivers/net/wireless/ath/ath10k/core.c index 82017f56e661..e6c56c5bb0f6 100644 --- a/drivers/net/wireless/ath/ath10k/core.c +++ b/drivers/net/wireless/ath/ath10k/core.c @@ -795,7 +795,11 @@ int ath10k_core_start(struct ath10k *ar) if (status) goto err_htc_stop; - ar->free_vdev_map = (1 << TARGET_NUM_VDEVS) - 1; + if (test_bit(ATH10K_FW_FEATURE_WMI_10X, ar->fw_features)) + ar->free_vdev_map = (1 << TARGET_10X_NUM_VDEVS) - 1; + else + ar->free_vdev_map = (1 << TARGET_NUM_VDEVS) - 1; + INIT_LIST_HEAD(&ar->arvifs); if (!test_bit(ATH10K_FLAG_FIRST_BOOT_DONE, &ar->dev_flags)) From 17290231df16eeee5dfc198dbf5ee4b419996dcd Mon Sep 17 00:00:00 2001 From: Max Filippov Date: Sat, 24 May 2014 21:48:28 +0400 Subject: [PATCH 007/455] xtensa: add fixup for double exception raised in window overflow There are two FIXMEs in the double exception handler 'for the extremely unlikely case'. This case gets hit by gcc during kernel build once in a few hours, resulting in an unrecoverable exception condition. Provide missing fixup routine to handle this case. Double exception literals now need 8 more bytes, add them to the linker script. Also replace bbsi instructions with bbsi.l as we're branching depending on 8th and 7th LSB-based bits of exception address. This may be tested by adding the explicit DTLB invalidation to window overflow handlers, like the following: --- a/arch/xtensa/kernel/vectors.S +++ b/arch/xtensa/kernel/vectors.S @@ -592,6 +592,14 @@ ENDPROC(_WindowUnderflow4) ENTRY_ALIGN64(_WindowOverflow8) s32e a0, a9, -16 + bbsi.l a9, 31, 1f + rsr a0, ccount + bbsi.l a0, 4, 1f + pdtlb a0, a9 + idtlb a0 + movi a0, 9 + idtlb a0 +1: l32e a0, a1, -12 s32e a2, a9, -8 s32e a1, a9, -12 Cc: stable@vger.kernel.org Signed-off-by: Max Filippov --- arch/xtensa/kernel/vectors.S | 158 ++++++++++++++++++++++++++----- arch/xtensa/kernel/vmlinux.lds.S | 4 +- 2 files changed, 138 insertions(+), 24 deletions(-) diff --git a/arch/xtensa/kernel/vectors.S b/arch/xtensa/kernel/vectors.S index f9e1ec346e35..8453e6e39895 100644 --- a/arch/xtensa/kernel/vectors.S +++ b/arch/xtensa/kernel/vectors.S @@ -376,38 +376,42 @@ _DoubleExceptionVector_WindowOverflow: beqz a2, 1f # if at start of vector, don't restore addi a0, a0, -128 - bbsi a0, 8, 1f # don't restore except for overflow 8 and 12 - bbsi a0, 7, 2f + bbsi.l a0, 8, 1f # don't restore except for overflow 8 and 12 + + /* + * This fixup handler is for the extremely unlikely case where the + * overflow handler's reference thru a0 gets a hardware TLB refill + * that bumps out the (distinct, aliasing) TLB entry that mapped its + * prior references thru a9/a13, and where our reference now thru + * a9/a13 gets a 2nd-level miss exception (not hardware TLB refill). + */ + movi a2, window_overflow_restore_a0_fixup + s32i a2, a3, EXC_TABLE_FIXUP + l32i a2, a3, EXC_TABLE_DOUBLE_SAVE + xsr a3, excsave1 + + bbsi.l a0, 7, 2f /* * Restore a0 as saved by _WindowOverflow8(). - * - * FIXME: we really need a fixup handler for this L32E, - * for the extremely unlikely case where the overflow handler's - * reference thru a0 gets a hardware TLB refill that bumps out - * the (distinct, aliasing) TLB entry that mapped its prior - * references thru a9, and where our reference now thru a9 - * gets a 2nd-level miss exception (not hardware TLB refill). */ - l32e a2, a9, -16 - wsr a2, depc # replace the saved a0 - j 1f + l32e a0, a9, -16 + wsr a0, depc # replace the saved a0 + j 3f 2: /* * Restore a0 as saved by _WindowOverflow12(). - * - * FIXME: we really need a fixup handler for this L32E, - * for the extremely unlikely case where the overflow handler's - * reference thru a0 gets a hardware TLB refill that bumps out - * the (distinct, aliasing) TLB entry that mapped its prior - * references thru a13, and where our reference now thru a13 - * gets a 2nd-level miss exception (not hardware TLB refill). */ - l32e a2, a13, -16 - wsr a2, depc # replace the saved a0 + l32e a0, a13, -16 + wsr a0, depc # replace the saved a0 +3: + xsr a3, excsave1 + movi a0, 0 + s32i a0, a3, EXC_TABLE_FIXUP + s32i a2, a3, EXC_TABLE_DOUBLE_SAVE 1: /* * Restore WindowBase while leaving all address registers restored. @@ -449,6 +453,7 @@ _DoubleExceptionVector_WindowOverflow: s32i a0, a2, PT_DEPC +_DoubleExceptionVector_handle_exception: addx4 a0, a0, a3 l32i a0, a0, EXC_TABLE_FAST_USER xsr a3, excsave1 @@ -464,10 +469,119 @@ _DoubleExceptionVector_WindowOverflow: rotw -3 j 1b - .end literal_prefix ENDPROC(_DoubleExceptionVector) +/* + * Fixup handler for TLB miss in double exception handler for window owerflow. + * We get here with windowbase set to the window that was being spilled and + * a0 trashed. a0 bit 7 determines if this is a call8 (bit clear) or call12 + * (bit set) window. + * + * We do the following here: + * - go to the original window retaining a0 value; + * - set up exception stack to return back to appropriate a0 restore code + * (we'll need to rotate window back and there's no place to save this + * information, use different return address for that); + * - handle the exception; + * - go to the window that was being spilled; + * - set up window_overflow_restore_a0_fixup as a fixup routine; + * - reload a0; + * - restore the original window; + * - reset the default fixup routine; + * - return to user. By the time we get to this fixup handler all information + * about the conditions of the original double exception that happened in + * the window overflow handler is lost, so we just return to userspace to + * retry overflow from start. + * + * a0: value of depc, original value in depc + * a2: trashed, original value in EXC_TABLE_DOUBLE_SAVE + * a3: exctable, original value in excsave1 + */ + +ENTRY(window_overflow_restore_a0_fixup) + + rsr a0, ps + extui a0, a0, PS_OWB_SHIFT, PS_OWB_WIDTH + rsr a2, windowbase + sub a0, a2, a0 + extui a0, a0, 0, 3 + l32i a2, a3, EXC_TABLE_DOUBLE_SAVE + xsr a3, excsave1 + + _beqi a0, 1, .Lhandle_1 + _beqi a0, 3, .Lhandle_3 + + .macro overflow_fixup_handle_exception_pane n + + rsr a0, depc + rotw -\n + + xsr a3, excsave1 + wsr a2, depc + l32i a2, a3, EXC_TABLE_KSTK + s32i a0, a2, PT_AREG0 + + movi a0, .Lrestore_\n + s32i a0, a2, PT_DEPC + rsr a0, exccause + j _DoubleExceptionVector_handle_exception + + .endm + + overflow_fixup_handle_exception_pane 2 +.Lhandle_1: + overflow_fixup_handle_exception_pane 1 +.Lhandle_3: + overflow_fixup_handle_exception_pane 3 + + .macro overflow_fixup_restore_a0_pane n + + rotw \n + /* Need to preserve a0 value here to be able to handle exception + * that may occur on a0 reload from stack. It may occur because + * TLB miss handler may not be atomic and pointer to page table + * may be lost before we get here. There are no free registers, + * so we need to use EXC_TABLE_DOUBLE_SAVE area. + */ + xsr a3, excsave1 + s32i a2, a3, EXC_TABLE_DOUBLE_SAVE + movi a2, window_overflow_restore_a0_fixup + s32i a2, a3, EXC_TABLE_FIXUP + l32i a2, a3, EXC_TABLE_DOUBLE_SAVE + xsr a3, excsave1 + bbsi.l a0, 7, 1f + l32e a0, a9, -16 + j 2f +1: + l32e a0, a13, -16 +2: + rotw -\n + + .endm + +.Lrestore_2: + overflow_fixup_restore_a0_pane 2 + +.Lset_default_fixup: + xsr a3, excsave1 + s32i a2, a3, EXC_TABLE_DOUBLE_SAVE + movi a2, 0 + s32i a2, a3, EXC_TABLE_FIXUP + l32i a2, a3, EXC_TABLE_DOUBLE_SAVE + xsr a3, excsave1 + rfe + +.Lrestore_1: + overflow_fixup_restore_a0_pane 1 + j .Lset_default_fixup +.Lrestore_3: + overflow_fixup_restore_a0_pane 3 + j .Lset_default_fixup + +ENDPROC(window_overflow_restore_a0_fixup) + + .end literal_prefix /* * Debug interrupt vector * diff --git a/arch/xtensa/kernel/vmlinux.lds.S b/arch/xtensa/kernel/vmlinux.lds.S index ee32c0085dff..d16db6df86f8 100644 --- a/arch/xtensa/kernel/vmlinux.lds.S +++ b/arch/xtensa/kernel/vmlinux.lds.S @@ -269,13 +269,13 @@ SECTIONS .UserExceptionVector.literal) SECTION_VECTOR (_DoubleExceptionVector_literal, .DoubleExceptionVector.literal, - DOUBLEEXC_VECTOR_VADDR - 16, + DOUBLEEXC_VECTOR_VADDR - 40, SIZEOF(.UserExceptionVector.text), .UserExceptionVector.text) SECTION_VECTOR (_DoubleExceptionVector_text, .DoubleExceptionVector.text, DOUBLEEXC_VECTOR_VADDR, - 32, + 40, .DoubleExceptionVector.literal) . = (LOADADDR( .DoubleExceptionVector.text ) + SIZEOF( .DoubleExceptionVector.text ) + 3) & ~ 3; From be6ae382dc153da51cf066c8dd523aa955f02531 Mon Sep 17 00:00:00 2001 From: Max Filippov Date: Mon, 9 Jun 2014 22:18:24 +0400 Subject: [PATCH 008/455] xtensa: fix sysmem reservation at the end of existing block When sysmem reservation occurs exactly at the end of an existing block that block is deleted, because it is incorrectly included in the range of memblocks to remove. Fix that by skipping such block. Cc: stable@vger.kernel.org Signed-off-by: Max Filippov --- arch/xtensa/mm/init.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/xtensa/mm/init.c b/arch/xtensa/mm/init.c index 4224256bb215..77ed20209ca5 100644 --- a/arch/xtensa/mm/init.c +++ b/arch/xtensa/mm/init.c @@ -191,7 +191,7 @@ int __init mem_reserve(unsigned long start, unsigned long end, int must_exist) return -EINVAL; } - if (it && start - it->start < bank_sz) { + if (it && start - it->start <= bank_sz) { if (start == it->start) { if (end - it->start < bank_sz) { it->start = end; From d7da3a3ccdeb64ceedb51b0a3377ba56cc2999fa Mon Sep 17 00:00:00 2001 From: Ping Cheng Date: Fri, 13 Jun 2014 13:37:33 -0700 Subject: [PATCH 009/455] Input: wacom - cleanup multitouch code when touch_max is 2 Historically we dealt with touch_max equals to 2 differently from other MT devices. Now we use input_mt_*() to process all MT events, as long as touch_max is greater than 1. So, there is no need to take (touch_max == 2) as a special case any more. Signed-off-by: Ping Cheng Reviewed-by: Jason Gerecke Signed-off-by: Dmitry Torokhov --- drivers/input/tablet/wacom_wac.c | 28 +++++++--------------------- 1 file changed, 7 insertions(+), 21 deletions(-) diff --git a/drivers/input/tablet/wacom_wac.c b/drivers/input/tablet/wacom_wac.c index 977d05cd9e2e..e73cf2c71f35 100644 --- a/drivers/input/tablet/wacom_wac.c +++ b/drivers/input/tablet/wacom_wac.c @@ -1217,9 +1217,9 @@ static void wacom_bpt3_touch_msg(struct wacom_wac *wacom, unsigned char *data) * a=(pi*r^2)/C. */ int a = data[5]; - int x_res = input_abs_get_res(input, ABS_X); - int y_res = input_abs_get_res(input, ABS_Y); - width = 2 * int_sqrt(a * WACOM_CONTACT_AREA_SCALE); + int x_res = input_abs_get_res(input, ABS_MT_POSITION_X); + int y_res = input_abs_get_res(input, ABS_MT_POSITION_Y); + width = 2 * int_sqrt(a * WACOM_CONTACT_AREA_SCALE); height = width * y_res / x_res; } @@ -1587,7 +1587,7 @@ static void wacom_abs_set_axis(struct input_dev *input_dev, input_abs_set_res(input_dev, ABS_X, features->x_resolution); input_abs_set_res(input_dev, ABS_Y, features->y_resolution); } else { - if (features->touch_max <= 2) { + if (features->touch_max == 1) { input_set_abs_params(input_dev, ABS_X, 0, features->x_max, features->x_fuzz, 0); input_set_abs_params(input_dev, ABS_Y, 0, @@ -1815,14 +1815,8 @@ int wacom_setup_input_capabilities(struct input_dev *input_dev, case MTTPC: case MTTPC_B: case TABLETPC2FG: - if (features->device_type == BTN_TOOL_FINGER) { - unsigned int flags = INPUT_MT_DIRECT; - - if (wacom_wac->features.type == TABLETPC2FG) - flags = 0; - - input_mt_init_slots(input_dev, features->touch_max, flags); - } + if (features->device_type == BTN_TOOL_FINGER && features->touch_max > 1) + input_mt_init_slots(input_dev, features->touch_max, INPUT_MT_DIRECT); /* fall through */ case TABLETPC: @@ -1883,10 +1877,6 @@ int wacom_setup_input_capabilities(struct input_dev *input_dev, __set_bit(BTN_RIGHT, input_dev->keybit); if (features->touch_max) { - /* touch interface */ - unsigned int flags = INPUT_MT_POINTER; - - __set_bit(INPUT_PROP_POINTER, input_dev->propbit); if (features->pktlen == WACOM_PKGLEN_BBTOUCH3) { input_set_abs_params(input_dev, ABS_MT_TOUCH_MAJOR, @@ -1894,12 +1884,8 @@ int wacom_setup_input_capabilities(struct input_dev *input_dev, input_set_abs_params(input_dev, ABS_MT_TOUCH_MINOR, 0, features->y_max, 0, 0); - } else { - __set_bit(BTN_TOOL_FINGER, input_dev->keybit); - __set_bit(BTN_TOOL_DOUBLETAP, input_dev->keybit); - flags = 0; } - input_mt_init_slots(input_dev, features->touch_max, flags); + input_mt_init_slots(input_dev, features->touch_max, INPUT_MT_POINTER); } else { /* buttons/keys only interface */ __clear_bit(ABS_X, input_dev->absbit); From 31972f6e517d82a4f60de4994908724b7b47e337 Mon Sep 17 00:00:00 2001 From: Felipe Balbi Date: Sun, 15 Jun 2014 00:15:09 -0700 Subject: [PATCH 010/455] Input: ti_am335x_tsc - warn about incorrect spelling In the hopes that people run new kernels on their devices, let's add a warning message asking users to have their DTS file fixed. The goal is that by Linux 4.0 we will be able to remove support for the bogus version of our touchscreen's DTS. Signed-off-by: Felipe Balbi Acked-by: Mark Rutland Signed-off-by: Dmitry Torokhov --- drivers/input/touchscreen/ti_am335x_tsc.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/input/touchscreen/ti_am335x_tsc.c b/drivers/input/touchscreen/ti_am335x_tsc.c index 4e793a17361f..2ce649520fe0 100644 --- a/drivers/input/touchscreen/ti_am335x_tsc.c +++ b/drivers/input/touchscreen/ti_am335x_tsc.c @@ -359,9 +359,12 @@ static int titsc_parse_dt(struct platform_device *pdev, */ err = of_property_read_u32(node, "ti,coordinate-readouts", &ts_dev->coordinate_readouts); - if (err < 0) + if (err < 0) { + dev_warn(&pdev->dev, "please use 'ti,coordinate-readouts' instead\n"); err = of_property_read_u32(node, "ti,coordiante-readouts", &ts_dev->coordinate_readouts); + } + if (err < 0) return err; From 4856fbd12d69965d3ab680c686222db93872728d Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Wed, 11 Jun 2014 11:49:31 -0300 Subject: [PATCH 011/455] [media] staging: tighten omap4iss dependencies The OMAP4 camera support depends on I2C and VIDEO_V4L2, both of which can be loadable modules. This causes build failures if we want the camera driver to be built-in. This can be solved by turning the option into "tristate", which unfortunately causes another problem, because the driver incorrectly calls a platform-internal interface for omap4_ctrl_pad_readl/omap4_ctrl_pad_writel. Instead, this patch just forbids the invalid configurations and ensures that the driver can only be built if all its dependencies are built-in. Cc: stable@vger.kernel.org Signed-off-by: Arnd Bergmann Signed-off-by: Laurent Pinchart Signed-off-by: Mauro Carvalho Chehab --- drivers/staging/media/omap4iss/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/staging/media/omap4iss/Kconfig b/drivers/staging/media/omap4iss/Kconfig index 78b0fba7047e..8afc6fee40c5 100644 --- a/drivers/staging/media/omap4iss/Kconfig +++ b/drivers/staging/media/omap4iss/Kconfig @@ -1,6 +1,6 @@ config VIDEO_OMAP4 bool "OMAP 4 Camera support" - depends on VIDEO_V4L2 && VIDEO_V4L2_SUBDEV_API && I2C && ARCH_OMAP4 + depends on VIDEO_V4L2=y && VIDEO_V4L2_SUBDEV_API && I2C=y && ARCH_OMAP4 select VIDEOBUF2_DMA_CONTIG ---help--- Driver for an OMAP 4 ISS controller. From eefae30a1b3aabab6085be2ca0e314021253daa2 Mon Sep 17 00:00:00 2001 From: Antti Palosaari Date: Fri, 13 Jun 2014 11:08:25 -0300 Subject: [PATCH 012/455] [media] si2168: add one missing parenthesis Fix following warnings: si2168_cmd_execute() warn: add some parenthesis here? si2168_cmd_execute() warn: maybe use && instead of & Reported-by: Dan Carpenter Signed-off-by: Antti Palosaari Signed-off-by: Mauro Carvalho Chehab --- drivers/media/dvb-frontends/si2168.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/media/dvb-frontends/si2168.c b/drivers/media/dvb-frontends/si2168.c index 8637d2ed7623..f20573649043 100644 --- a/drivers/media/dvb-frontends/si2168.c +++ b/drivers/media/dvb-frontends/si2168.c @@ -60,7 +60,7 @@ static int si2168_cmd_execute(struct si2168 *s, struct si2168_cmd *cmd) jiffies_to_msecs(jiffies) - (jiffies_to_msecs(timeout) - TIMEOUT)); - if (!(cmd->args[0] >> 7) & 0x01) { + if (!((cmd->args[0] >> 7) & 0x01)) { ret = -ETIMEDOUT; goto err_mutex_unlock; } From a811e6ec87d910faceda561fae9b0088d70ee831 Mon Sep 17 00:00:00 2001 From: Antti Palosaari Date: Fri, 13 Jun 2014 11:19:07 -0300 Subject: [PATCH 013/455] [media] si2157: add one missing parenthesis Fix following warnings: si2157_cmd_execute() warn: add some parenthesis here? si2157_cmd_execute() warn: maybe use && instead of & Reported-by: Dan Carpenter Signed-off-by: Antti Palosaari Signed-off-by: Mauro Carvalho Chehab --- drivers/media/tuners/si2157.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/media/tuners/si2157.c b/drivers/media/tuners/si2157.c index 271a752cee54..fa4cc7b880aa 100644 --- a/drivers/media/tuners/si2157.c +++ b/drivers/media/tuners/si2157.c @@ -57,7 +57,7 @@ static int si2157_cmd_execute(struct si2157 *s, struct si2157_cmd *cmd) jiffies_to_msecs(jiffies) - (jiffies_to_msecs(timeout) - TIMEOUT)); - if (!(buf[0] >> 7) & 0x01) { + if (!((buf[0] >> 7) & 0x01)) { ret = -ETIMEDOUT; goto err_mutex_unlock; } else { From 0c76e68d6ec6ade4dd0ae15fb08a827525fec3a2 Mon Sep 17 00:00:00 2001 From: Antti Palosaari Date: Fri, 13 Jun 2014 19:29:55 -0300 Subject: [PATCH 014/455] [media] si2168: firmware download fix First 8 bytes belonging to firmware image were hard-coded and uploaded by the driver mistakenly. Introduce new corrected firmware file and remove those 8 bytes from the driver. New firmware image could be extracted from the PCTV 292e driver CD using following command: $ dd if=/TVC 6.4.8/Driver/PCTV Empia/emOEM.sys ibs=1 skip=1089408 count=2728 of=dvb-demod-si2168-02.fw $ md5sum dvb-demod-si2168-02.fw d8da7ff67cd56cd8aa4e101aea45e052 dvb-demod-si2168-02.fw $ sudo cp dvb-demod-si2168-02.fw /lib/firmware/ Signed-off-by: Antti Palosaari Signed-off-by: Mauro Carvalho Chehab --- drivers/media/dvb-frontends/si2168.c | 14 -------------- drivers/media/dvb-frontends/si2168_priv.h | 2 +- 2 files changed, 1 insertion(+), 15 deletions(-) diff --git a/drivers/media/dvb-frontends/si2168.c b/drivers/media/dvb-frontends/si2168.c index f20573649043..2e3cdcfa0a67 100644 --- a/drivers/media/dvb-frontends/si2168.c +++ b/drivers/media/dvb-frontends/si2168.c @@ -485,20 +485,6 @@ static int si2168_init(struct dvb_frontend *fe) if (ret) goto err; - cmd.args[0] = 0x05; - cmd.args[1] = 0x00; - cmd.args[2] = 0xaa; - cmd.args[3] = 0x4d; - cmd.args[4] = 0x56; - cmd.args[5] = 0x40; - cmd.args[6] = 0x00; - cmd.args[7] = 0x00; - cmd.wlen = 8; - cmd.rlen = 1; - ret = si2168_cmd_execute(s, &cmd); - if (ret) - goto err; - /* cold state - try to download firmware */ dev_info(&s->client->dev, "%s: found a '%s' in cold state\n", KBUILD_MODNAME, si2168_ops.info.name); diff --git a/drivers/media/dvb-frontends/si2168_priv.h b/drivers/media/dvb-frontends/si2168_priv.h index 2a343e896f40..53f7f06ae343 100644 --- a/drivers/media/dvb-frontends/si2168_priv.h +++ b/drivers/media/dvb-frontends/si2168_priv.h @@ -22,7 +22,7 @@ #include #include -#define SI2168_FIRMWARE "dvb-demod-si2168-01.fw" +#define SI2168_FIRMWARE "dvb-demod-si2168-02.fw" /* state struct */ struct si2168 { From f71920efb1066d71d74811e1dbed658173adf9bf Mon Sep 17 00:00:00 2001 From: Rickard Strandqvist Date: Sat, 14 Jun 2014 08:37:09 -0300 Subject: [PATCH 015/455] [media] media: v4l2-core: v4l2-dv-timings.c: Cleaning up code wrong value used in aspect ratio Wrong value used in same cases for the aspect ratio. Signed-off-by: Rickard Strandqvist Acked-by: Lad, Prabhakar Cc: stable@vger.kernel.org # for v3.12 and up Signed-off-by: Hans Verkuil Signed-off-by: Mauro Carvalho Chehab --- drivers/media/v4l2-core/v4l2-dv-timings.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/media/v4l2-core/v4l2-dv-timings.c b/drivers/media/v4l2-core/v4l2-dv-timings.c index 4ae54caadd03..ce1c9f5d9dee 100644 --- a/drivers/media/v4l2-core/v4l2-dv-timings.c +++ b/drivers/media/v4l2-core/v4l2-dv-timings.c @@ -610,10 +610,10 @@ struct v4l2_fract v4l2_calc_aspect_ratio(u8 hor_landscape, u8 vert_portrait) aspect.denominator = 9; } else if (ratio == 34) { aspect.numerator = 4; - aspect.numerator = 3; + aspect.denominator = 3; } else if (ratio == 68) { aspect.numerator = 15; - aspect.numerator = 9; + aspect.denominator = 9; } else { aspect.numerator = hor_landscape + 99; aspect.denominator = 100; From 13936af3d2f04f173a83cc050dbc4b20d8562b81 Mon Sep 17 00:00:00 2001 From: Hans Verkuil Date: Mon, 16 Jun 2014 04:32:37 -0300 Subject: [PATCH 016/455] [media] saa7134: use unlocked_ioctl instead of ioctl The saa7134 driver uses core-locking, so there is no longer any need to use the ioctl op instead of the unlocked_ioctl op. This change was forgotten for the saa7134-empress.c, so fix this. Signed-off-by: Hans Verkuil Signed-off-by: Mauro Carvalho Chehab --- drivers/media/pci/saa7134/saa7134-empress.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/media/pci/saa7134/saa7134-empress.c b/drivers/media/pci/saa7134/saa7134-empress.c index e65c760e4e8b..0006d6bf8c18 100644 --- a/drivers/media/pci/saa7134/saa7134-empress.c +++ b/drivers/media/pci/saa7134/saa7134-empress.c @@ -179,7 +179,7 @@ static const struct v4l2_file_operations ts_fops = .read = vb2_fop_read, .poll = vb2_fop_poll, .mmap = vb2_fop_mmap, - .ioctl = video_ioctl2, + .unlocked_ioctl = video_ioctl2, }; static const struct v4l2_ioctl_ops ts_ioctl_ops = { From d755330c5e0658d8056242b5b81e2f44ed7a96d8 Mon Sep 17 00:00:00 2001 From: Jiri Olsa Date: Sun, 15 Jun 2014 10:22:15 +0200 Subject: [PATCH 017/455] perf tools: Fix segfault in cumulative.callchain report When cumulative callchain mode is on, we could get samples with with no actual hits. This breaks the assumption of the annotation code, that each sample has annotation counts allocated and leads to segfault. Fixing this by additional checks for annotation stats. Acked-by: Namhyung Kim Acked-by: Arnaldo Carvalho de Melo Cc: Arnaldo Carvalho de Melo Cc: Corey Ashford Cc: David Ahern Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Namhyung Kim Cc: Paul Mackerras Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/1402821332-12419-1-git-send-email-jolsa@kernel.org Signed-off-by: Jiri Olsa --- tools/perf/ui/browsers/hists.c | 21 ++++++++++++++++----- 1 file changed, 16 insertions(+), 5 deletions(-) diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c index 52c03fbbba17..04a229aa5c0f 100644 --- a/tools/perf/ui/browsers/hists.c +++ b/tools/perf/ui/browsers/hists.c @@ -17,6 +17,7 @@ #include "../util.h" #include "../ui.h" #include "map.h" +#include "annotate.h" struct hist_browser { struct ui_browser b; @@ -1593,13 +1594,18 @@ static int perf_evsel__hists_browse(struct perf_evsel *evsel, int nr_events, bi->to.sym->name) > 0) annotate_t = nr_options++; } else { - if (browser->selection != NULL && browser->selection->sym != NULL && - !browser->selection->map->dso->annotate_warned && - asprintf(&options[nr_options], "Annotate %s", - browser->selection->sym->name) > 0) - annotate = nr_options++; + !browser->selection->map->dso->annotate_warned) { + struct annotation *notes; + + notes = symbol__annotation(browser->selection->sym); + + if (notes->src && + asprintf(&options[nr_options], "Annotate %s", + browser->selection->sym->name) > 0) + annotate = nr_options++; + } } if (thread != NULL && @@ -1656,6 +1662,7 @@ retry_popup_menu: if (choice == annotate || choice == annotate_t || choice == annotate_f) { struct hist_entry *he; + struct annotation *notes; int err; do_annotate: if (!objdump_path && perf_session_env__lookup_objdump(env)) @@ -1679,6 +1686,10 @@ do_annotate: he->ms.map = he->branch_info->to.map; } + notes = symbol__annotation(he->ms.sym); + if (!notes->src) + continue; + /* * Don't let this be freed, say, by hists__decay_entry. */ From a93f0e551af9e194db38bfe16001e17a3a1d189a Mon Sep 17 00:00:00 2001 From: Simon Que Date: Mon, 16 Jun 2014 11:32:09 -0700 Subject: [PATCH 018/455] perf symbols: Get kernel start address by symbol name The function machine__get_kernel_start_addr() was taking the first symbol of kallsyms as the start address. This is incorrect in certain cases where the first symbol is something at 0, while the actual kernel functions begin at a later point (e.g. 0x80200000). This patch fixes machine__get_kernel_start_addr() to search for the symbol "_text" or "_stext", which marks the beginning of kernel mapping. This was already being done in machine__create_kernel_maps(). Thus, this patch is just a refactor, to move that code into machine__get_kernel_start_addr(). Signed-off-by: Simon Que Link: http://lkml.kernel.org/r/1402943529-13244-1-git-send-email-sque@chromium.org Signed-off-by: Jiri Olsa --- tools/perf/util/machine.c | 54 ++++++++++++++++----------------------- 1 file changed, 22 insertions(+), 32 deletions(-) diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c index 0e5fea95d596..c73e1fc12e53 100644 --- a/tools/perf/util/machine.c +++ b/tools/perf/util/machine.c @@ -496,18 +496,6 @@ struct process_args { u64 start; }; -static int symbol__in_kernel(void *arg, const char *name, - char type __maybe_unused, u64 start) -{ - struct process_args *args = arg; - - if (strchr(name, '[')) - return 0; - - args->start = start; - return 1; -} - static void machine__get_kallsyms_filename(struct machine *machine, char *buf, size_t bufsz) { @@ -517,27 +505,41 @@ static void machine__get_kallsyms_filename(struct machine *machine, char *buf, scnprintf(buf, bufsz, "%s/proc/kallsyms", machine->root_dir); } -/* Figure out the start address of kernel map from /proc/kallsyms */ -static u64 machine__get_kernel_start_addr(struct machine *machine) +const char *ref_reloc_sym_names[] = {"_text", "_stext", NULL}; + +/* Figure out the start address of kernel map from /proc/kallsyms. + * Returns the name of the start symbol in *symbol_name. Pass in NULL as + * symbol_name if it's not that important. + */ +static u64 machine__get_kernel_start_addr(struct machine *machine, + const char **symbol_name) { char filename[PATH_MAX]; - struct process_args args; + int i; + const char *name; + u64 addr = 0; machine__get_kallsyms_filename(machine, filename, PATH_MAX); if (symbol__restricted_filename(filename, "/proc/kallsyms")) return 0; - if (kallsyms__parse(filename, &args, symbol__in_kernel) <= 0) - return 0; + for (i = 0; (name = ref_reloc_sym_names[i]) != NULL; i++) { + addr = kallsyms__get_function_start(filename, name); + if (addr) + break; + } - return args.start; + if (symbol_name) + *symbol_name = name; + + return addr; } int __machine__create_kernel_maps(struct machine *machine, struct dso *kernel) { enum map_type type; - u64 start = machine__get_kernel_start_addr(machine); + u64 start = machine__get_kernel_start_addr(machine, NULL); for (type = 0; type < MAP__NR_TYPES; ++type) { struct kmap *kmap; @@ -852,23 +854,11 @@ static int machine__create_modules(struct machine *machine) return 0; } -const char *ref_reloc_sym_names[] = {"_text", "_stext", NULL}; - int machine__create_kernel_maps(struct machine *machine) { struct dso *kernel = machine__get_kernel(machine); - char filename[PATH_MAX]; const char *name; - u64 addr = 0; - int i; - - machine__get_kallsyms_filename(machine, filename, PATH_MAX); - - for (i = 0; (name = ref_reloc_sym_names[i]) != NULL; i++) { - addr = kallsyms__get_function_start(filename, name); - if (addr) - break; - } + u64 addr = machine__get_kernel_start_addr(machine, &name); if (!addr) return -1; From a2b23bacb315d3873ed90029fd2b68c95de734c0 Mon Sep 17 00:00:00 2001 From: Marcel Holtmann Date: Fri, 20 Jun 2014 12:34:28 +0200 Subject: [PATCH 019/455] Revert "Bluetooth: Add a new PID/VID 0cf3/e005 for AR3012." This reverts commit ca58e594da2486c1d28e7ad547d82266604ec4ce. For some unclear reason this patch tries to add suport for the product ID 0xe005, but it ends up adding product ID 0x3005 to all the tables. This is obviously wrong and causing multiple issues. The original patch seemed to be fine, but what ended up in 3.15 is not what the patch intended. The commit 0a3658cccdf53 is already present and adds support for this hardware. This means only revert of this broken commit is requird. Signed-off-by: Marcel Holtmann Reported-by: Alexander Holler Cc: stable@vger.kernel.org # 3.15.x --- drivers/bluetooth/ath3k.c | 2 -- drivers/bluetooth/btusb.c | 1 - 2 files changed, 3 deletions(-) diff --git a/drivers/bluetooth/ath3k.c b/drivers/bluetooth/ath3k.c index f98380648cb3..f50dffc0374f 100644 --- a/drivers/bluetooth/ath3k.c +++ b/drivers/bluetooth/ath3k.c @@ -90,7 +90,6 @@ static const struct usb_device_id ath3k_table[] = { { USB_DEVICE(0x0b05, 0x17d0) }, { USB_DEVICE(0x0CF3, 0x0036) }, { USB_DEVICE(0x0CF3, 0x3004) }, - { USB_DEVICE(0x0CF3, 0x3005) }, { USB_DEVICE(0x0CF3, 0x3008) }, { USB_DEVICE(0x0CF3, 0x311D) }, { USB_DEVICE(0x0CF3, 0x311E) }, @@ -140,7 +139,6 @@ static const struct usb_device_id ath3k_blist_tbl[] = { { USB_DEVICE(0x0b05, 0x17d0), .driver_info = BTUSB_ATH3012 }, { USB_DEVICE(0x0CF3, 0x0036), .driver_info = BTUSB_ATH3012 }, { USB_DEVICE(0x0cf3, 0x3004), .driver_info = BTUSB_ATH3012 }, - { USB_DEVICE(0x0cf3, 0x3005), .driver_info = BTUSB_ATH3012 }, { USB_DEVICE(0x0cf3, 0x3008), .driver_info = BTUSB_ATH3012 }, { USB_DEVICE(0x0cf3, 0x311D), .driver_info = BTUSB_ATH3012 }, { USB_DEVICE(0x0cf3, 0x311E), .driver_info = BTUSB_ATH3012 }, diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c index a1c80b0c7663..6250fc2fb93a 100644 --- a/drivers/bluetooth/btusb.c +++ b/drivers/bluetooth/btusb.c @@ -162,7 +162,6 @@ static const struct usb_device_id blacklist_table[] = { { USB_DEVICE(0x0b05, 0x17d0), .driver_info = BTUSB_ATH3012 }, { USB_DEVICE(0x0cf3, 0x0036), .driver_info = BTUSB_ATH3012 }, { USB_DEVICE(0x0cf3, 0x3004), .driver_info = BTUSB_ATH3012 }, - { USB_DEVICE(0x0cf3, 0x3005), .driver_info = BTUSB_ATH3012 }, { USB_DEVICE(0x0cf3, 0x3008), .driver_info = BTUSB_ATH3012 }, { USB_DEVICE(0x0cf3, 0x311d), .driver_info = BTUSB_ATH3012 }, { USB_DEVICE(0x0cf3, 0x311e), .driver_info = BTUSB_ATH3012 }, From c7262e711ae6e466baeb9ddc21d678c878469b1f Mon Sep 17 00:00:00 2001 From: Johan Hedberg Date: Tue, 17 Jun 2014 13:07:37 +0300 Subject: [PATCH 020/455] Bluetooth: Fix overriding higher security level in SMP When we receive a pairing request or an internal request to start pairing we shouldn't blindly overwrite the existing pending_sec_level value as that may actually be higher than the new one. This patch fixes the SMP code to only overwrite the value in case the new one is higher than the old. Signed-off-by: Johan Hedberg Signed-off-by: Marcel Holtmann --- net/bluetooth/smp.c | 23 ++++++++++++++--------- 1 file changed, 14 insertions(+), 9 deletions(-) diff --git a/net/bluetooth/smp.c b/net/bluetooth/smp.c index f2829a7932e2..0189ec8b68d1 100644 --- a/net/bluetooth/smp.c +++ b/net/bluetooth/smp.c @@ -669,7 +669,7 @@ static u8 smp_cmd_pairing_req(struct l2cap_conn *conn, struct sk_buff *skb) { struct smp_cmd_pairing rsp, *req = (void *) skb->data; struct smp_chan *smp; - u8 key_size, auth; + u8 key_size, auth, sec_level; int ret; BT_DBG("conn %p", conn); @@ -695,7 +695,9 @@ static u8 smp_cmd_pairing_req(struct l2cap_conn *conn, struct sk_buff *skb) /* We didn't start the pairing, so match remote */ auth = req->auth_req; - conn->hcon->pending_sec_level = authreq_to_seclevel(auth); + sec_level = authreq_to_seclevel(auth); + if (sec_level > conn->hcon->pending_sec_level) + conn->hcon->pending_sec_level = sec_level; build_pairing_cmd(conn, req, &rsp, auth); @@ -838,6 +840,7 @@ static u8 smp_cmd_security_req(struct l2cap_conn *conn, struct sk_buff *skb) struct smp_cmd_pairing cp; struct hci_conn *hcon = conn->hcon; struct smp_chan *smp; + u8 sec_level; BT_DBG("conn %p", conn); @@ -847,7 +850,9 @@ static u8 smp_cmd_security_req(struct l2cap_conn *conn, struct sk_buff *skb) if (!(conn->hcon->link_mode & HCI_LM_MASTER)) return SMP_CMD_NOTSUPP; - hcon->pending_sec_level = authreq_to_seclevel(rp->auth_req); + sec_level = authreq_to_seclevel(rp->auth_req); + if (sec_level > hcon->pending_sec_level) + hcon->pending_sec_level = sec_level; if (smp_ltk_encrypt(conn, hcon->pending_sec_level)) return 0; @@ -901,9 +906,12 @@ int smp_conn_security(struct hci_conn *hcon, __u8 sec_level) if (smp_sufficient_security(hcon, sec_level)) return 1; + if (sec_level > hcon->pending_sec_level) + hcon->pending_sec_level = sec_level; + if (hcon->link_mode & HCI_LM_MASTER) - if (smp_ltk_encrypt(conn, sec_level)) - goto done; + if (smp_ltk_encrypt(conn, hcon->pending_sec_level)) + return 0; if (test_and_set_bit(HCI_CONN_LE_SMP_PEND, &hcon->flags)) return 0; @@ -918,7 +926,7 @@ int smp_conn_security(struct hci_conn *hcon, __u8 sec_level) * requires it. */ if (hcon->io_capability != HCI_IO_NO_INPUT_OUTPUT || - sec_level > BT_SECURITY_MEDIUM) + hcon->pending_sec_level > BT_SECURITY_MEDIUM) authreq |= SMP_AUTH_MITM; if (hcon->link_mode & HCI_LM_MASTER) { @@ -937,9 +945,6 @@ int smp_conn_security(struct hci_conn *hcon, __u8 sec_level) set_bit(SMP_FLAG_INITIATOR, &smp->flags); -done: - hcon->pending_sec_level = sec_level; - return 0; } From 581370cc74ea75426421a1f5851ef05e2e995b01 Mon Sep 17 00:00:00 2001 From: Johan Hedberg Date: Tue, 17 Jun 2014 13:07:38 +0300 Subject: [PATCH 021/455] Bluetooth: Refactor authentication method lookup into its own function We'll need to do authentication method lookups from more than one place, so refactor the lookup into its own function. Signed-off-by: Johan Hedberg Signed-off-by: Marcel Holtmann --- net/bluetooth/smp.c | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/net/bluetooth/smp.c b/net/bluetooth/smp.c index 0189ec8b68d1..7156f4720644 100644 --- a/net/bluetooth/smp.c +++ b/net/bluetooth/smp.c @@ -385,6 +385,16 @@ static const u8 gen_method[5][5] = { { CFM_PASSKEY, CFM_PASSKEY, REQ_PASSKEY, JUST_WORKS, OVERLAP }, }; +static u8 get_auth_method(struct smp_chan *smp, u8 local_io, u8 remote_io) +{ + /* If either side has unknown io_caps, use JUST WORKS */ + if (local_io > SMP_IO_KEYBOARD_DISPLAY || + remote_io > SMP_IO_KEYBOARD_DISPLAY) + return JUST_WORKS; + + return gen_method[remote_io][local_io]; +} + static int tk_request(struct l2cap_conn *conn, u8 remote_oob, u8 auth, u8 local_io, u8 remote_io) { @@ -401,14 +411,11 @@ static int tk_request(struct l2cap_conn *conn, u8 remote_oob, u8 auth, BT_DBG("tk_request: auth:%d lcl:%d rem:%d", auth, local_io, remote_io); /* If neither side wants MITM, use JUST WORKS */ - /* If either side has unknown io_caps, use JUST WORKS */ /* Otherwise, look up method from the table */ - if (!(auth & SMP_AUTH_MITM) || - local_io > SMP_IO_KEYBOARD_DISPLAY || - remote_io > SMP_IO_KEYBOARD_DISPLAY) + if (!(auth & SMP_AUTH_MITM)) method = JUST_WORKS; else - method = gen_method[remote_io][local_io]; + method = get_auth_method(smp, local_io, remote_io); /* If not bonding, don't ask user to confirm a Zero TK */ if (!(auth & SMP_AUTH_BONDING) && method == JUST_CFM) From 2ed8f65ca262bca778e60053f667ce11b32db6b8 Mon Sep 17 00:00:00 2001 From: Johan Hedberg Date: Tue, 17 Jun 2014 13:07:39 +0300 Subject: [PATCH 022/455] Bluetooth: Fix rejecting pairing in case of insufficient capabilities If we need an MITM protected connection but the local and remote IO capabilities cannot provide it we should reject the pairing attempt in the appropriate way. This patch adds the missing checks for such a situation to the smp_cmd_pairing_req() and smp_cmd_pairing_rsp() functions. Signed-off-by: Johan Hedberg Signed-off-by: Marcel Holtmann --- net/bluetooth/smp.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) diff --git a/net/bluetooth/smp.c b/net/bluetooth/smp.c index 7156f4720644..e33a982161c1 100644 --- a/net/bluetooth/smp.c +++ b/net/bluetooth/smp.c @@ -706,6 +706,16 @@ static u8 smp_cmd_pairing_req(struct l2cap_conn *conn, struct sk_buff *skb) if (sec_level > conn->hcon->pending_sec_level) conn->hcon->pending_sec_level = sec_level; + /* If we need MITM check that it can be acheived */ + if (conn->hcon->pending_sec_level >= BT_SECURITY_HIGH) { + u8 method; + + method = get_auth_method(smp, conn->hcon->io_capability, + req->io_capability); + if (method == JUST_WORKS || method == JUST_CFM) + return SMP_AUTH_REQUIREMENTS; + } + build_pairing_cmd(conn, req, &rsp, auth); key_size = min(req->max_key_size, rsp.max_key_size); @@ -752,6 +762,16 @@ static u8 smp_cmd_pairing_rsp(struct l2cap_conn *conn, struct sk_buff *skb) if (check_enc_key_size(conn, key_size)) return SMP_ENC_KEY_SIZE; + /* If we need MITM check that it can be acheived */ + if (conn->hcon->pending_sec_level >= BT_SECURITY_HIGH) { + u8 method; + + method = get_auth_method(smp, req->io_capability, + rsp->io_capability); + if (method == JUST_WORKS || method == JUST_CFM) + return SMP_AUTH_REQUIREMENTS; + } + get_random_bytes(smp->prnd, sizeof(smp->prnd)); smp->prsp[0] = SMP_CMD_PAIRING_RSP; From 1d56dc4f5f7cdf0ba99062d974b7586a28fc5cf4 Mon Sep 17 00:00:00 2001 From: Lukasz Rymanowski Date: Tue, 17 Jun 2014 13:04:20 +0200 Subject: [PATCH 023/455] Bluetooth: Fix for ACL disconnect when pairing fails When pairing fails hci_conn refcnt drops below zero. This cause that ACL link is not disconnected when disconnect timeout fires. Probably this is because l2cap_conn_del calls l2cap_chan_del for each channel, and inside l2cap_chan_del conn is dropped. After that loop hci_chan_del is called which also drops conn. Anyway, as it is desrcibed in hci_core.h, it is known that refcnt drops below 0 sometimes and it should be fine. If so, let disconnect link when hci_conn_timeout fires and refcnt is 0 or below. This patch does it. This affects PTS test SM_TC_JW_BV_05_C Logs from scenario: [69713.706227] [6515] pair_device: [69713.706230] [6515] hci_conn_add: hci0 dst 00:1b:dc:06:06:22 [69713.706233] [6515] hci_dev_hold: hci0 orig refcnt 8 [69713.706235] [6515] hci_conn_init_sysfs: conn ffff88021f65a000 [69713.706239] [6515] hci_req_add_ev: hci0 opcode 0x200d plen 25 [69713.706242] [6515] hci_prepare_cmd: skb len 28 [69713.706243] [6515] hci_req_run: length 1 [69713.706248] [6515] hci_conn_hold: hcon ffff88021f65a000 orig refcnt 0 [69713.706251] [6515] hci_dev_put: hci0 orig refcnt 9 [69713.706281] [8909] hci_cmd_work: hci0 cmd_cnt 1 cmd queued 1 [69713.706288] [8909] hci_send_frame: hci0 type 1 len 28 [69713.706290] [8909] hci_send_to_monitor: hdev ffff88021f0c7000 len 28 [69713.706316] [5949] hci_sock_recvmsg: sock ffff8800941a9680, sk ffff88012bf4d000 [69713.706382] [5949] hci_sock_recvmsg: sock ffff8800941a9680, sk ffff88012bf4d000 [69713.711664] [8909] hci_rx_work: hci0 [69713.711668] [8909] hci_send_to_monitor: hdev ffff88021f0c7000 len 6 [69713.711680] [8909] hci_rx_work: hci0 Event packet [69713.711683] [8909] hci_cs_le_create_conn: hci0 status 0x00 [69713.711685] [8909] hci_sent_cmd_data: hci0 opcode 0x200d [69713.711688] [8909] hci_req_cmd_complete: opcode 0x200d status 0x00 [69713.711690] [8909] hci_sent_cmd_data: hci0 opcode 0x200d [69713.711695] [5949] hci_sock_recvmsg: sock ffff8800941a9680, sk ffff88012bf4d000 [69713.711744] [5949] hci_sock_recvmsg: sock ffff8800941a9680, sk ffff88012bf4d000 [69713.818875] [8909] hci_rx_work: hci0 [69713.818889] [8909] hci_send_to_monitor: hdev ffff88021f0c7000 len 21 [69713.818913] [8909] hci_rx_work: hci0 Event packet [69713.818917] [8909] hci_le_conn_complete_evt: hci0 status 0x00 [69713.818922] [8909] hci_send_to_control: len 19 [69713.818927] [5949] hci_sock_recvmsg: sock ffff8800941a9680, sk ffff88012bf4d000 [69713.818938] [8909] hci_conn_add_sysfs: conn ffff88021f65a000 [69713.818975] [6450] bt_sock_poll: sock ffff88005e758500, sk ffff88010323b800 [69713.818981] [6515] hci_sock_recvmsg: sock ffff88005e75a080, sk ffff88010323ac00 ... [69713.819021] [8909] hci_dev_hold: hci0 orig refcnt 10 [69713.819025] [8909] l2cap_connect_cfm: hcon ffff88021f65a000 bdaddr 00:1b:dc:06:06:22 status 0 [69713.819028] [8909] hci_chan_create: hci0 hcon ffff88021f65a000 [69713.819031] [8909] l2cap_conn_add: hcon ffff88021f65a000 conn ffff880221005c00 hchan ffff88020d60b1c0 [69713.819034] [8909] l2cap_conn_ready: conn ffff880221005c00 [69713.819036] [5949] hci_sock_recvmsg: sock ffff8800941a9680, sk ffff88012bf4d000 [69713.819037] [8909] smp_conn_security: conn ffff880221005c00 hcon ffff88021f65a000 level 0x02 [69713.819039] [8909] smp_chan_create: [69713.819041] [8909] hci_conn_hold: hcon ffff88021f65a000 orig refcnt 1 [69713.819043] [8909] smp_send_cmd: code 0x01 [69713.819045] [8909] hci_send_acl: hci0 chan ffff88020d60b1c0 flags 0x0000 [69713.819046] [5949] hci_sock_recvmsg: sock ffff8800941a9900, sk ffff88012bf4e800 [69713.819049] [8909] hci_queue_acl: hci0 nonfrag skb ffff88005157c100 len 15 [69713.819055] [5949] hci_sock_recvmsg: sock ffff8800941a9900, sk ffff88012bf4e800 [69713.819057] [8909] l2cap_le_conn_ready: [69713.819064] [8909] l2cap_chan_create: chan ffff88005ede2c00 [69713.819066] [8909] l2cap_chan_hold: chan ffff88005ede2c00 orig refcnt 1 [69713.819069] [8909] l2cap_sock_init: sk ffff88005ede5800 [69713.819072] [8909] bt_accept_enqueue: parent ffff880160356000, sk ffff88005ede5800 [69713.819074] [8909] __l2cap_chan_add: conn ffff880221005c00, psm 0x00, dcid 0x0004 [69713.819076] [8909] l2cap_chan_hold: chan ffff88005ede2c00 orig refcnt 2 [69713.819078] [8909] hci_conn_hold: hcon ffff88021f65a000 orig refcnt 2 [69713.819080] [8909] smp_conn_security: conn ffff880221005c00 hcon ffff88021f65a000 level 0x01 [69713.819082] [8909] l2cap_sock_ready_cb: sk ffff88005ede5800, parent ffff880160356000 [69713.819086] [8909] le_pairing_complete_cb: status 0 [69713.819091] [8909] hci_tx_work: hci0 acl 10 sco 8 le 0 [69713.819093] [8909] hci_sched_acl: hci0 [69713.819094] [8909] hci_sched_sco: hci0 [69713.819096] [8909] hci_sched_esco: hci0 [69713.819098] [8909] hci_sched_le: hci0 [69713.819099] [8909] hci_chan_sent: hci0 [69713.819101] [8909] hci_chan_sent: chan ffff88020d60b1c0 quote 10 [69713.819104] [8909] hci_sched_le: chan ffff88020d60b1c0 skb ffff88005157c100 len 15 priority 7 [69713.819106] [8909] hci_send_frame: hci0 type 2 len 15 [69713.819108] [8909] hci_send_to_monitor: hdev ffff88021f0c7000 len 15 [69713.819119] [8909] hci_chan_sent: hci0 [69713.819121] [8909] hci_prio_recalculate: hci0 [69713.819123] [8909] process_pending_rx: [69713.819226] [6450] hci_sock_recvmsg: sock ffff88005e758780, sk ffff88010323d400 ... [69713.822022] [6450] l2cap_sock_accept: sk ffff880160356000 timeo 0 [69713.822024] [6450] bt_accept_dequeue: parent ffff880160356000 [69713.822026] [6450] bt_accept_unlink: sk ffff88005ede5800 state 1 [69713.822028] [6450] l2cap_sock_accept: new socket ffff88005ede5800 [69713.822368] [6450] l2cap_sock_getname: sock ffff8800941ab700, sk ffff88005ede5800 [69713.822375] [6450] l2cap_sock_getsockopt: sk ffff88005ede5800 [69713.822383] [6450] l2cap_sock_getname: sock ffff8800941ab700, sk ffff88005ede5800 [69713.822414] [6450] bt_sock_poll: sock ffff8800941ab700, sk ffff88005ede5800 ... [69713.823255] [6450] l2cap_sock_getname: sock ffff8800941ab700, sk ffff88005ede5800 [69713.823259] [6450] l2cap_sock_getsockopt: sk ffff88005ede5800 [69713.824322] [6450] l2cap_sock_getname: sock ffff8800941ab700, sk ffff88005ede5800 [69713.824330] [6450] l2cap_sock_getsockopt: sk ffff88005ede5800 [69713.825029] [6450] bt_sock_poll: sock ffff88005e758500, sk ffff88010323b800 ... [69713.825187] [6450] l2cap_sock_sendmsg: sock ffff8800941ab700, sk ffff88005ede5800 [69713.825189] [6450] bt_sock_wait_ready: sk ffff88005ede5800 [69713.825192] [6450] l2cap_create_basic_pdu: chan ffff88005ede2c00 len 3 [69713.825196] [6450] l2cap_do_send: chan ffff88005ede2c00, skb ffff880160b0b500 len 7 priority 0 [69713.825199] [6450] hci_send_acl: hci0 chan ffff88020d60b1c0 flags 0x0000 [69713.825201] [6450] hci_queue_acl: hci0 nonfrag skb ffff880160b0b500 len 11 [69713.825210] [8909] hci_tx_work: hci0 acl 9 sco 8 le 0 [69713.825213] [8909] hci_sched_acl: hci0 [69713.825214] [8909] hci_sched_sco: hci0 [69713.825216] [8909] hci_sched_esco: hci0 [69713.825217] [8909] hci_sched_le: hci0 [69713.825219] [8909] hci_chan_sent: hci0 [69713.825221] [8909] hci_chan_sent: chan ffff88020d60b1c0 quote 9 [69713.825223] [8909] hci_sched_le: chan ffff88020d60b1c0 skb ffff880160b0b500 len 11 priority 0 [69713.825225] [8909] hci_send_frame: hci0 type 2 len 11 [69713.825227] [8909] hci_send_to_monitor: hdev ffff88021f0c7000 len 11 [69713.825242] [8909] hci_chan_sent: hci0 [69713.825253] [5949] hci_sock_recvmsg: sock ffff8800941a9680, sk ffff88012bf4d000 [69713.825253] [8909] hci_prio_recalculate: hci0 [69713.825292] [5949] hci_sock_recvmsg: sock ffff8800941a9680, sk ffff88012bf4d000 [69713.825768] [6450] bt_sock_poll: sock ffff88005e758500, sk ffff88010323b800 ... [69713.866902] [8909] hci_rx_work: hci0 [69713.866921] [8909] hci_send_to_monitor: hdev ffff88021f0c7000 len 7 [69713.866928] [8909] hci_rx_work: hci0 Event packet [69713.866931] [8909] hci_num_comp_pkts_evt: hci0 num_hndl 1 [69713.866937] [8909] hci_tx_work: hci0 acl 9 sco 8 le 0 [69713.866939] [5949] hci_sock_recvmsg: sock ffff8800941a9680, sk ffff88012bf4d000 [69713.866940] [8909] hci_sched_acl: hci0 ... [69713.866944] [8909] hci_sched_le: hci0 [69713.866953] [8909] hci_chan_sent: hci0 [69713.866997] [5949] hci_sock_recvmsg: sock ffff8800941a9680, sk ffff88012bf4d000 [69713.867840] [28074] hci_rx_work: hci0 [69713.867844] [28074] hci_send_to_monitor: hdev ffff88021f0c7000 len 7 [69713.867850] [28074] hci_rx_work: hci0 Event packet [69713.867853] [28074] hci_num_comp_pkts_evt: hci0 num_hndl 1 [69713.867857] [5949] hci_sock_recvmsg: sock ffff8800941a9680, sk ffff88012bf4d000 [69713.867858] [28074] hci_tx_work: hci0 acl 10 sco 8 le 0 [69713.867860] [28074] hci_sched_acl: hci0 [69713.867861] [28074] hci_sched_sco: hci0 [69713.867862] [28074] hci_sched_esco: hci0 [69713.867863] [28074] hci_sched_le: hci0 [69713.867865] [28074] hci_chan_sent: hci0 [69713.867888] [5949] hci_sock_recvmsg: sock ffff8800941a9680, sk ffff88012bf4d000 [69714.145661] [8909] hci_rx_work: hci0 [69714.145666] [8909] hci_send_to_monitor: hdev ffff88021f0c7000 len 10 [69714.145676] [8909] hci_rx_work: hci0 ACL data packet [69714.145679] [8909] hci_acldata_packet: hci0 len 6 handle 0x002d flags 0x0002 [69714.145681] [8909] hci_conn_enter_active_mode: hcon ffff88021f65a000 mode 0 [69714.145683] [8909] l2cap_recv_acldata: conn ffff880221005c00 len 6 flags 0x2 [69714.145693] [8909] l2cap_recv_frame: len 2, cid 0x0006 [69714.145696] [8909] hci_send_to_control: len 14 [69714.145710] [8909] smp_chan_destroy: [69714.145713] [8909] pairing_complete: status 3 [69714.145714] [8909] cmd_complete: sock ffff88010323ac00 [69714.145717] [8909] hci_conn_drop: hcon ffff88021f65a000 orig refcnt 3 [69714.145719] [5949] hci_sock_recvmsg: sock ffff8800941a9680, sk ffff88012bf4d000 [69714.145720] [6450] bt_sock_poll: sock ffff88005e758500, sk ffff88010323b800 [69714.145722] [6515] hci_sock_recvmsg: sock ffff88005e75a080, sk ffff88010323ac00 [69714.145724] [6450] bt_sock_poll: sock ffff8801db6b4f00, sk ffff880160351c00 ... [69714.145735] [6515] hci_sock_recvmsg: sock ffff88005e75a080, sk ffff88010323ac00 [69714.145737] [8909] hci_conn_drop: hcon ffff88021f65a000 orig refcnt 2 [69714.145739] [8909] l2cap_conn_del: hcon ffff88021f65a000 conn ffff880221005c00, err 13 [69714.145740] [6450] bt_sock_poll: sock ffff8801db6b5400, sk ffff88021e775000 [69714.145743] [6450] bt_sock_poll: sock ffff8801db6b5e00, sk ffff880160356000 [69714.145744] [8909] l2cap_chan_hold: chan ffff88005ede2c00 orig refcnt 3 [69714.145746] [6450] bt_sock_poll: sock ffff8800941ab700, sk ffff88005ede5800 [69714.145748] [8909] l2cap_chan_del: chan ffff88005ede2c00, conn ffff880221005c00, err 13 [69714.145749] [8909] l2cap_chan_put: chan ffff88005ede2c00 orig refcnt 4 [69714.145751] [8909] hci_conn_drop: hcon ffff88021f65a000 orig refcnt 1 [69714.145754] [6450] bt_sock_poll: sock ffff8800941ab700, sk ffff88005ede5800 [69714.145756] [8909] l2cap_chan_put: chan ffff88005ede2c00 orig refcnt 3 [69714.145759] [8909] hci_chan_del: hci0 hcon ffff88021f65a000 chan ffff88020d60b1c0 [69714.145766] [5949] hci_sock_recvmsg: sock ffff8800941a9680, sk ffff88012bf4d000 [69714.145787] [6515] hci_sock_release: sock ffff88005e75a080 sk ffff88010323ac00 [69714.146002] [6450] hci_sock_recvmsg: sock ffff88005e758780, sk ffff88010323d400 [69714.150795] [6450] l2cap_sock_release: sock ffff8800941ab700, sk ffff88005ede5800 [69714.150799] [6450] l2cap_sock_shutdown: sock ffff8800941ab700, sk ffff88005ede5800 [69714.150802] [6450] l2cap_chan_close: chan ffff88005ede2c00 state BT_CLOSED [69714.150805] [6450] l2cap_sock_kill: sk ffff88005ede5800 state BT_CLOSED [69714.150806] [6450] l2cap_chan_put: chan ffff88005ede2c00 orig refcnt 2 [69714.150808] [6450] l2cap_sock_destruct: sk ffff88005ede5800 [69714.150809] [6450] l2cap_chan_put: chan ffff88005ede2c00 orig refcnt 1 [69714.150811] [6450] l2cap_chan_destroy: chan ffff88005ede2c00 [69714.150970] [6450] bt_sock_poll: sock ffff88005e758500, sk ffff88010323b800 ... [69714.151991] [8909] hci_conn_drop: hcon ffff88021f65a000 orig refcnt 0 [69716.150339] [8909] hci_conn_timeout: hcon ffff88021f65a000 state BT_CONNECTED, refcnt -1 Signed-off-by: Lukasz Rymanowski Signed-off-by: Marcel Holtmann --- net/bluetooth/hci_conn.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/net/bluetooth/hci_conn.c b/net/bluetooth/hci_conn.c index ca01d1861854..a7a27bc2c0b1 100644 --- a/net/bluetooth/hci_conn.c +++ b/net/bluetooth/hci_conn.c @@ -289,10 +289,20 @@ static void hci_conn_timeout(struct work_struct *work) { struct hci_conn *conn = container_of(work, struct hci_conn, disc_work.work); + int refcnt = atomic_read(&conn->refcnt); BT_DBG("hcon %p state %s", conn, state_to_string(conn->state)); - if (atomic_read(&conn->refcnt)) + WARN_ON(refcnt < 0); + + /* FIXME: It was observed that in pairing failed scenario, refcnt + * drops below 0. Probably this is because l2cap_conn_del calls + * l2cap_chan_del for each channel, and inside l2cap_chan_del conn is + * dropped. After that loop hci_chan_del is called which also drops + * conn. For now make sure that ACL is alive if refcnt is higher then 0, + * otherwise drop it. + */ + if (refcnt > 0) return; switch (conn->state) { From 744462a91ecf5d1ec64857488bf99000ef626921 Mon Sep 17 00:00:00 2001 From: Max Stepanov Date: Tue, 10 Jun 2014 20:00:08 +0300 Subject: [PATCH 024/455] mac80211: WEP extra head/tail room in ieee80211_send_auth After skb allocation and call to ieee80211_wep_encrypt in ieee80211_send_auth the flow fails with a warning in ieee80211_wep_add_iv on verification of available head/tailroom needed for WEP_IV and WEP_ICV. Signed-off-by: Max Stepanov Signed-off-by: Emmanuel Grumbach Signed-off-by: Johannes Berg --- net/mac80211/util.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/net/mac80211/util.c b/net/mac80211/util.c index 6886601afe1c..a6cda52ed920 100644 --- a/net/mac80211/util.c +++ b/net/mac80211/util.c @@ -1096,11 +1096,12 @@ void ieee80211_send_auth(struct ieee80211_sub_if_data *sdata, int err; /* 24 + 6 = header + auth_algo + auth_transaction + status_code */ - skb = dev_alloc_skb(local->hw.extra_tx_headroom + 24 + 6 + extra_len); + skb = dev_alloc_skb(local->hw.extra_tx_headroom + IEEE80211_WEP_IV_LEN + + 24 + 6 + extra_len + IEEE80211_WEP_ICV_LEN); if (!skb) return; - skb_reserve(skb, local->hw.extra_tx_headroom); + skb_reserve(skb, local->hw.extra_tx_headroom + IEEE80211_WEP_IV_LEN); mgmt = (struct ieee80211_mgmt *) skb_put(skb, 24 + 6); memset(mgmt, 0, 24 + 6); From e33e2241e272eddc38339692500bd1c7d8753a77 Mon Sep 17 00:00:00 2001 From: Johannes Berg Date: Mon, 23 Jun 2014 11:06:16 +0200 Subject: [PATCH 025/455] Revert "cfg80211: Use 5MHz bandwidth by default when checking usable channels" This reverts commit 8eca1fb692cc9557f386eddce75c300a3855d11a. Felix notes that this broke regulatory, leaving channel 12 open for AP operation in the US regulatory domain where it isn't permitted. Link: http://mid.gmane.org/53A6C0FF.9090104@openwrt.org Reported-by: Felix Fietkau Signed-off-by: Johannes Berg --- net/wireless/reg.c | 22 +++++++--------------- 1 file changed, 7 insertions(+), 15 deletions(-) diff --git a/net/wireless/reg.c b/net/wireless/reg.c index 558b0e3a02d8..1afdf45db38f 100644 --- a/net/wireless/reg.c +++ b/net/wireless/reg.c @@ -935,7 +935,7 @@ freq_reg_info_regd(struct wiphy *wiphy, u32 center_freq, if (!band_rule_found) band_rule_found = freq_in_rule_band(fr, center_freq); - bw_fits = reg_does_bw_fit(fr, center_freq, MHZ_TO_KHZ(5)); + bw_fits = reg_does_bw_fit(fr, center_freq, MHZ_TO_KHZ(20)); if (band_rule_found && bw_fits) return rr; @@ -1019,10 +1019,10 @@ static void chan_reg_rule_print_dbg(const struct ieee80211_regdomain *regd, } #endif -/* Find an ieee80211_reg_rule such that a 5MHz channel with frequency - * chan->center_freq fits there. - * If there is no such reg_rule, disable the channel, otherwise set the - * flags corresponding to the bandwidths allowed in the particular reg_rule +/* + * Note that right now we assume the desired channel bandwidth + * is always 20 MHz for each individual channel (HT40 uses 20 MHz + * per channel, the primary and the extension channel). */ static void handle_channel(struct wiphy *wiphy, enum nl80211_reg_initiator initiator, @@ -1083,12 +1083,8 @@ static void handle_channel(struct wiphy *wiphy, if (reg_rule->flags & NL80211_RRF_AUTO_BW) max_bandwidth_khz = reg_get_max_bandwidth(regd, reg_rule); - if (max_bandwidth_khz < MHZ_TO_KHZ(10)) - bw_flags = IEEE80211_CHAN_NO_10MHZ; - if (max_bandwidth_khz < MHZ_TO_KHZ(20)) - bw_flags |= IEEE80211_CHAN_NO_20MHZ; if (max_bandwidth_khz < MHZ_TO_KHZ(40)) - bw_flags |= IEEE80211_CHAN_NO_HT40; + bw_flags = IEEE80211_CHAN_NO_HT40; if (max_bandwidth_khz < MHZ_TO_KHZ(80)) bw_flags |= IEEE80211_CHAN_NO_80MHZ; if (max_bandwidth_khz < MHZ_TO_KHZ(160)) @@ -1522,12 +1518,8 @@ static void handle_channel_custom(struct wiphy *wiphy, if (reg_rule->flags & NL80211_RRF_AUTO_BW) max_bandwidth_khz = reg_get_max_bandwidth(regd, reg_rule); - if (max_bandwidth_khz < MHZ_TO_KHZ(10)) - bw_flags = IEEE80211_CHAN_NO_10MHZ; - if (max_bandwidth_khz < MHZ_TO_KHZ(20)) - bw_flags |= IEEE80211_CHAN_NO_20MHZ; if (max_bandwidth_khz < MHZ_TO_KHZ(40)) - bw_flags |= IEEE80211_CHAN_NO_HT40; + bw_flags = IEEE80211_CHAN_NO_HT40; if (max_bandwidth_khz < MHZ_TO_KHZ(80)) bw_flags |= IEEE80211_CHAN_NO_80MHZ; if (max_bandwidth_khz < MHZ_TO_KHZ(160)) From 0ce12026d6ea4a24d52ddcde36b9643f2e26d560 Mon Sep 17 00:00:00 2001 From: Eliad Peller Date: Tue, 17 Jun 2014 13:07:02 +0300 Subject: [PATCH 026/455] cfg80211: fix elapsed_jiffies calculation MAX_JIFFY_OFFSET has no meaning when calculating the elapsed jiffies, as jiffies run out until ULONG_MAX. This miscalculation results in erroneous values in case of a wrap-around. Signed-off-by: Eliad Peller Signed-off-by: Johannes Berg --- net/wireless/core.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/wireless/core.h b/net/wireless/core.h index e9afbf10e756..7e3a3cef7df9 100644 --- a/net/wireless/core.h +++ b/net/wireless/core.h @@ -424,7 +424,7 @@ static inline unsigned int elapsed_jiffies_msecs(unsigned long start) if (end >= start) return jiffies_to_msecs(end - start); - return jiffies_to_msecs(end + (MAX_JIFFY_OFFSET - start) + 1); + return jiffies_to_msecs(end + (ULONG_MAX - start) + 1); } void From 48439d501e3d9e8634bdc0c418e066870039599d Mon Sep 17 00:00:00 2001 From: Loic Poulain Date: Mon, 23 Jun 2014 17:42:44 +0200 Subject: [PATCH 027/455] Bluetooth: Ignore H5 non-link packets in non-active state When detecting a non-link packet, h5_reset_rx() frees the Rx skb. Not returning after that will cause the upcoming h5_rx_payload() call to dereference a now NULL Rx skb and trigger a kernel oops. Signed-off-by: Loic Poulain Signed-off-by: Marcel Holtmann Cc: stable@vger.kernel.org --- drivers/bluetooth/hci_h5.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/bluetooth/hci_h5.c b/drivers/bluetooth/hci_h5.c index 04680ead9275..fede8ca7147c 100644 --- a/drivers/bluetooth/hci_h5.c +++ b/drivers/bluetooth/hci_h5.c @@ -406,6 +406,7 @@ static int h5_rx_3wire_hdr(struct hci_uart *hu, unsigned char c) H5_HDR_PKT_TYPE(hdr) != HCI_3WIRE_LINK_PKT) { BT_ERR("Non-link packet received in non-active state"); h5_reset_rx(h5); + return 0; } h5->rx_func = h5_rx_payload; From 546a9d8519ed137b2804a3f5a3659003039dd49c Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Thu, 19 Jun 2014 14:57:10 -0700 Subject: [PATCH 028/455] rcu: Export debug_init_rcu_head() and and debug_init_rcu_head() Currently, call_rcu() relies on implicit allocation and initialization for the debug-objects handling of RCU callbacks. If you hammer the kernel hard enough with Sasha's modified version of trinity, you can end up with the sl*b allocators recursing into themselves via this implicit call_rcu() allocation. This commit therefore exports the debug_init_rcu_head() and debug_rcu_head_free() functions, which permits the allocators to allocated and pre-initialize the debug-objects information, so that there no longer any need for call_rcu() to do that initialization, which in turn prevents the recursion into the memory allocators. Reported-by: Sasha Levin Suggested-by: Thomas Gleixner Signed-off-by: Paul E. McKenney Acked-by: Thomas Gleixner Looks-good-to: Christoph Lameter --- include/linux/rcupdate.h | 10 ++++++++++ kernel/rcu/update.c | 4 ++-- 2 files changed, 12 insertions(+), 2 deletions(-) diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index 5a75d19aa661..13bbfbde41b9 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -358,9 +358,19 @@ void wait_rcu_gp(call_rcu_func_t crf); * initialization. */ #ifdef CONFIG_DEBUG_OBJECTS_RCU_HEAD +void init_rcu_head(struct rcu_head *head); +void destroy_rcu_head(struct rcu_head *head); void init_rcu_head_on_stack(struct rcu_head *head); void destroy_rcu_head_on_stack(struct rcu_head *head); #else /* !CONFIG_DEBUG_OBJECTS_RCU_HEAD */ +static inline void init_rcu_head(struct rcu_head *head) +{ +} + +static inline void destroy_rcu_head(struct rcu_head *head) +{ +} + static inline void init_rcu_head_on_stack(struct rcu_head *head) { } diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c index a2aeb4df0f60..0fb691e63ce6 100644 --- a/kernel/rcu/update.c +++ b/kernel/rcu/update.c @@ -200,12 +200,12 @@ void wait_rcu_gp(call_rcu_func_t crf) EXPORT_SYMBOL_GPL(wait_rcu_gp); #ifdef CONFIG_DEBUG_OBJECTS_RCU_HEAD -static inline void debug_init_rcu_head(struct rcu_head *head) +void init_rcu_head(struct rcu_head *head) { debug_object_init(head, &rcuhead_debug_descr); } -static inline void debug_rcu_head_free(struct rcu_head *head) +void destroy_rcu_head(struct rcu_head *head) { debug_object_free(head, &rcuhead_debug_descr); } From 4a81e8328d3791a4f99bf5b436d050f6dc5ffea3 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Fri, 20 Jun 2014 16:49:01 -0700 Subject: [PATCH 029/455] rcu: Reduce overhead of cond_resched() checks for RCU Commit ac1bea85781e (Make cond_resched() report RCU quiescent states) fixed a problem where a CPU looping in the kernel with but one runnable task would give RCU CPU stall warnings, even if the in-kernel loop contained cond_resched() calls. Unfortunately, in so doing, it introduced performance regressions in Anton Blanchard's will-it-scale "open1" test. The problem appears to be not so much the increased cond_resched() path length as an increase in the rate at which grace periods complete, which increased per-update grace-period overhead. This commit takes a different approach to fixing this bug, mainly by moving the RCU-visible quiescent state from cond_resched() to rcu_note_context_switch(), and by further reducing the check to a simple non-zero test of a single per-CPU variable. However, this approach requires that the force-quiescent-state processing send resched IPIs to the offending CPUs. These will be sent only once the grace period has reached an age specified by the boot/sysfs parameter rcutree.jiffies_till_sched_qs, or once the grace period reaches an age halfway to the point at which RCU CPU stall warnings will be emitted, whichever comes first. Reported-by: Dave Hansen Signed-off-by: Paul E. McKenney Cc: Andi Kleen Cc: Christoph Lameter Cc: Mike Galbraith Cc: Eric Dumazet Reviewed-by: Josh Triplett [ paulmck: Made rcu_momentary_dyntick_idle() as suggested by the ktest build robot. Also fixed smp_mb() comment as noted by Oleg Nesterov. ] Merge with e552592e (Reduce overhead of cond_resched() checks for RCU) Signed-off-by: Paul E. McKenney --- Documentation/kernel-parameters.txt | 6 ++ include/linux/rcupdate.h | 36 ------- kernel/rcu/tree.c | 140 ++++++++++++++++++++++------ kernel/rcu/tree.h | 6 +- kernel/rcu/tree_plugin.h | 2 +- kernel/rcu/update.c | 18 ---- kernel/sched/core.c | 7 +- 7 files changed, 125 insertions(+), 90 deletions(-) diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 6eaa9cdb7094..910c3829f81d 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -2785,6 +2785,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted. leaf rcu_node structure. Useful for very large systems. + rcutree.jiffies_till_sched_qs= [KNL] + Set required age in jiffies for a + given grace period before RCU starts + soliciting quiescent-state help from + rcu_note_context_switch(). + rcutree.jiffies_till_first_fqs= [KNL] Set delay from grace-period initialization to first attempt to force quiescent states. diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index 13bbfbde41b9..6a94cc8b1ca0 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -44,7 +44,6 @@ #include #include #include -#include #include extern int rcu_expedited; /* for sysctl */ @@ -299,41 +298,6 @@ static inline void rcu_user_hooks_switch(struct task_struct *prev, bool __rcu_is_watching(void); #endif /* #if defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) || defined(CONFIG_SMP) */ -/* - * Hooks for cond_resched() and friends to avoid RCU CPU stall warnings. - */ - -#define RCU_COND_RESCHED_LIM 256 /* ms vs. 100s of ms. */ -DECLARE_PER_CPU(int, rcu_cond_resched_count); -void rcu_resched(void); - -/* - * Is it time to report RCU quiescent states? - * - * Note unsynchronized access to rcu_cond_resched_count. Yes, we might - * increment some random CPU's count, and possibly also load the result from - * yet another CPU's count. We might even clobber some other CPU's attempt - * to zero its counter. This is all OK because the goal is not precision, - * but rather reasonable amortization of rcu_note_context_switch() overhead - * and extremely high probability of avoiding RCU CPU stall warnings. - * Note that this function has to be preempted in just the wrong place, - * many thousands of times in a row, for anything bad to happen. - */ -static inline bool rcu_should_resched(void) -{ - return raw_cpu_inc_return(rcu_cond_resched_count) >= - RCU_COND_RESCHED_LIM; -} - -/* - * Report quiscent states to RCU if it is time to do so. - */ -static inline void rcu_cond_resched(void) -{ - if (unlikely(rcu_should_resched())) - rcu_resched(); -} - /* * Infrastructure to implement the synchronize_() primitives in * TREE_RCU and rcu_barrier_() primitives in TINY_RCU. diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index f1ba77363fbb..625d0b0cd75a 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -206,6 +206,70 @@ void rcu_bh_qs(int cpu) rdp->passed_quiesce = 1; } +static DEFINE_PER_CPU(int, rcu_sched_qs_mask); + +static DEFINE_PER_CPU(struct rcu_dynticks, rcu_dynticks) = { + .dynticks_nesting = DYNTICK_TASK_EXIT_IDLE, + .dynticks = ATOMIC_INIT(1), +#ifdef CONFIG_NO_HZ_FULL_SYSIDLE + .dynticks_idle_nesting = DYNTICK_TASK_NEST_VALUE, + .dynticks_idle = ATOMIC_INIT(1), +#endif /* #ifdef CONFIG_NO_HZ_FULL_SYSIDLE */ +}; + +/* + * Let the RCU core know that this CPU has gone through the scheduler, + * which is a quiescent state. This is called when the need for a + * quiescent state is urgent, so we burn an atomic operation and full + * memory barriers to let the RCU core know about it, regardless of what + * this CPU might (or might not) do in the near future. + * + * We inform the RCU core by emulating a zero-duration dyntick-idle + * period, which we in turn do by incrementing the ->dynticks counter + * by two. + */ +static void rcu_momentary_dyntick_idle(void) +{ + unsigned long flags; + struct rcu_data *rdp; + struct rcu_dynticks *rdtp; + int resched_mask; + struct rcu_state *rsp; + + local_irq_save(flags); + + /* + * Yes, we can lose flag-setting operations. This is OK, because + * the flag will be set again after some delay. + */ + resched_mask = raw_cpu_read(rcu_sched_qs_mask); + raw_cpu_write(rcu_sched_qs_mask, 0); + + /* Find the flavor that needs a quiescent state. */ + for_each_rcu_flavor(rsp) { + rdp = raw_cpu_ptr(rsp->rda); + if (!(resched_mask & rsp->flavor_mask)) + continue; + smp_mb(); /* rcu_sched_qs_mask before cond_resched_completed. */ + if (ACCESS_ONCE(rdp->mynode->completed) != + ACCESS_ONCE(rdp->cond_resched_completed)) + continue; + + /* + * Pretend to be momentarily idle for the quiescent state. + * This allows the grace-period kthread to record the + * quiescent state, with no need for this CPU to do anything + * further. + */ + rdtp = this_cpu_ptr(&rcu_dynticks); + smp_mb__before_atomic(); /* Earlier stuff before QS. */ + atomic_add(2, &rdtp->dynticks); /* QS. */ + smp_mb__after_atomic(); /* Later stuff after QS. */ + break; + } + local_irq_restore(flags); +} + /* * Note a context switch. This is a quiescent state for RCU-sched, * and requires special handling for preemptible RCU. @@ -216,19 +280,12 @@ void rcu_note_context_switch(int cpu) trace_rcu_utilization(TPS("Start context switch")); rcu_sched_qs(cpu); rcu_preempt_note_context_switch(cpu); + if (unlikely(raw_cpu_read(rcu_sched_qs_mask))) + rcu_momentary_dyntick_idle(); trace_rcu_utilization(TPS("End context switch")); } EXPORT_SYMBOL_GPL(rcu_note_context_switch); -static DEFINE_PER_CPU(struct rcu_dynticks, rcu_dynticks) = { - .dynticks_nesting = DYNTICK_TASK_EXIT_IDLE, - .dynticks = ATOMIC_INIT(1), -#ifdef CONFIG_NO_HZ_FULL_SYSIDLE - .dynticks_idle_nesting = DYNTICK_TASK_NEST_VALUE, - .dynticks_idle = ATOMIC_INIT(1), -#endif /* #ifdef CONFIG_NO_HZ_FULL_SYSIDLE */ -}; - static long blimit = 10; /* Maximum callbacks per rcu_do_batch. */ static long qhimark = 10000; /* If this many pending, ignore blimit. */ static long qlowmark = 100; /* Once only this many pending, use blimit. */ @@ -243,6 +300,13 @@ static ulong jiffies_till_next_fqs = ULONG_MAX; module_param(jiffies_till_first_fqs, ulong, 0644); module_param(jiffies_till_next_fqs, ulong, 0644); +/* + * How long the grace period must be before we start recruiting + * quiescent-state help from rcu_note_context_switch(). + */ +static ulong jiffies_till_sched_qs = HZ / 20; +module_param(jiffies_till_sched_qs, ulong, 0644); + static bool rcu_start_gp_advanced(struct rcu_state *rsp, struct rcu_node *rnp, struct rcu_data *rdp); static void force_qs_rnp(struct rcu_state *rsp, @@ -853,6 +917,7 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp, bool *isidle, unsigned long *maxj) { unsigned int curr; + int *rcrmp; unsigned int snap; curr = (unsigned int)atomic_add_return(0, &rdp->dynticks->dynticks); @@ -893,27 +958,43 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp, } /* - * There is a possibility that a CPU in adaptive-ticks state - * might run in the kernel with the scheduling-clock tick disabled - * for an extended time period. Invoke rcu_kick_nohz_cpu() to - * force the CPU to restart the scheduling-clock tick in this - * CPU is in this state. + * A CPU running for an extended time within the kernel can + * delay RCU grace periods. When the CPU is in NO_HZ_FULL mode, + * even context-switching back and forth between a pair of + * in-kernel CPU-bound tasks cannot advance grace periods. + * So if the grace period is old enough, make the CPU pay attention. + * Note that the unsynchronized assignments to the per-CPU + * rcu_sched_qs_mask variable are safe. Yes, setting of + * bits can be lost, but they will be set again on the next + * force-quiescent-state pass. So lost bit sets do not result + * in incorrect behavior, merely in a grace period lasting + * a few jiffies longer than it might otherwise. Because + * there are at most four threads involved, and because the + * updates are only once every few jiffies, the probability of + * lossage (and thus of slight grace-period extension) is + * quite low. + * + * Note that if the jiffies_till_sched_qs boot/sysfs parameter + * is set too high, we override with half of the RCU CPU stall + * warning delay. */ - rcu_kick_nohz_cpu(rdp->cpu); - - /* - * Alternatively, the CPU might be running in the kernel - * for an extended period of time without a quiescent state. - * Attempt to force the CPU through the scheduler to gain the - * needed quiescent state, but only if the grace period has gone - * on for an uncommonly long time. If there are many stuck CPUs, - * we will beat on the first one until it gets unstuck, then move - * to the next. Only do this for the primary flavor of RCU. - */ - if (rdp->rsp == rcu_state_p && + rcrmp = &per_cpu(rcu_sched_qs_mask, rdp->cpu); + if (ULONG_CMP_GE(jiffies, + rdp->rsp->gp_start + jiffies_till_sched_qs) || ULONG_CMP_GE(jiffies, rdp->rsp->jiffies_resched)) { - rdp->rsp->jiffies_resched += 5; - resched_cpu(rdp->cpu); + if (!(ACCESS_ONCE(*rcrmp) & rdp->rsp->flavor_mask)) { + ACCESS_ONCE(rdp->cond_resched_completed) = + ACCESS_ONCE(rdp->mynode->completed); + smp_mb(); /* ->cond_resched_completed before *rcrmp. */ + ACCESS_ONCE(*rcrmp) = + ACCESS_ONCE(*rcrmp) + rdp->rsp->flavor_mask; + resched_cpu(rdp->cpu); /* Force CPU into scheduler. */ + rdp->rsp->jiffies_resched += 5; /* Enable beating. */ + } else if (ULONG_CMP_GE(jiffies, rdp->rsp->jiffies_resched)) { + /* Time to beat on that CPU again! */ + resched_cpu(rdp->cpu); /* Force CPU into scheduler. */ + rdp->rsp->jiffies_resched += 5; /* Re-enable beating. */ + } } return 0; @@ -3491,6 +3572,7 @@ static void __init rcu_init_one(struct rcu_state *rsp, "rcu_node_fqs_1", "rcu_node_fqs_2", "rcu_node_fqs_3" }; /* Match MAX_RCU_LVLS */ + static u8 fl_mask = 0x1; int cpustride = 1; int i; int j; @@ -3509,6 +3591,8 @@ static void __init rcu_init_one(struct rcu_state *rsp, for (i = 1; i < rcu_num_lvls; i++) rsp->level[i] = rsp->level[i - 1] + rsp->levelcnt[i - 1]; rcu_init_levelspread(rsp); + rsp->flavor_mask = fl_mask; + fl_mask <<= 1; /* Initialize the elements themselves, starting from the leaves. */ diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h index bf2c1e669691..0f69a79c5b7d 100644 --- a/kernel/rcu/tree.h +++ b/kernel/rcu/tree.h @@ -307,6 +307,9 @@ struct rcu_data { /* 4) reasons this CPU needed to be kicked by force_quiescent_state */ unsigned long dynticks_fqs; /* Kicked due to dynticks idle. */ unsigned long offline_fqs; /* Kicked due to being offline. */ + unsigned long cond_resched_completed; + /* Grace period that needs help */ + /* from cond_resched(). */ /* 5) __rcu_pending() statistics. */ unsigned long n_rcu_pending; /* rcu_pending() calls since boot. */ @@ -392,6 +395,7 @@ struct rcu_state { struct rcu_node *level[RCU_NUM_LVLS]; /* Hierarchy levels. */ u32 levelcnt[MAX_RCU_LVLS + 1]; /* # nodes in each level. */ u8 levelspread[RCU_NUM_LVLS]; /* kids/node in each level. */ + u8 flavor_mask; /* bit in flavor mask. */ struct rcu_data __percpu *rda; /* pointer of percu rcu_data. */ void (*call)(struct rcu_head *head, /* call_rcu() flavor. */ void (*func)(struct rcu_head *head)); @@ -563,7 +567,7 @@ static bool rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp); static void do_nocb_deferred_wakeup(struct rcu_data *rdp); static void rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp); static void rcu_spawn_nocb_kthreads(struct rcu_state *rsp); -static void rcu_kick_nohz_cpu(int cpu); +static void __maybe_unused rcu_kick_nohz_cpu(int cpu); static bool init_nocb_callback_list(struct rcu_data *rdp); static void rcu_sysidle_enter(struct rcu_dynticks *rdtp, int irq); static void rcu_sysidle_exit(struct rcu_dynticks *rdtp, int irq); diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index cbc2c45265e2..02ac0fb186b8 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -2404,7 +2404,7 @@ static bool init_nocb_callback_list(struct rcu_data *rdp) * if an adaptive-ticks CPU is failing to respond to the current grace * period and has not be idle from an RCU perspective, kick it. */ -static void rcu_kick_nohz_cpu(int cpu) +static void __maybe_unused rcu_kick_nohz_cpu(int cpu) { #ifdef CONFIG_NO_HZ_FULL if (tick_nohz_full_cpu(cpu)) diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c index 0fb691e63ce6..bc7883570530 100644 --- a/kernel/rcu/update.c +++ b/kernel/rcu/update.c @@ -350,21 +350,3 @@ static int __init check_cpu_stall_init(void) early_initcall(check_cpu_stall_init); #endif /* #ifdef CONFIG_RCU_STALL_COMMON */ - -/* - * Hooks for cond_resched() and friends to avoid RCU CPU stall warnings. - */ - -DEFINE_PER_CPU(int, rcu_cond_resched_count); - -/* - * Report a set of RCU quiescent states, for use by cond_resched() - * and friends. Out of line due to being called infrequently. - */ -void rcu_resched(void) -{ - preempt_disable(); - __this_cpu_write(rcu_cond_resched_count, 0); - rcu_note_context_switch(smp_processor_id()); - preempt_enable(); -} diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 3bdf01b494fe..bc1638b33449 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -4147,7 +4147,6 @@ static void __cond_resched(void) int __sched _cond_resched(void) { - rcu_cond_resched(); if (should_resched()) { __cond_resched(); return 1; @@ -4166,18 +4165,15 @@ EXPORT_SYMBOL(_cond_resched); */ int __cond_resched_lock(spinlock_t *lock) { - bool need_rcu_resched = rcu_should_resched(); int resched = should_resched(); int ret = 0; lockdep_assert_held(lock); - if (spin_needbreak(lock) || resched || need_rcu_resched) { + if (spin_needbreak(lock) || resched) { spin_unlock(lock); if (resched) __cond_resched(); - else if (unlikely(need_rcu_resched)) - rcu_resched(); else cpu_relax(); ret = 1; @@ -4191,7 +4187,6 @@ int __sched __cond_resched_softirq(void) { BUG_ON(!in_softirq()); - rcu_cond_resched(); /* BH disabled OK, just recording QSes. */ if (should_resched()) { local_bh_enable(); __cond_resched(); From 89879413eba4d9f57b69f84a32fd085d54057df3 Mon Sep 17 00:00:00 2001 From: Eliad Peller Date: Mon, 26 May 2014 18:44:35 +0300 Subject: [PATCH 030/455] iwlwifi: mvm: rework sched scan channel configuration The current sched scan channel configuration code configures all the supported channels for scanning. However, this can result in SYSASSERT in some cases, when the configured channel is disabled. Instead, configure only the channels given in the req struct, and set the channel_count field appropriately. While on it, change the code to use channel->hw_value instead of recalculating the channel number. Signed-off-by: Eliad Peller Signed-off-by: Emmanuel Grumbach --- drivers/net/wireless/iwlwifi/mvm/scan.c | 63 +++++++------------------ 1 file changed, 18 insertions(+), 45 deletions(-) diff --git a/drivers/net/wireless/iwlwifi/mvm/scan.c b/drivers/net/wireless/iwlwifi/mvm/scan.c index 4b6c7d4bd199..eac2b424f6a0 100644 --- a/drivers/net/wireless/iwlwifi/mvm/scan.c +++ b/drivers/net/wireless/iwlwifi/mvm/scan.c @@ -588,9 +588,7 @@ static void iwl_build_scan_cmd(struct iwl_mvm *mvm, struct iwl_scan_offload_cmd *scan, struct iwl_mvm_scan_params *params) { - scan->channel_count = - mvm->nvm_data->bands[IEEE80211_BAND_2GHZ].n_channels + - mvm->nvm_data->bands[IEEE80211_BAND_5GHZ].n_channels; + scan->channel_count = req->n_channels; scan->quiet_time = cpu_to_le16(IWL_ACTIVE_QUIET_TIME); scan->quiet_plcp_th = cpu_to_le16(IWL_PLCP_QUIET_THRESH); scan->good_CRC_th = IWL_GOOD_CRC_TH_DEFAULT; @@ -669,61 +667,37 @@ static void iwl_build_channel_cfg(struct iwl_mvm *mvm, struct cfg80211_sched_scan_request *req, struct iwl_scan_channel_cfg *channels, enum ieee80211_band band, - int *head, int *tail, + int *head, u32 ssid_bitmap, struct iwl_mvm_scan_params *params) { - struct ieee80211_supported_band *s_band; - int n_channels = req->n_channels; - int i, j, index = 0; - bool partial; + int i, index = 0; - /* - * We have to configure all supported channels, even if we don't want to - * scan on them, but we have to send channels in the order that we want - * to scan. So add requested channels to head of the list and others to - * the end. - */ - s_band = &mvm->nvm_data->bands[band]; + for (i = 0; i < req->n_channels; i++) { + struct ieee80211_channel *chan = req->channels[i]; - for (i = 0; i < s_band->n_channels && *head <= *tail; i++) { - partial = false; - for (j = 0; j < n_channels; j++) - if (s_band->channels[i].center_freq == - req->channels[j]->center_freq) { - index = *head; - (*head)++; - /* - * Channels that came with the request will be - * in partial scan . - */ - partial = true; - break; - } - if (!partial) { - index = *tail; - (*tail)--; - } - channels->channel_number[index] = - cpu_to_le16(ieee80211_frequency_to_channel( - s_band->channels[i].center_freq)); + if (chan->band != band) + continue; + + index = *head; + (*head)++; + + channels->channel_number[index] = cpu_to_le16(chan->hw_value); channels->dwell_time[index][0] = params->dwell[band].active; channels->dwell_time[index][1] = params->dwell[band].passive; channels->iter_count[index] = cpu_to_le16(1); channels->iter_interval[index] = 0; - if (!(s_band->channels[i].flags & IEEE80211_CHAN_NO_IR)) + if (!(chan->flags & IEEE80211_CHAN_NO_IR)) channels->type[index] |= cpu_to_le32(IWL_SCAN_OFFLOAD_CHANNEL_ACTIVE); channels->type[index] |= - cpu_to_le32(IWL_SCAN_OFFLOAD_CHANNEL_FULL); - if (partial) - channels->type[index] |= - cpu_to_le32(IWL_SCAN_OFFLOAD_CHANNEL_PARTIAL); + cpu_to_le32(IWL_SCAN_OFFLOAD_CHANNEL_FULL | + IWL_SCAN_OFFLOAD_CHANNEL_PARTIAL); - if (s_band->channels[i].flags & IEEE80211_CHAN_NO_HT40) + if (chan->flags & IEEE80211_CHAN_NO_HT40) channels->type[index] |= cpu_to_le32(IWL_SCAN_OFFLOAD_CHANNEL_NARROW); @@ -740,7 +714,6 @@ int iwl_mvm_config_sched_scan(struct iwl_mvm *mvm, int band_2ghz = mvm->nvm_data->bands[IEEE80211_BAND_2GHZ].n_channels; int band_5ghz = mvm->nvm_data->bands[IEEE80211_BAND_5GHZ].n_channels; int head = 0; - int tail = band_2ghz + band_5ghz - 1; u32 ssid_bitmap; int cmd_len; int ret; @@ -772,7 +745,7 @@ int iwl_mvm_config_sched_scan(struct iwl_mvm *mvm, &scan_cfg->scan_cmd.tx_cmd[0], scan_cfg->data); iwl_build_channel_cfg(mvm, req, &scan_cfg->channel_cfg, - IEEE80211_BAND_2GHZ, &head, &tail, + IEEE80211_BAND_2GHZ, &head, ssid_bitmap, ¶ms); } if (band_5ghz) { @@ -782,7 +755,7 @@ int iwl_mvm_config_sched_scan(struct iwl_mvm *mvm, scan_cfg->data + SCAN_OFFLOAD_PROBE_REQ_SIZE); iwl_build_channel_cfg(mvm, req, &scan_cfg->channel_cfg, - IEEE80211_BAND_5GHZ, &head, &tail, + IEEE80211_BAND_5GHZ, &head, ssid_bitmap, ¶ms); } From 511c66818d87db2a8931e7f7f92c7904bdd84f72 Mon Sep 17 00:00:00 2001 From: Mihai Caraman Date: Wed, 18 Jun 2014 18:45:05 +0300 Subject: [PATCH 031/455] KVM: PPC: Book3E: Unlock mmu_lock when setting caching atttribute The patch 08c9a188d0d0fc0f0c5e17d89a06bb59c493110f kvm: powerpc: use caching attributes as per linux pte do not handle properly the error case, letting mmu_lock locked. The lock will further generate a RCU stall from kvmppc_e500_emul_tlbwe() caller. In case of an error go to out label. Signed-off-by: Mihai Caraman Signed-off-by: Alexander Graf --- arch/powerpc/kvm/e500_mmu_host.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c index dd2cc03f406f..86903d3f5a03 100644 --- a/arch/powerpc/kvm/e500_mmu_host.c +++ b/arch/powerpc/kvm/e500_mmu_host.c @@ -473,7 +473,8 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500, if (printk_ratelimit()) pr_err("%s: pte not present: gfn %lx, pfn %lx\n", __func__, (long)gfn, pfn); - return -EINVAL; + ret = -EINVAL; + goto out; } kvmppc_e500_ref_setup(ref, gtlbe, pfn, wimg); From 02df00eb0019e7d15a1fcddebe4d020226c1ccda Mon Sep 17 00:00:00 2001 From: Johannes Berg Date: Tue, 10 Jun 2014 14:06:25 +0200 Subject: [PATCH 032/455] nl80211: move set_qos_map command into split state The non-split wiphy state shouldn't be increased in size so move the new set_qos_map command into the split if statement. Cc: stable@vger.kernel.org (3.14+) Fixes: fa9ffc745610 ("cfg80211: Add support for QoS mapping") Reviewed-by: Emmanuel Grumbach Signed-off-by: Johannes Berg --- net/wireless/nl80211.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c index ba4f1723c83a..6668daf69326 100644 --- a/net/wireless/nl80211.c +++ b/net/wireless/nl80211.c @@ -1497,18 +1497,17 @@ static int nl80211_send_wiphy(struct cfg80211_registered_device *rdev, } CMD(start_p2p_device, START_P2P_DEVICE); CMD(set_mcast_rate, SET_MCAST_RATE); +#ifdef CONFIG_NL80211_TESTMODE + CMD(testmode_cmd, TESTMODE); +#endif if (state->split) { CMD(crit_proto_start, CRIT_PROTOCOL_START); CMD(crit_proto_stop, CRIT_PROTOCOL_STOP); if (rdev->wiphy.flags & WIPHY_FLAG_HAS_CHANNEL_SWITCH) CMD(channel_switch, CHANNEL_SWITCH); + CMD(set_qos_map, SET_QOS_MAP); } - CMD(set_qos_map, SET_QOS_MAP); - -#ifdef CONFIG_NL80211_TESTMODE - CMD(testmode_cmd, TESTMODE); -#endif - + /* add into the if now */ #undef CMD if (rdev->ops->connect || rdev->ops->auth) { From b3c063ae7279981f7161e63b44f214c62f122b32 Mon Sep 17 00:00:00 2001 From: Oren Givon Date: Sun, 25 May 2014 16:31:58 +0300 Subject: [PATCH 033/455] iwlwifi: update the 7265 series HW IDs Add one more 7265 series HW ID. Edit one existing 7265 series HW ID. CC: [3.13+] Signed-off-by: Oren Givon Signed-off-by: Emmanuel Grumbach --- drivers/net/wireless/iwlwifi/pcie/drv.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/wireless/iwlwifi/pcie/drv.c b/drivers/net/wireless/iwlwifi/pcie/drv.c index 7091a18d5a72..98950e45c7b0 100644 --- a/drivers/net/wireless/iwlwifi/pcie/drv.c +++ b/drivers/net/wireless/iwlwifi/pcie/drv.c @@ -367,6 +367,7 @@ static DEFINE_PCI_DEVICE_TABLE(iwl_hw_card_ids) = { {IWL_PCI_DEVICE(0x095A, 0x5012, iwl7265_2ac_cfg)}, {IWL_PCI_DEVICE(0x095A, 0x5412, iwl7265_2ac_cfg)}, {IWL_PCI_DEVICE(0x095A, 0x5410, iwl7265_2ac_cfg)}, + {IWL_PCI_DEVICE(0x095A, 0x5510, iwl7265_2ac_cfg)}, {IWL_PCI_DEVICE(0x095A, 0x5400, iwl7265_2ac_cfg)}, {IWL_PCI_DEVICE(0x095A, 0x1010, iwl7265_2ac_cfg)}, {IWL_PCI_DEVICE(0x095A, 0x5000, iwl7265_2n_cfg)}, @@ -380,7 +381,7 @@ static DEFINE_PCI_DEVICE_TABLE(iwl_hw_card_ids) = { {IWL_PCI_DEVICE(0x095A, 0x9110, iwl7265_2ac_cfg)}, {IWL_PCI_DEVICE(0x095A, 0x9112, iwl7265_2ac_cfg)}, {IWL_PCI_DEVICE(0x095A, 0x9210, iwl7265_2ac_cfg)}, - {IWL_PCI_DEVICE(0x095A, 0x9200, iwl7265_2ac_cfg)}, + {IWL_PCI_DEVICE(0x095B, 0x9200, iwl7265_2ac_cfg)}, {IWL_PCI_DEVICE(0x095A, 0x9510, iwl7265_2ac_cfg)}, {IWL_PCI_DEVICE(0x095A, 0x9310, iwl7265_2ac_cfg)}, {IWL_PCI_DEVICE(0x095A, 0x9410, iwl7265_2ac_cfg)}, From a42c9fcc4a88cdd246fab3bcf06c4487afee3d88 Mon Sep 17 00:00:00 2001 From: Arik Nemtsov Date: Thu, 8 May 2014 16:17:31 +0300 Subject: [PATCH 034/455] Revert "iwlwifi: remove IWL_UCODE_TLV_FLAGS_UAPSD_SUPPORT flag" This reverts commit dc9a19296a872644f19a06d8eeb5db222d327b41. 3610 cards don't support UAPSD. Signed-off-by: Arik Nemtsov Signed-off-by: Emmanuel Grumbach --- drivers/net/wireless/iwlwifi/iwl-fw.h | 1 + drivers/net/wireless/iwlwifi/mvm/mac80211.c | 7 +++++++ 2 files changed, 8 insertions(+) diff --git a/drivers/net/wireless/iwlwifi/iwl-fw.h b/drivers/net/wireless/iwlwifi/iwl-fw.h index 0aa7c0085c9f..b1a33322b9ba 100644 --- a/drivers/net/wireless/iwlwifi/iwl-fw.h +++ b/drivers/net/wireless/iwlwifi/iwl-fw.h @@ -88,6 +88,7 @@ * P2P client interfaces simultaneously if they are in different bindings. * @IWL_UCODE_TLV_FLAGS_P2P_BSS_PS_SCM: support power save on BSS station and * P2P client interfaces simultaneously if they are in same bindings. + * @IWL_UCODE_TLV_FLAGS_UAPSD_SUPPORT: General support for uAPSD * @IWL_UCODE_TLV_FLAGS_P2P_PS_UAPSD: P2P client supports uAPSD power save * @IWL_UCODE_TLV_FLAGS_BCAST_FILTERING: uCode supports broadcast filtering. * @IWL_UCODE_TLV_FLAGS_GO_UAPSD: AP/GO interfaces support uAPSD clients diff --git a/drivers/net/wireless/iwlwifi/mvm/mac80211.c b/drivers/net/wireless/iwlwifi/mvm/mac80211.c index 7215f5980186..1cef708cb74d 100644 --- a/drivers/net/wireless/iwlwifi/mvm/mac80211.c +++ b/drivers/net/wireless/iwlwifi/mvm/mac80211.c @@ -303,6 +303,13 @@ int iwl_mvm_mac_setup_register(struct iwl_mvm *mvm) hw->uapsd_max_sp_len = IWL_UAPSD_MAX_SP; } + if (mvm->fw->ucode_capa.flags & IWL_UCODE_TLV_FLAGS_UAPSD_SUPPORT && + !iwlwifi_mod_params.uapsd_disable) { + hw->flags |= IEEE80211_HW_SUPPORTS_UAPSD; + hw->uapsd_queues = IWL_UAPSD_AC_INFO; + hw->uapsd_max_sp_len = IWL_UAPSD_MAX_SP; + } + hw->sta_data_size = sizeof(struct iwl_mvm_sta); hw->vif_data_size = sizeof(struct iwl_mvm_vif); hw->chanctx_data_size = sizeof(u16); From e48393e8cf99f2b070b0a1c3d79411ccddcba2df Mon Sep 17 00:00:00 2001 From: Ilan Peer Date: Thu, 22 May 2014 11:19:02 +0300 Subject: [PATCH 035/455] iwlwifi: mvm: Fix broadcast filtering Current code did not allow sending the broadcast filtering command for P2P Client interfaces. However, this was not enough, since once broadcast filtering command was issued over the station interface after the P2P Client connected, the command also attached the filters to the P2P Client MAC which is not allowed (FW ASSERT 1063). Fix this skipping P2P Client interfaces when constructing the broadcast filtering command Signed-off-by: Ilan Peer Reviewed-by: ArikX Nemtsov Signed-off-by: Emmanuel Grumbach --- drivers/net/wireless/iwlwifi/mvm/mac80211.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/net/wireless/iwlwifi/mvm/mac80211.c b/drivers/net/wireless/iwlwifi/mvm/mac80211.c index 1cef708cb74d..9bfb90680cdc 100644 --- a/drivers/net/wireless/iwlwifi/mvm/mac80211.c +++ b/drivers/net/wireless/iwlwifi/mvm/mac80211.c @@ -1166,8 +1166,12 @@ static void iwl_mvm_bcast_filter_iterator(void *_data, u8 *mac, bcast_mac = &cmd->macs[mvmvif->id]; - /* enable filtering only for associated stations */ - if (vif->type != NL80211_IFTYPE_STATION || !vif->bss_conf.assoc) + /* + * enable filtering only for associated stations, but not for P2P + * Clients + */ + if (vif->type != NL80211_IFTYPE_STATION || vif->p2p || + !vif->bss_conf.assoc) return; bcast_mac->default_discard = 1; @@ -1244,10 +1248,6 @@ static int iwl_mvm_configure_bcast_filter(struct iwl_mvm *mvm, if (!(mvm->fw->ucode_capa.flags & IWL_UCODE_TLV_FLAGS_BCAST_FILTERING)) return 0; - /* bcast filtering isn't supported for P2P client */ - if (vif->p2p) - return 0; - if (!iwl_mvm_bcast_filter_build_cmd(mvm, &cmd)) return 0; From 341acbb3aabbcfbf069d7de4ad35f51b58176faf Mon Sep 17 00:00:00 2001 From: "Aneesh Kumar K.V" Date: Mon, 16 Jun 2014 00:17:07 +0530 Subject: [PATCH 036/455] KVM: PPC: BOOK3S: HV: Use base page size when comparing against slb value With guests supporting Multiple page size per segment (MPSS), hpte_page_size returns the actual page size used. Add a new function to return base page size and use that to compare against the the page size calculated from SLB. Without this patch a hpte lookup can fail since we are comparing wrong page size in kvmppc_hv_find_lock_hpte. Signed-off-by: Aneesh Kumar K.V Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/kvm_book3s_64.h | 19 +++++++++++++++++-- arch/powerpc/kvm/book3s_64_mmu_hv.c | 2 +- arch/powerpc/kvm/book3s_hv_rm_mmu.c | 7 ++----- 3 files changed, 20 insertions(+), 8 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h index fddb72b48ce9..d645428a65a4 100644 --- a/arch/powerpc/include/asm/kvm_book3s_64.h +++ b/arch/powerpc/include/asm/kvm_book3s_64.h @@ -198,8 +198,10 @@ static inline unsigned long compute_tlbie_rb(unsigned long v, unsigned long r, return rb; } -static inline unsigned long hpte_page_size(unsigned long h, unsigned long l) +static inline unsigned long __hpte_page_size(unsigned long h, unsigned long l, + bool is_base_size) { + int size, a_psize; /* Look at the 8 bit LP value */ unsigned int lp = (l >> LP_SHIFT) & ((1 << LP_BITS) - 1); @@ -214,14 +216,27 @@ static inline unsigned long hpte_page_size(unsigned long h, unsigned long l) continue; a_psize = __hpte_actual_psize(lp, size); - if (a_psize != -1) + if (a_psize != -1) { + if (is_base_size) + return 1ul << mmu_psize_defs[size].shift; return 1ul << mmu_psize_defs[a_psize].shift; + } } } return 0; } +static inline unsigned long hpte_page_size(unsigned long h, unsigned long l) +{ + return __hpte_page_size(h, l, 0); +} + +static inline unsigned long hpte_base_page_size(unsigned long h, unsigned long l) +{ + return __hpte_page_size(h, l, 1); +} + static inline unsigned long hpte_rpn(unsigned long ptel, unsigned long psize) { return ((ptel & HPTE_R_RPN) & ~(psize - 1)) >> PAGE_SHIFT; diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c index 80561074078d..68468d695f12 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -1562,7 +1562,7 @@ static ssize_t kvm_htab_write(struct file *file, const char __user *buf, goto out; } if (!rma_setup && is_vrma_hpte(v)) { - unsigned long psize = hpte_page_size(v, r); + unsigned long psize = hpte_base_page_size(v, r); unsigned long senc = slb_pgsize_encoding(psize); unsigned long lpcr; diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c index 6e6224318c36..5a24d3c2b6b8 100644 --- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c +++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c @@ -814,13 +814,10 @@ long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr, unsigned long slb_v, r = hpte[i+1]; /* - * Check the HPTE again, including large page size - * Since we don't currently allow any MPSS (mixed - * page-size segment) page sizes, it is sufficient - * to check against the actual page size. + * Check the HPTE again, including base page size */ if ((v & valid) && (v & mask) == val && - hpte_page_size(v, r) == (1ul << pshift)) + hpte_base_page_size(v, r) == (1ul << pshift)) /* Return with the HPTE still locked */ return (hash << 3) + (i >> 1); From d76744a93246eccdca1106037e8ee29debf48277 Mon Sep 17 00:00:00 2001 From: Amitkumar Karwar Date: Fri, 20 Jun 2014 11:45:25 -0700 Subject: [PATCH 037/455] mwifiex: fix Tx timeout issue https://bugzilla.kernel.org/show_bug.cgi?id=70191 https://bugzilla.kernel.org/show_bug.cgi?id=77581 It is observed that sometimes Tx packet is downloaded without adding driver's txpd header. This results in firmware parsing garbage data as packet length. Sometimes firmware is unable to read the packet if length comes out as invalid. This stops further traffic and timeout occurs. The root cause is uninitialized fields in tx_info(skb->cb) of packet used to get garbage values. In this case if MWIFIEX_BUF_FLAG_REQUEUED_PKT flag is mistakenly set, txpd header was skipped. This patch makes sure that tx_info is correctly initialized to fix the problem. Cc: Reported-by: Andrew Wiley Reported-by: Linus Gasser Reported-by: Michael Hirsch Tested-by: Xinming Hu Signed-off-by: Amitkumar Karwar Signed-off-by: Maithili Hinge Signed-off-by: Avinash Patil Signed-off-by: Bing Zhao Signed-off-by: John W. Linville --- drivers/net/wireless/mwifiex/main.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/wireless/mwifiex/main.c b/drivers/net/wireless/mwifiex/main.c index cbabc12fbda3..e91cd0fa5ca8 100644 --- a/drivers/net/wireless/mwifiex/main.c +++ b/drivers/net/wireless/mwifiex/main.c @@ -645,6 +645,7 @@ mwifiex_hard_start_xmit(struct sk_buff *skb, struct net_device *dev) } tx_info = MWIFIEX_SKB_TXCB(skb); + memset(tx_info, 0, sizeof(*tx_info)); tx_info->bss_num = priv->bss_num; tx_info->bss_type = priv->bss_type; tx_info->pkt_len = skb->len; From c6bff5449fa78e1d524bb5b67b8ab82faf835bff Mon Sep 17 00:00:00 2001 From: Arend van Spriel Date: Fri, 20 Jun 2014 22:19:25 +0200 Subject: [PATCH 038/455] brcmfmac: assign chip id and rev in bus interface after brcmf_usb_dlneeded The function brcmf_usb_dlneeded() queries the device to obtain the chip id and revision. So assigning these in bus interface before the call resulted in chip id and revision being zero. This was introduced by: commit 5b8045d484d0ef77d6aa9444023220c5671fa3fe Author: Arend van Spriel Date: Tue May 27 12:56:23 2014 +0200 brcmfmac: use asynchronous firmware request in USB Reviewed-by: Hante Meuleman Reviewed-by: Pieter-Paul Giesberts Signed-off-by: Arend van Spriel Signed-off-by: John W. Linville --- drivers/net/wireless/brcm80211/brcmfmac/usb.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/net/wireless/brcm80211/brcmfmac/usb.c b/drivers/net/wireless/brcm80211/brcmfmac/usb.c index 6db51a666f61..d06fcb05adf2 100644 --- a/drivers/net/wireless/brcm80211/brcmfmac/usb.c +++ b/drivers/net/wireless/brcm80211/brcmfmac/usb.c @@ -1184,8 +1184,6 @@ static int brcmf_usb_probe_cb(struct brcmf_usbdev_info *devinfo) bus->bus_priv.usb = bus_pub; dev_set_drvdata(dev, bus); bus->ops = &brcmf_usb_bus_ops; - bus->chip = bus_pub->devid; - bus->chiprev = bus_pub->chiprev; bus->proto_type = BRCMF_PROTO_BCDC; bus->always_use_fws_queue = true; @@ -1194,6 +1192,9 @@ static int brcmf_usb_probe_cb(struct brcmf_usbdev_info *devinfo) if (ret) goto fail; } + bus->chip = bus_pub->devid; + bus->chiprev = bus_pub->chiprev; + /* request firmware here */ brcmf_fw_get_firmwares(dev, 0, brcmf_usb_get_fwname(devinfo), NULL, brcmf_usb_probe_phase2); From b7eea4545ea775df957460f58eb56085a8892856 Mon Sep 17 00:00:00 2001 From: Steffen Klassert Date: Wed, 18 Jun 2014 12:34:21 +0200 Subject: [PATCH 039/455] xfrm: Fix refcount imbalance in xfrm_lookup xfrm_lookup must return a dst_entry with a refcount for the caller. Git commit 1a1ccc96abb ("xfrm: Remove caching of xfrm_policy_sk_bundles") removed this refcount for the socket policy case accidentally. This patch restores it and sets DST_NOCACHE flag to make sure that the dst_entry is freed when the refcount becomes null. Fixes: 1a1ccc96abb ("xfrm: Remove caching of xfrm_policy_sk_bundles") Signed-off-by: Steffen Klassert --- net/xfrm/xfrm_policy.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c index a8ef5108e0d8..0525d78ba328 100644 --- a/net/xfrm/xfrm_policy.c +++ b/net/xfrm/xfrm_policy.c @@ -2097,6 +2097,8 @@ struct dst_entry *xfrm_lookup(struct net *net, struct dst_entry *dst_orig, goto no_transform; } + dst_hold(&xdst->u.dst); + xdst->u.dst.flags |= DST_NOCACHE; route = xdst->route; } } From 9715a2e8515217206ebf53040c979fdbeb805a21 Mon Sep 17 00:00:00 2001 From: Alexander Graf Date: Thu, 26 Jun 2014 13:19:40 +0200 Subject: [PATCH 040/455] PPC: Add _GLOBAL_TOC for 32bit Commit ac5a8ee8 started using _GLOBAL_TOC on ppc32 code. Unfortunately it's only defined for 64bit targets though. Define it for ppc32 as well, fixing the build breakage that commit introduced. Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/ppc_asm.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/powerpc/include/asm/ppc_asm.h b/arch/powerpc/include/asm/ppc_asm.h index 9ea266eae33e..7e4612528546 100644 --- a/arch/powerpc/include/asm/ppc_asm.h +++ b/arch/powerpc/include/asm/ppc_asm.h @@ -277,6 +277,8 @@ n: .globl n; \ n: +#define _GLOBAL_TOC(name) _GLOBAL(name) + #define _KPROBE(n) \ .section ".kprobes.text","a"; \ .globl n; \ From 73413ffac3b713231dac466bca216f970042c5e5 Mon Sep 17 00:00:00 2001 From: Yijing Wang Date: Wed, 25 Jun 2014 12:22:56 +0800 Subject: [PATCH 041/455] bnx2x: Fix the MSI flags MSI-X should use PCI_MSIX_FLAGS not PCI_MSI_FLAGS. Signed-off-by: Yijing Wang Signed-off-by: David S. Miller --- drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c index 2887034523e0..6a8b1453a1b9 100644 --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c @@ -12937,7 +12937,7 @@ static int bnx2x_get_num_non_def_sbs(struct pci_dev *pdev, int cnic_cnt) * without the default SB. * For VFs there is no default SB, then we return (index+1). */ - pci_read_config_word(pdev, pdev->msix_cap + PCI_MSI_FLAGS, &control); + pci_read_config_word(pdev, pdev->msix_cap + PCI_MSIX_FLAGS, &control); index = control & PCI_MSIX_FLAGS_QSIZE; From 3e215c8d1b6b772d107f1811b5ee8eae7a046fb4 Mon Sep 17 00:00:00 2001 From: James M Leddy Date: Wed, 25 Jun 2014 17:38:13 -0400 Subject: [PATCH 042/455] udp: Add MIB counters for rcvbuferrors Add MIB counters for rcvbuferrors in UDP to help diagnose problems. Signed-off-by: James M Leddy Acked-by: Eric Dumazet Signed-off-by: David S. Miller --- net/ipv4/udp.c | 5 ++++- net/ipv6/udp.c | 6 +++++- 2 files changed, 9 insertions(+), 2 deletions(-) diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index d92f94b7e402..7d5a8661df76 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -1588,8 +1588,11 @@ int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb) goto csum_error; - if (sk_rcvqueues_full(sk, skb, sk->sk_rcvbuf)) + if (sk_rcvqueues_full(sk, skb, sk->sk_rcvbuf)) { + UDP_INC_STATS_BH(sock_net(sk), UDP_MIB_RCVBUFERRORS, + is_udplite); goto drop; + } rc = 0; diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index 95c834799288..7092ff78fd84 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -674,8 +674,11 @@ int udpv6_queue_rcv_skb(struct sock *sk, struct sk_buff *skb) goto csum_error; } - if (sk_rcvqueues_full(sk, skb, sk->sk_rcvbuf)) + if (sk_rcvqueues_full(sk, skb, sk->sk_rcvbuf)) { + UDP6_INC_STATS_BH(sock_net(sk), + UDP_MIB_RCVBUFERRORS, is_udplite); goto drop; + } skb_dst_drop(skb); @@ -690,6 +693,7 @@ int udpv6_queue_rcv_skb(struct sock *sk, struct sk_buff *skb) bh_unlock_sock(sk); return rc; + csum_error: UDP6_INC_STATS_BH(sock_net(sk), UDP_MIB_CSUMERRORS, is_udplite); drop: From e940f5d6ba6a01f8dbb870854d5205d322452730 Mon Sep 17 00:00:00 2001 From: Hangbin Liu Date: Fri, 27 Jun 2014 09:57:53 +0800 Subject: [PATCH 043/455] ipv6: Fix MLD Query message check Based on RFC3810 6.2, we also need to check the hop limit and router alert option besides source address. Signed-off-by: Hangbin Liu Acked-by: YOSHIFUJI Hideaki Acked-by: Hannes Frederic Sowa Signed-off-by: David S. Miller --- net/ipv6/mcast.c | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c index 08b367c6b9cf..617f0958e164 100644 --- a/net/ipv6/mcast.c +++ b/net/ipv6/mcast.c @@ -1301,8 +1301,17 @@ int igmp6_event_query(struct sk_buff *skb) len = ntohs(ipv6_hdr(skb)->payload_len) + sizeof(struct ipv6hdr); len -= skb_network_header_len(skb); - /* Drop queries with not link local source */ - if (!(ipv6_addr_type(&ipv6_hdr(skb)->saddr) & IPV6_ADDR_LINKLOCAL)) + /* RFC3810 6.2 + * Upon reception of an MLD message that contains a Query, the node + * checks if the source address of the message is a valid link-local + * address, if the Hop Limit is set to 1, and if the Router Alert + * option is present in the Hop-By-Hop Options header of the IPv6 + * packet. If any of these checks fails, the packet is dropped. + */ + if (!(ipv6_addr_type(&ipv6_hdr(skb)->saddr) & IPV6_ADDR_LINKLOCAL) || + ipv6_hdr(skb)->hop_limit != 1 || + !(IP6CB(skb)->flags & IP6SKB_ROUTERALERT) || + IP6CB(skb)->ra != htons(IPV6_OPT_ROUTERALERT_MLD)) return -EINVAL; idev = __in6_dev_get(skb->dev); From 3fc60aa097b8eb0f701c5bf755bc8f7d3ffeb0bd Mon Sep 17 00:00:00 2001 From: Denis Kirjanov Date: Wed, 25 Jun 2014 21:34:56 +0400 Subject: [PATCH 044/455] powerpc: bpf: Use correct mask while accessing the VLAN tag To get a full tag (and not just a VID) we should access the TCI except the VLAN_TAG_PRESENT field (which means that 802.1q header is present). Also ensure that the VLAN_TAG_PRESENT stay on its place Signed-off-by: Denis Kirjanov Signed-off-by: David S. Miller --- arch/powerpc/net/bpf_jit_comp.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c index 6dcdadefd8d0..892167b0a4bc 100644 --- a/arch/powerpc/net/bpf_jit_comp.c +++ b/arch/powerpc/net/bpf_jit_comp.c @@ -390,10 +390,12 @@ static int bpf_jit_build_body(struct sk_filter *fp, u32 *image, case BPF_ANC | SKF_AD_VLAN_TAG: case BPF_ANC | SKF_AD_VLAN_TAG_PRESENT: BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, vlan_tci) != 2); + BUILD_BUG_ON(VLAN_TAG_PRESENT != 0x1000); + PPC_LHZ_OFFS(r_A, r_skb, offsetof(struct sk_buff, vlan_tci)); if (code == (BPF_ANC | SKF_AD_VLAN_TAG)) - PPC_ANDI(r_A, r_A, VLAN_VID_MASK); + PPC_ANDI(r_A, r_A, ~VLAN_TAG_PRESENT); else PPC_ANDI(r_A, r_A, VLAN_TAG_PRESENT); break; From dba63115ce0c888fcb4cdec3f8a4ba97d144afaf Mon Sep 17 00:00:00 2001 From: Denis Kirjanov Date: Wed, 25 Jun 2014 21:34:57 +0400 Subject: [PATCH 045/455] powerpc: bpf: Fix the broken LD_VLAN_TAG_PRESENT test We have to return the boolean here if the tag presents or not, not just ANDing the TCI with the mask which results to: [ 709.412097] test_bpf: #18 LD_VLAN_TAG_PRESENT [ 709.412245] ret 4096 != 1 [ 709.412332] ret 4096 != 1 [ 709.412333] FAIL (2 times) Signed-off-by: Denis Kirjanov Signed-off-by: David S. Miller --- arch/powerpc/net/bpf_jit_comp.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c index 892167b0a4bc..82e82cadcde5 100644 --- a/arch/powerpc/net/bpf_jit_comp.c +++ b/arch/powerpc/net/bpf_jit_comp.c @@ -394,10 +394,12 @@ static int bpf_jit_build_body(struct sk_filter *fp, u32 *image, PPC_LHZ_OFFS(r_A, r_skb, offsetof(struct sk_buff, vlan_tci)); - if (code == (BPF_ANC | SKF_AD_VLAN_TAG)) + if (code == (BPF_ANC | SKF_AD_VLAN_TAG)) { PPC_ANDI(r_A, r_A, ~VLAN_TAG_PRESENT); - else + } else { PPC_ANDI(r_A, r_A, VLAN_TAG_PRESENT); + PPC_SRWI(r_A, r_A, 12); + } break; case BPF_ANC | SKF_AD_QUEUE: BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, From fe984c08e20f0fc2b4666bf8eeeb02605568387b Mon Sep 17 00:00:00 2001 From: Andy Zhou Date: Tue, 6 May 2014 17:23:48 -0700 Subject: [PATCH 046/455] openvswitch: Fix a double free bug for the sample action When sample action returns with an error, the skb has already been freed. This patch fix a bug to make sure we don't free it again. This bug introduced by commit ccb1352e76cff05 (net: Add Open vSwitch kernel components.) Signed-off-by: Andy Zhou Signed-off-by: Pravin B Shelar --- net/openvswitch/actions.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c index c36856a457ca..e70d8b18e962 100644 --- a/net/openvswitch/actions.c +++ b/net/openvswitch/actions.c @@ -551,6 +551,8 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb, case OVS_ACTION_ATTR_SAMPLE: err = sample(dp, skb, a); + if (unlikely(err)) /* skb already freed. */ + return err; break; } From e0bb8c44ed5cfcc56b571758ed966ee48779024c Mon Sep 17 00:00:00 2001 From: Wei Zhang Date: Sat, 28 Jun 2014 12:34:53 -0700 Subject: [PATCH 047/455] openvswitch: supply a dummy err_handler of gre_cisco_protocol to prevent kernel crash When use gre vport, openvswitch register a gre_cisco_protocol but does not supply a err_handler with it. The gre_cisco_err() in net/ipv4/gre_demux.c expect err_handler be provided with the gre_cisco_protocol implementation, and call ->err_handler() without existence check, cause the kernel crash. This patch provide a err_handler to fix this bug. This bug introduced by commit aa310701e787087d (openvswitch: Add gre tunnel support.) Signed-off-by: Wei Zhang Signed-off-by: Jesse Gross Signed-off-by: Pravin B Shelar --- net/openvswitch/vport-gre.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/net/openvswitch/vport-gre.c b/net/openvswitch/vport-gre.c index 35ec4fed09e2..f49148a07da2 100644 --- a/net/openvswitch/vport-gre.c +++ b/net/openvswitch/vport-gre.c @@ -110,6 +110,22 @@ static int gre_rcv(struct sk_buff *skb, return PACKET_RCVD; } +/* Called with rcu_read_lock and BH disabled. */ +static int gre_err(struct sk_buff *skb, u32 info, + const struct tnl_ptk_info *tpi) +{ + struct ovs_net *ovs_net; + struct vport *vport; + + ovs_net = net_generic(dev_net(skb->dev), ovs_net_id); + vport = rcu_dereference(ovs_net->vport_net.gre_vport); + + if (unlikely(!vport)) + return PACKET_REJECT; + else + return PACKET_RCVD; +} + static int gre_tnl_send(struct vport *vport, struct sk_buff *skb) { struct net *net = ovs_dp_get_net(vport->dp); @@ -186,6 +202,7 @@ error: static struct gre_cisco_protocol gre_protocol = { .handler = gre_rcv, + .err_handler = gre_err, .priority = 1, }; From ad55200734c65a3ec5d0c39d6ea904008baea536 Mon Sep 17 00:00:00 2001 From: Ben Pfaff Date: Tue, 6 May 2014 16:48:38 -0700 Subject: [PATCH 048/455] openvswitch: Fix tracking of flags seen in TCP flows. Flow statistics need to take into account the TCP flags from the packet currently being processed (in 'key'), not the TCP flags matched by the flow found in the kernel flow table (in 'flow'). This bug made the Open vSwitch userspace fin_timeout action have no effect in many cases. This bug is introduced by commit 88d73f6c411ac2f0578 (openvswitch: Use TCP flags in the flow key for stats.) Reported-by: Len Gao Signed-off-by: Ben Pfaff Acked-by: Jarno Rajahalme Acked-by: Jesse Gross Signed-off-by: Pravin B Shelar --- net/openvswitch/datapath.c | 4 ++-- net/openvswitch/flow.c | 4 ++-- net/openvswitch/flow.h | 5 +++-- 3 files changed, 7 insertions(+), 6 deletions(-) diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c index 0d407bca81e3..a863678c50ac 100644 --- a/net/openvswitch/datapath.c +++ b/net/openvswitch/datapath.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2007-2013 Nicira, Inc. + * Copyright (c) 2007-2014 Nicira, Inc. * * This program is free software; you can redistribute it and/or * modify it under the terms of version 2 of the GNU General Public @@ -276,7 +276,7 @@ void ovs_dp_process_received_packet(struct vport *p, struct sk_buff *skb) OVS_CB(skb)->flow = flow; OVS_CB(skb)->pkt_key = &key; - ovs_flow_stats_update(OVS_CB(skb)->flow, skb); + ovs_flow_stats_update(OVS_CB(skb)->flow, key.tp.flags, skb); ovs_execute_actions(dp, skb); stats_counter = &stats->n_hit; diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c index 334751cb1528..d07ab538fc9d 100644 --- a/net/openvswitch/flow.c +++ b/net/openvswitch/flow.c @@ -61,10 +61,10 @@ u64 ovs_flow_used_time(unsigned long flow_jiffies) #define TCP_FLAGS_BE16(tp) (*(__be16 *)&tcp_flag_word(tp) & htons(0x0FFF)) -void ovs_flow_stats_update(struct sw_flow *flow, struct sk_buff *skb) +void ovs_flow_stats_update(struct sw_flow *flow, __be16 tcp_flags, + struct sk_buff *skb) { struct flow_stats *stats; - __be16 tcp_flags = flow->key.tp.flags; int node = numa_node_id(); stats = rcu_dereference(flow->stats[node]); diff --git a/net/openvswitch/flow.h b/net/openvswitch/flow.h index ac395d2cd821..5e5aaed3a85b 100644 --- a/net/openvswitch/flow.h +++ b/net/openvswitch/flow.h @@ -1,5 +1,5 @@ /* - * Copyright (c) 2007-2013 Nicira, Inc. + * Copyright (c) 2007-2014 Nicira, Inc. * * This program is free software; you can redistribute it and/or * modify it under the terms of version 2 of the GNU General Public @@ -180,7 +180,8 @@ struct arp_eth_header { unsigned char ar_tip[4]; /* target IP address */ } __packed; -void ovs_flow_stats_update(struct sw_flow *, struct sk_buff *); +void ovs_flow_stats_update(struct sw_flow *, __be16 tcp_flags, + struct sk_buff *); void ovs_flow_stats_get(const struct sw_flow *, struct ovs_flow_stats *, unsigned long *used, __be16 *tcp_flags); void ovs_flow_stats_clear(struct sw_flow *); From a0e5ef53aac8e5049f9344857d8ec5237d31e58b Mon Sep 17 00:00:00 2001 From: Tobias Brunner Date: Thu, 26 Jun 2014 15:12:45 +0200 Subject: [PATCH 049/455] xfrm: Fix installation of AH IPsec SAs The SPI check introduced in ea9884b3acf3311c8a11db67bfab21773f6f82ba was intended for IPComp SAs but actually prevented AH SAs from getting installed (depending on the SPI). Fixes: ea9884b3acf3 ("xfrm: check user specified spi for IPComp") Cc: Fan Du Signed-off-by: Tobias Brunner Signed-off-by: Steffen Klassert --- net/xfrm/xfrm_user.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c index 412d9dc3a873..d4db6ebb089d 100644 --- a/net/xfrm/xfrm_user.c +++ b/net/xfrm/xfrm_user.c @@ -177,9 +177,7 @@ static int verify_newsa_info(struct xfrm_usersa_info *p, attrs[XFRMA_ALG_AEAD] || attrs[XFRMA_ALG_CRYPT] || attrs[XFRMA_ALG_COMP] || - attrs[XFRMA_TFCPAD] || - (ntohl(p->id.spi) >= 0x10000)) - + attrs[XFRMA_TFCPAD]) goto out; break; @@ -207,7 +205,8 @@ static int verify_newsa_info(struct xfrm_usersa_info *p, attrs[XFRMA_ALG_AUTH] || attrs[XFRMA_ALG_AUTH_TRUNC] || attrs[XFRMA_ALG_CRYPT] || - attrs[XFRMA_TFCPAD]) + attrs[XFRMA_TFCPAD] || + (ntohl(p->id.spi) >= 0x10000)) goto out; break; From 63283dd21ed2bf25a71909a820ed3e8fe412e15d Mon Sep 17 00:00:00 2001 From: Pablo Neira Ayuso Date: Fri, 27 Jun 2014 18:51:39 +0200 Subject: [PATCH 050/455] netfilter: nf_tables: skip transaction if no update flags in tables Skip transaction handling for table updates with no changes in the flags. This fixes a crash when passing the table flag with all bits unset. Reported-by: Ana Rey Signed-off-by: Pablo Neira Ayuso --- net/netfilter/nf_tables_api.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index ab4566cfcbe4..da5dc37a7402 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -407,6 +407,9 @@ static int nf_tables_updtable(struct nft_ctx *ctx) if (flags & ~NFT_TABLE_F_DORMANT) return -EINVAL; + if (flags == ctx->table->flags) + return 0; + trans = nft_trans_alloc(ctx, NFT_MSG_NEWTABLE, sizeof(struct nft_trans_table)); if (trans == NULL) From 4a46b24e147dfa9b858026da02cad0bdd4e149d2 Mon Sep 17 00:00:00 2001 From: Alex Wang Date: Mon, 30 Jun 2014 20:30:29 -0700 Subject: [PATCH 051/455] openvswitch: Use exact lookup for flow_get and flow_del. Due to the race condition in userspace, there is chance that two overlapping megaflows could be installed in datapath. And this causes userspace unable to delete the less inclusive megaflow flow even after it timeout, since the flow_del logic will stop at the first match of masked flow. This commit fixes the bug by making the kernel flow_del and flow_get logic check all masks in that case. Introduced by 03f0d916a (openvswitch: Mega flow implementation). Signed-off-by: Alex Wang Acked-by: Andy Zhou Signed-off-by: Pravin B Shelar --- net/openvswitch/datapath.c | 23 +++++++++++------------ net/openvswitch/flow_table.c | 16 ++++++++++++++++ net/openvswitch/flow_table.h | 3 ++- 3 files changed, 29 insertions(+), 13 deletions(-) diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c index a863678c50ac..9db4bf6740d1 100644 --- a/net/openvswitch/datapath.c +++ b/net/openvswitch/datapath.c @@ -889,8 +889,11 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct genl_info *info) } /* The unmasked key has to be the same for flow updates. */ if (unlikely(!ovs_flow_cmp_unmasked_key(flow, &match))) { - error = -EEXIST; - goto err_unlock_ovs; + flow = ovs_flow_tbl_lookup_exact(&dp->table, &match); + if (!flow) { + error = -ENOENT; + goto err_unlock_ovs; + } } /* Update actions. */ old_acts = ovsl_dereference(flow->sf_acts); @@ -981,16 +984,12 @@ static int ovs_flow_cmd_set(struct sk_buff *skb, struct genl_info *info) goto err_unlock_ovs; } /* Check that the flow exists. */ - flow = ovs_flow_tbl_lookup(&dp->table, &key); + flow = ovs_flow_tbl_lookup_exact(&dp->table, &match); if (unlikely(!flow)) { error = -ENOENT; goto err_unlock_ovs; } - /* The unmasked key has to be the same for flow updates. */ - if (unlikely(!ovs_flow_cmp_unmasked_key(flow, &match))) { - error = -EEXIST; - goto err_unlock_ovs; - } + /* Update actions, if present. */ if (likely(acts)) { old_acts = ovsl_dereference(flow->sf_acts); @@ -1063,8 +1062,8 @@ static int ovs_flow_cmd_get(struct sk_buff *skb, struct genl_info *info) goto unlock; } - flow = ovs_flow_tbl_lookup(&dp->table, &key); - if (!flow || !ovs_flow_cmp_unmasked_key(flow, &match)) { + flow = ovs_flow_tbl_lookup_exact(&dp->table, &match); + if (!flow) { err = -ENOENT; goto unlock; } @@ -1113,8 +1112,8 @@ static int ovs_flow_cmd_del(struct sk_buff *skb, struct genl_info *info) goto unlock; } - flow = ovs_flow_tbl_lookup(&dp->table, &key); - if (unlikely(!flow || !ovs_flow_cmp_unmasked_key(flow, &match))) { + flow = ovs_flow_tbl_lookup_exact(&dp->table, &match); + if (unlikely(!flow)) { err = -ENOENT; goto unlock; } diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c index 574c3abc9b30..cf2d853646f0 100644 --- a/net/openvswitch/flow_table.c +++ b/net/openvswitch/flow_table.c @@ -456,6 +456,22 @@ struct sw_flow *ovs_flow_tbl_lookup(struct flow_table *tbl, return ovs_flow_tbl_lookup_stats(tbl, key, &n_mask_hit); } +struct sw_flow *ovs_flow_tbl_lookup_exact(struct flow_table *tbl, + struct sw_flow_match *match) +{ + struct table_instance *ti = rcu_dereference_ovsl(tbl->ti); + struct sw_flow_mask *mask; + struct sw_flow *flow; + + /* Always called under ovs-mutex. */ + list_for_each_entry(mask, &tbl->mask_list, list) { + flow = masked_flow_lookup(ti, match->key, mask); + if (flow && ovs_flow_cmp_unmasked_key(flow, match)) /* Found */ + return flow; + } + return NULL; +} + int ovs_flow_tbl_num_masks(const struct flow_table *table) { struct sw_flow_mask *mask; diff --git a/net/openvswitch/flow_table.h b/net/openvswitch/flow_table.h index ca8a5820f615..5918bff7f3f6 100644 --- a/net/openvswitch/flow_table.h +++ b/net/openvswitch/flow_table.h @@ -76,7 +76,8 @@ struct sw_flow *ovs_flow_tbl_lookup_stats(struct flow_table *, u32 *n_mask_hit); struct sw_flow *ovs_flow_tbl_lookup(struct flow_table *, const struct sw_flow_key *); - +struct sw_flow *ovs_flow_tbl_lookup_exact(struct flow_table *tbl, + struct sw_flow_match *match); bool ovs_flow_cmp_unmasked_key(const struct sw_flow *flow, struct sw_flow_match *match); From e9110361a9a4e258b072b14bd44eb78cf11453cb Mon Sep 17 00:00:00 2001 From: Heiko Schocher Date: Tue, 24 Jun 2014 09:25:18 +0200 Subject: [PATCH 052/455] UBI: fix the volumes tree sorting criteria Commig "604b592 UBI: fix rb_tree node comparison in add_map" broke fastmap backward compatibility and older fastmap images cannot be mounted anymore. The reason is that it changes the volumes RB-tree sorting criteria. This patch fixes the problem. Artem: re-write the commit message Signed-off-by: Heiko Schocher Acked-by: Richard Weinberger Signed-off-by: Artem Bityutskiy --- drivers/mtd/ubi/fastmap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/mtd/ubi/fastmap.c b/drivers/mtd/ubi/fastmap.c index b04e7d059888..72f39daee649 100644 --- a/drivers/mtd/ubi/fastmap.c +++ b/drivers/mtd/ubi/fastmap.c @@ -125,7 +125,7 @@ static struct ubi_ainf_volume *add_vol(struct ubi_attach_info *ai, int vol_id, parent = *p; av = rb_entry(parent, struct ubi_ainf_volume, rb); - if (vol_id < av->vol_id) + if (vol_id > av->vol_id) p = &(*p)->rb_left; else p = &(*p)->rb_right; From 7f502361531e9eecb396cf99bdc9e9a59f7ebd7f Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Mon, 30 Jun 2014 01:26:23 -0700 Subject: [PATCH 053/455] ipv4: irq safe sk_dst_[re]set() and ipv4_sk_update_pmtu() fix We have two different ways to handle changes to sk->sk_dst First way (used by TCP) assumes socket lock is owned by caller, and use no extra lock : __sk_dst_set() & __sk_dst_reset() Another way (used by UDP) uses sk_dst_lock because socket lock is not always taken. Note that sk_dst_lock is not softirq safe. These ways are not inter changeable for a given socket type. ipv4_sk_update_pmtu(), added in linux-3.8, added a race, as it used the socket lock as synchronization, but users might be UDP sockets. Instead of converting sk_dst_lock to a softirq safe version, use xchg() as we did for sk_rx_dst in commit e47eb5dfb296b ("udp: ipv4: do not use sk_dst_lock from softirq context") In a follow up patch, we probably can remove sk_dst_lock, as it is only used in IPv6. Signed-off-by: Eric Dumazet Cc: Steffen Klassert Fixes: 9cb3a50c5f63e ("ipv4: Invalidate the socket cached route on pmtu events if possible") Signed-off-by: David S. Miller --- include/net/sock.h | 12 ++++++------ net/ipv4/route.c | 15 ++++++++------- 2 files changed, 14 insertions(+), 13 deletions(-) diff --git a/include/net/sock.h b/include/net/sock.h index 173cae485de1..c556fd9b05ac 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1768,9 +1768,11 @@ __sk_dst_set(struct sock *sk, struct dst_entry *dst) static inline void sk_dst_set(struct sock *sk, struct dst_entry *dst) { - spin_lock(&sk->sk_dst_lock); - __sk_dst_set(sk, dst); - spin_unlock(&sk->sk_dst_lock); + struct dst_entry *old_dst; + + sk_tx_queue_clear(sk); + old_dst = xchg(&sk->sk_dst_cache, dst); + dst_release(old_dst); } static inline void @@ -1782,9 +1784,7 @@ __sk_dst_reset(struct sock *sk) static inline void sk_dst_reset(struct sock *sk) { - spin_lock(&sk->sk_dst_lock); - __sk_dst_reset(sk); - spin_unlock(&sk->sk_dst_lock); + sk_dst_set(sk, NULL); } struct dst_entry *__sk_dst_check(struct sock *sk, u32 cookie); diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 082239ffe34a..3162ea923ded 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -1010,7 +1010,7 @@ void ipv4_sk_update_pmtu(struct sk_buff *skb, struct sock *sk, u32 mtu) const struct iphdr *iph = (const struct iphdr *) skb->data; struct flowi4 fl4; struct rtable *rt; - struct dst_entry *dst; + struct dst_entry *odst = NULL; bool new = false; bh_lock_sock(sk); @@ -1018,16 +1018,17 @@ void ipv4_sk_update_pmtu(struct sk_buff *skb, struct sock *sk, u32 mtu) if (!ip_sk_accept_pmtu(sk)) goto out; - rt = (struct rtable *) __sk_dst_get(sk); + odst = sk_dst_get(sk); - if (sock_owned_by_user(sk) || !rt) { + if (sock_owned_by_user(sk) || !odst) { __ipv4_sk_update_pmtu(skb, sk, mtu); goto out; } __build_flow_key(&fl4, sk, iph, 0, 0, 0, 0, 0); - if (!__sk_dst_check(sk, 0)) { + rt = (struct rtable *)odst; + if (odst->obsolete && odst->ops->check(odst, 0) == NULL) { rt = ip_route_output_flow(sock_net(sk), &fl4, sk); if (IS_ERR(rt)) goto out; @@ -1037,8 +1038,7 @@ void ipv4_sk_update_pmtu(struct sk_buff *skb, struct sock *sk, u32 mtu) __ip_rt_update_pmtu((struct rtable *) rt->dst.path, &fl4, mtu); - dst = dst_check(&rt->dst, 0); - if (!dst) { + if (!dst_check(&rt->dst, 0)) { if (new) dst_release(&rt->dst); @@ -1050,10 +1050,11 @@ void ipv4_sk_update_pmtu(struct sk_buff *skb, struct sock *sk, u32 mtu) } if (new) - __sk_dst_set(sk, &rt->dst); + sk_dst_set(sk, &rt->dst); out: bh_unlock_sock(sk); + dst_release(odst); } EXPORT_SYMBOL_GPL(ipv4_sk_update_pmtu); From 0e2be4c1121ae3dc2771c4d9b99d4c39ea9577d8 Mon Sep 17 00:00:00 2001 From: Thomas Petazzoni Date: Tue, 1 Jul 2014 17:23:06 +0200 Subject: [PATCH 054/455] ARM: mvebu: fix SMP boot for Armada 38x and Armada 375 Z1 in big endian The SMP boot on Armada 38x and Armada 375 Z1 is currently broken in big-endian configurations, and this commit fixes it for both platforms. For Armada 375 Z1, the problem was in the armada_375_smp_cpu1_enable_code part of the code that gets copied to the Crypto SRAM as a work-around for an issue of the Z1 stepping. This piece of code was not switching the CPU core to big-endian, and not endian-swapping the value read from the Resume Address register (the value is stored little-endian). Due to the introduction of the conditional 'rev r1, r1' instruction, the offset between the 'ldr r0, [pc, #4]' instruction and the value it was looking is different between LE and BE configurations. To solve this, we instead use one 'adr' instruction followed by one 'ldr'. For Armada 38x, the problem was simply that the CPU core was not switched to big endian in the secondary CPU startup function. This change was tested in LE and BE configurations on Armada 385, Armada 375 Z1 and Armada 375 A0. Signed-off-by: Thomas Petazzoni Link: https://lkml.kernel.org/r/1404228186-21203-1-git-send-email-thomas.petazzoni@free-electrons.com Signed-off-by: Jason Cooper --- arch/arm/mach-mvebu/headsmp-a9.S | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/arch/arm/mach-mvebu/headsmp-a9.S b/arch/arm/mach-mvebu/headsmp-a9.S index 5925366bc03c..da5bb292b91c 100644 --- a/arch/arm/mach-mvebu/headsmp-a9.S +++ b/arch/arm/mach-mvebu/headsmp-a9.S @@ -15,6 +15,8 @@ #include #include +#include + __CPUINIT #define CPU_RESUME_ADDR_REG 0xf10182d4 @@ -22,13 +24,18 @@ .global armada_375_smp_cpu1_enable_code_end armada_375_smp_cpu1_enable_code_start: - ldr r0, [pc, #4] +ARM_BE8(setend be) + adr r0, 1f + ldr r0, [r0] ldr r1, [r0] +ARM_BE8(rev r1, r1) mov pc, r1 +1: .word CPU_RESUME_ADDR_REG armada_375_smp_cpu1_enable_code_end: ENTRY(mvebu_cortex_a9_secondary_startup) +ARM_BE8(setend be) bl v7_invalidate_l1 b secondary_startup ENDPROC(mvebu_cortex_a9_secondary_startup) From 07b0f00964def8af9321cfd6c4a7e84f6362f728 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Thu, 26 Jun 2014 00:44:02 -0700 Subject: [PATCH 055/455] bnx2x: fix possible panic under memory stress While it is legal to kfree(NULL), it is not wise to use : put_page(virt_to_head_page(NULL)) BUG: unable to handle kernel paging request at ffffeba400000000 IP: [] virt_to_head_page+0x36/0x44 [bnx2x] Reported-by: Michel Lespinasse Signed-off-by: Eric Dumazet Cc: Ariel Elior Fixes: d46d132cc021 ("bnx2x: use netdev_alloc_frag()") Signed-off-by: David S. Miller --- drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c index 47c5814114e1..4b875da1c7ed 100644 --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c @@ -797,7 +797,8 @@ static void bnx2x_tpa_stop(struct bnx2x *bp, struct bnx2x_fastpath *fp, return; } - bnx2x_frag_free(fp, new_data); + if (new_data) + bnx2x_frag_free(fp, new_data); drop: /* drop the packet and keep the buffer in the bin */ DP(NETIF_MSG_RX_STATUS, From 3b140a678834f55602e50e7d6a47663e230434a7 Mon Sep 17 00:00:00 2001 From: Florian Fainelli Date: Thu, 26 Jun 2014 10:06:44 -0700 Subject: [PATCH 056/455] net: systemport: do not clear IFF_MULTICAST flag The SYSTEMPORT Ethernet MAC supports multicast just fine, it just lacks any sort of Unicast/Broadcast/Multicasting filtering at the Ethernet MAC level since that is handled by the front end Ethernet switch, but that is properly handled by bcm_sysport_set_rx_mode(). Some user-space applications might be relying on the presence of this flag to prevent using multicast sockets, this also prevents that interface from joining the IPv6 all-router mcast group. Signed-off-by: Florian Fainelli Signed-off-by: David S. Miller --- drivers/net/ethernet/broadcom/bcmsysport.c | 6 ------ 1 file changed, 6 deletions(-) diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c b/drivers/net/ethernet/broadcom/bcmsysport.c index 141160ef249a..f6bccd847ee6 100644 --- a/drivers/net/ethernet/broadcom/bcmsysport.c +++ b/drivers/net/ethernet/broadcom/bcmsysport.c @@ -1589,12 +1589,6 @@ static int bcm_sysport_probe(struct platform_device *pdev) BUILD_BUG_ON(sizeof(struct bcm_tsb) != 8); dev->needed_headroom += sizeof(struct bcm_tsb); - /* We are interfaced to a switch which handles the multicast - * filtering for us, so we do not support programming any - * multicast hash table in this Ethernet MAC. - */ - dev->flags &= ~IFF_MULTICAST; - /* libphy will adjust the link state accordingly */ netif_carrier_off(dev); From 412bce83ac786197d0b2805c5bcded4d95cdeeb0 Mon Sep 17 00:00:00 2001 From: Florian Fainelli Date: Thu, 26 Jun 2014 10:06:45 -0700 Subject: [PATCH 057/455] net: systemport: fix UniMAC reset logic The UniMAC CMD_SW_RESET bit is not a self-clearing bit, so we need to assert it, wait a bit and clear it manually. As a result, umac_reset() is updated not to return any value. The previous version of the code simply wrote 0 to the CMD register, which would make the busy-waiting loop exit immediately, having zero effect. By writing 0 to the CMD register, we were clearing all bits in the CMD register, and not using the hardware reset default values which are set on purpose. Signed-off-by: Florian Fainelli Signed-off-by: David S. Miller --- drivers/net/ethernet/broadcom/bcmsysport.c | 33 ++++++---------------- 1 file changed, 9 insertions(+), 24 deletions(-) diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c b/drivers/net/ethernet/broadcom/bcmsysport.c index f6bccd847ee6..d31f7d239064 100644 --- a/drivers/net/ethernet/broadcom/bcmsysport.c +++ b/drivers/net/ethernet/broadcom/bcmsysport.c @@ -1254,28 +1254,17 @@ static inline void umac_enable_set(struct bcm_sysport_priv *priv, usleep_range(1000, 2000); } -static inline int umac_reset(struct bcm_sysport_priv *priv) +static inline void umac_reset(struct bcm_sysport_priv *priv) { - unsigned int timeout = 0; u32 reg; - int ret = 0; - umac_writel(priv, 0, UMAC_CMD); - while (timeout++ < 1000) { - reg = umac_readl(priv, UMAC_CMD); - if (!(reg & CMD_SW_RESET)) - break; - - udelay(1); - } - - if (timeout == 1000) { - dev_err(&priv->pdev->dev, - "timeout waiting for MAC to come out of reset\n"); - ret = -ETIMEDOUT; - } - - return ret; + reg = umac_readl(priv, UMAC_CMD); + reg |= CMD_SW_RESET; + umac_writel(priv, reg, UMAC_CMD); + udelay(10); + reg = umac_readl(priv, UMAC_CMD); + reg &= ~CMD_SW_RESET; + umac_writel(priv, reg, UMAC_CMD); } static void umac_set_hw_addr(struct bcm_sysport_priv *priv, @@ -1303,11 +1292,7 @@ static int bcm_sysport_open(struct net_device *dev) int ret; /* Reset UniMAC */ - ret = umac_reset(priv); - if (ret) { - netdev_err(dev, "UniMAC reset failed\n"); - return ret; - } + umac_reset(priv); /* Flush TX and RX FIFOs at TOPCTRL level */ topctrl_flush(priv); From 16f62d9bed0cdee5f9341b64cc7144869282122d Mon Sep 17 00:00:00 2001 From: Florian Fainelli Date: Thu, 26 Jun 2014 10:06:46 -0700 Subject: [PATCH 058/455] net: systemport: fix TX NAPI work done return value Although we do not limit the number of packets the TX completion function bcm_sysport_tx_reclaim() is allowed to reclaim, we were still using its return value as-is. This means that we could hit the WARN() in net/core/dev.c where work_done >= budget. Make sure we do exit the NAPI context when the TX ring is empty, and pretend there was no work to do. Signed-off-by: Florian Fainelli Signed-off-by: David S. Miller --- drivers/net/ethernet/broadcom/bcmsysport.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c b/drivers/net/ethernet/broadcom/bcmsysport.c index d31f7d239064..5776e503e4c5 100644 --- a/drivers/net/ethernet/broadcom/bcmsysport.c +++ b/drivers/net/ethernet/broadcom/bcmsysport.c @@ -654,13 +654,13 @@ static int bcm_sysport_tx_poll(struct napi_struct *napi, int budget) work_done = bcm_sysport_tx_reclaim(ring->priv, ring); - if (work_done < budget) { + if (work_done == 0) { napi_complete(napi); /* re-enable TX interrupt */ intrl2_1_mask_clear(ring->priv, BIT(ring->index)); } - return work_done; + return 0; } static void bcm_sysport_tx_reclaim_all(struct bcm_sysport_priv *priv) From 0f50ce96b7c0488cffe64255694a2451b4347d09 Mon Sep 17 00:00:00 2001 From: Florian Fainelli Date: Thu, 26 Jun 2014 10:26:20 -0700 Subject: [PATCH 059/455] net: bcmgenet: disable clock before register_netdev As soon as register_netdev() is called, the network device notifiers are running which means that other parts of the kernel, or user-space programs can call the network device ndo_open() callback and use the interface. Disable the Ethernet device clock before we register the network device such that we do not create the following situation: CPU0 CPU1 register_netdev() bcmgenet_open() clk_prepare_enable() clk_disable_unprepare() and leave the hardware block gated off, while we think it should be gated on. Signed-off-by: Florian Fainelli Signed-off-by: David S. Miller --- drivers/net/ethernet/broadcom/genet/bcmgenet.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c index 5ba1cfbd60da..d17953cc5f49 100644 --- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c +++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c @@ -2535,14 +2535,14 @@ static int bcmgenet_probe(struct platform_device *pdev) netif_set_real_num_tx_queues(priv->dev, priv->hw_params->tx_queues + 1); netif_set_real_num_rx_queues(priv->dev, priv->hw_params->rx_queues + 1); - err = register_netdev(dev); - if (err) - goto err_clk_disable; - /* Turn off the main clock, WOL clock is handled separately */ if (!IS_ERR(priv->clk)) clk_disable_unprepare(priv->clk); + err = register_netdev(dev); + if (err) + goto err; + return err; err_clk_disable: From 219575eb6311ad090e26c675a6e217f5d48780a3 Mon Sep 17 00:00:00 2001 From: Florian Fainelli Date: Thu, 26 Jun 2014 10:26:21 -0700 Subject: [PATCH 060/455] net: bcmgenet: start with carrier off We use the PHY library which will determine the link state for us, make sure we start with a carrier off until libphy has completed the link training. Signed-off-by: Florian Fainelli Signed-off-by: David S. Miller --- drivers/net/ethernet/broadcom/genet/bcmgenet.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c index d17953cc5f49..e51e462bc8fd 100644 --- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c +++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c @@ -2535,6 +2535,9 @@ static int bcmgenet_probe(struct platform_device *pdev) netif_set_real_num_tx_queues(priv->dev, priv->hw_params->tx_queues + 1); netif_set_real_num_rx_queues(priv->dev, priv->hw_params->rx_queues + 1); + /* libphy will determine the link state */ + netif_carrier_off(dev); + /* Turn off the main clock, WOL clock is handled separately */ if (!IS_ERR(priv->clk)) clk_disable_unprepare(priv->clk); From b758858c5ceb1b30ae7d04dea6c74821bd7c7d69 Mon Sep 17 00:00:00 2001 From: Florian Fainelli Date: Thu, 26 Jun 2014 10:26:22 -0700 Subject: [PATCH 061/455] net: bcmgenet: do not set packet length for RX buffers Hardware will provide this information as soon as we will start processing incoming packets, so there is no need to set the RX buffer length during buffer allocation. Signed-off-by: Florian Fainelli Signed-off-by: David S. Miller --- drivers/net/ethernet/broadcom/genet/bcmgenet.c | 7 ------- 1 file changed, 7 deletions(-) diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c index e51e462bc8fd..16281ad2da12 100644 --- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c +++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c @@ -1408,13 +1408,6 @@ static int bcmgenet_alloc_rx_buffers(struct bcmgenet_priv *priv) if (cb->skb) continue; - /* set the DMA descriptor length once and for all - * it will only change if we support dynamically sizing - * priv->rx_buf_len, but we do not - */ - dmadesc_set_length_status(priv, priv->rx_bd_assign_ptr, - priv->rx_buf_len << DMA_BUFLENGTH_SHIFT); - ret = bcmgenet_rx_refill(priv, cb); if (ret) break; From 3896c329df8092661dac80f55a8c3f60136fd61a Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Tue, 24 Jun 2014 14:48:19 +0200 Subject: [PATCH 062/455] x86, tsc: Fix cpufreq lockup Mauro reported that his AMD X2 using the powernow-k8 cpufreq driver locked up when doing cpu hotplug. Because we called set_cyc2ns_scale() from the time_cpufreq_notifier() unconditionally, it gets called multiple times for each freq change, instead of only the once, when the tsc_khz value actually changes. Because it gets called more than once, we run out of cyc2ns data slots and stall, waiting for a free one, but because we're half way offline, there's no consumers to free slots. By placing the call inside the condition that actually changes tsc_khz we avoid superfluous calls and avoid the problem. Reported-by: Mauro Tested-by: Mauro Fixes: 20d1c86a5776 ("sched/clock, x86: Rewrite cyc2ns() to avoid the need to disable IRQs") Cc: Cc: "Rafael J. Wysocki" Cc: Viresh Kumar Cc: Bin Gao Cc: Linus Torvalds Cc: Mika Westerberg Cc: Paul Gortmaker Cc: Stefani Seibold Cc: linux-kernel@vger.kernel.org Signed-off-by: Peter Zijlstra Signed-off-by: Ingo Molnar --- arch/x86/kernel/tsc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c index 57e5ce126d5a..ea030319b321 100644 --- a/arch/x86/kernel/tsc.c +++ b/arch/x86/kernel/tsc.c @@ -920,9 +920,9 @@ static int time_cpufreq_notifier(struct notifier_block *nb, unsigned long val, tsc_khz = cpufreq_scale(tsc_khz_ref, ref_freq, freq->new); if (!(freq->flags & CPUFREQ_CONST_LOOPS)) mark_tsc_unstable("cpufreq changes"); - } - set_cyc2ns_scale(tsc_khz, freq->cpu); + set_cyc2ns_scale(tsc_khz, freq->cpu); + } return 0; } From b6220ad66bcd4a50737eb3c08e9466aa44f3bc98 Mon Sep 17 00:00:00 2001 From: Guenter Roeck Date: Tue, 24 Jun 2014 18:05:29 -0700 Subject: [PATCH 063/455] sched: Fix compiler warnings Commit 143e1e28cb (sched: Rework sched_domain topology definition) introduced a number of functions with a return value of 'const int'. gcc doesn't know what to do with that and, if the kernel is compiled with W=1, complains with the following warnings whenever sched.h is included. include/linux/sched.h:875:25: warning: type qualifiers ignored on function return type include/linux/sched.h:882:25: warning: type qualifiers ignored on function return type include/linux/sched.h:889:25: warning: type qualifiers ignored on function return type include/linux/sched.h:1002:21: warning: type qualifiers ignored on function return type Commits fb2aa855 (sched, ARM: Create a dedicated scheduler topology table) and 607b45e9a (sched, powerpc: Create a dedicated topology table) introduce the same warning in the arm and powerpc code. Drop 'const' from the function declarations to fix the problem. The fix for all three patches has to be applied together to avoid compilation failures for the affected architectures. Acked-by: Vincent Guittot Acked-by: Benjamin Herrenschmidt Signed-off-by: Guenter Roeck Cc: Russell King Cc: Paul Mackerras Cc: Dietmar Eggemann Signed-off-by: Peter Zijlstra Link: http://lkml.kernel.org/r/1403658329-13196-1-git-send-email-linux@roeck-us.net Signed-off-by: Ingo Molnar --- arch/arm/kernel/topology.c | 2 +- arch/powerpc/kernel/smp.c | 2 +- include/linux/sched.h | 8 ++++---- 3 files changed, 6 insertions(+), 6 deletions(-) diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c index 9d853189028b..e35d880f9773 100644 --- a/arch/arm/kernel/topology.c +++ b/arch/arm/kernel/topology.c @@ -275,7 +275,7 @@ void store_cpu_topology(unsigned int cpuid) cpu_topology[cpuid].socket_id, mpidr); } -static inline const int cpu_corepower_flags(void) +static inline int cpu_corepower_flags(void) { return SD_SHARE_PKG_RESOURCES | SD_SHARE_POWERDOMAIN; } diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c index 51a3ff78838a..1007fb802e6b 100644 --- a/arch/powerpc/kernel/smp.c +++ b/arch/powerpc/kernel/smp.c @@ -747,7 +747,7 @@ int setup_profiling_timer(unsigned int multiplier) #ifdef CONFIG_SCHED_SMT /* cpumask of CPUs with asymetric SMT dependancy */ -static const int powerpc_smt_flags(void) +static int powerpc_smt_flags(void) { int flags = SD_SHARE_CPUCAPACITY | SD_SHARE_PKG_RESOURCES; diff --git a/include/linux/sched.h b/include/linux/sched.h index 306f4f0c987a..0376b054a0d0 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -872,21 +872,21 @@ enum cpu_idle_type { #define SD_NUMA 0x4000 /* cross-node balancing */ #ifdef CONFIG_SCHED_SMT -static inline const int cpu_smt_flags(void) +static inline int cpu_smt_flags(void) { return SD_SHARE_CPUCAPACITY | SD_SHARE_PKG_RESOURCES; } #endif #ifdef CONFIG_SCHED_MC -static inline const int cpu_core_flags(void) +static inline int cpu_core_flags(void) { return SD_SHARE_PKG_RESOURCES; } #endif #ifdef CONFIG_NUMA -static inline const int cpu_numa_flags(void) +static inline int cpu_numa_flags(void) { return SD_NUMA; } @@ -999,7 +999,7 @@ void free_sched_domains(cpumask_var_t doms[], unsigned int ndoms); bool cpus_share_cache(int this_cpu, int that_cpu); typedef const struct cpumask *(*sched_domain_mask_f)(int cpu); -typedef const int (*sched_domain_flags_f)(void); +typedef int (*sched_domain_flags_f)(void); #define SDTL_OVERLAP 0x01 From b292d7a10487aee6e74b1c18b8d95b92f40d4a4f Mon Sep 17 00:00:00 2001 From: HATAYAMA Daisuke Date: Wed, 25 Jun 2014 10:09:07 +0900 Subject: [PATCH 064/455] perf/x86/intel: ignore CondChgd bit to avoid false NMI handling MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Currently, any NMI is falsely handled by a NMI handler of NMI watchdog if CondChgd bit in MSR_CORE_PERF_GLOBAL_STATUS MSR is set. For example, we use external NMI to make system panic to get crash dump, but in this case, the external NMI is falsely handled do to the issue. This commit deals with the issue simply by ignoring CondChgd bit. Here is explanation in detail. On x86 NMI watchdog uses performance monitoring feature to periodically signal NMI each time performance counter gets overflowed. intel_pmu_handle_irq() is called as a NMI_LOCAL handler from a NMI handler of NMI watchdog, perf_event_nmi_handler(). It identifies an owner of a given NMI by looking at overflow status bits in MSR_CORE_PERF_GLOBAL_STATUS MSR. If some of the bits are set, then it handles the given NMI as its own NMI. The problem is that the intel_pmu_handle_irq() doesn't distinguish CondChgd bit from other bits. Unlike the other status bits, CondChgd bit doesn't represent overflow status for performance counters. Thus, CondChgd bit cannot be thought of as a mark indicating a given NMI is NMI watchdog's. As a result, if CondChgd bit is set, any NMI is falsely handled by the NMI handler of NMI watchdog. Also, if type of the falsely handled NMI is either NMI_UNKNOWN, NMI_SERR or NMI_IO_CHECK, the corresponding action is never performed until CondChgd bit is cleared. I noticed this behavior on systems with Ivy Bridge processors: Intel Xeon CPU E5-2630 v2 and Intel Xeon CPU E7-8890 v2. On both systems, CondChgd bit in MSR_CORE_PERF_GLOBAL_STATUS MSR has already been set in the beginning at boot. Then the CondChgd bit is immediately cleared by next wrmsr to MSR_CORE_PERF_GLOBAL_CTRL MSR and appears to remain 0. On the other hand, on older processors such as Nehalem, Xeon E7540, CondChgd bit is not set in the beginning at boot. I'm not sure about exact behavior of CondChgd bit, in particular when this bit is set. Although I read Intel System Programmer's Manual to figure out that, the descriptions I found are: In 18.9.1: "The MSR_PERF_GLOBAL_STATUS MSR also provides a ¡sticky bit¢ to indicate changes to the state of performancmonitoring hardware" In Table 35-2 IA-32 Architectural MSRs 63 CondChg: status bits of this register has changed. These are different from the bahviour I see on the actual system as I explained above. At least, I think ignoring CondChgd bit should be enough for NMI watchdog perspective. Signed-off-by: HATAYAMA Daisuke Acked-by: Don Zickus Signed-off-by: Peter Zijlstra Cc: Cc: Arnaldo Carvalho de Melo Cc: Linus Torvalds Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20140625.103503.409316067.d.hatayama@jp.fujitsu.com Signed-off-by: Ingo Molnar --- arch/x86/kernel/cpu/perf_event_intel.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c index adb02aa62af5..07846d738bdb 100644 --- a/arch/x86/kernel/cpu/perf_event_intel.c +++ b/arch/x86/kernel/cpu/perf_event_intel.c @@ -1381,6 +1381,15 @@ again: intel_pmu_lbr_read(); + /* + * CondChgd bit 63 doesn't mean any overflow status. Ignore + * and clear the bit. + */ + if (__test_and_clear_bit(63, (unsigned long *)&status)) { + if (!status) + goto done; + } + /* * PEBS overflow sets bit 62 in the global status register */ From 1f9a7268c67f0290837aada443d28fd953ddca90 Mon Sep 17 00:00:00 2001 From: Jiri Olsa Date: Tue, 24 Jun 2014 10:20:25 +0200 Subject: [PATCH 065/455] perf: Do not allow optimized switch for non-cloned events The context check in perf_event_context_sched_out allows non-cloned context to be part of the optimized schedule out switch. This could move non-cloned context into another workload child. Once this child exits, the context is closed and leaves all original (parent) events in closed state. Any other new cloned event will have closed state and not measure anything. And probably causing other odd bugs. Signed-off-by: Jiri Olsa Signed-off-by: Peter Zijlstra Cc: Cc: Arnaldo Carvalho de Melo Cc: Paul Mackerras Cc: Frederic Weisbecker Cc: Namhyung Kim Cc: Paul Mackerras Cc: Corey Ashford Cc: David Ahern Cc: Jiri Olsa Cc: Linus Torvalds Link: http://lkml.kernel.org/r/1403598026-2310-2-git-send-email-jolsa@kernel.org Signed-off-by: Ingo Molnar --- kernel/events/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/events/core.c b/kernel/events/core.c index a33d9a2bcbd7..b0c95f0f06fd 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -2320,7 +2320,7 @@ static void perf_event_context_sched_out(struct task_struct *task, int ctxn, next_parent = rcu_dereference(next_ctx->parent_ctx); /* If neither context have a parent context; they cannot be clones. */ - if (!parent && !next_parent) + if (!parent || !next_parent) goto unlock; if (next_parent == ctx || next_ctx == parent || next_parent == parent) { From d9daa24720891a88bedb93928f57767da96e5c80 Mon Sep 17 00:00:00 2001 From: Daniel Mack Date: Sat, 28 Jun 2014 01:23:35 +0200 Subject: [PATCH 066/455] net: fix circular dependency in of_mdio code Commit 86f6cf4127 (net: of_mdio: add of_mdiobus_link_phydev()) introduced a circular dependency between libphy and of_mdio. depmod: ERROR: /kernel/drivers/net/phy/libphy.ko in dependency cycle! depmod: ERROR: /kernel/drivers/of/of_mdio.ko in dependency cycle! The problem is that of_mdio.c references &mdio_bus_type and libphy now references of_mdiobus_link_phydev. Fix this by not exporting of_mdiobus_link_phydev() from of_mdio.ko. Make it a static function in mdio_bus.c instead. Signed-off-by: Daniel Mack Reported-by: Jeff Mahoney Fixes: 86f6cf4127 (net: of_mdio: add of_mdiobus_link_phydev()) Reviewed-by: Florian Fainelli Signed-off-by: David S. Miller --- drivers/net/phy/mdio_bus.c | 44 ++++++++++++++++++++++++++++++++++++++ drivers/of/of_mdio.c | 34 ----------------------------- include/linux/of_mdio.h | 8 ------- 3 files changed, 44 insertions(+), 42 deletions(-) diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c index 2e58aa54484c..4eaadcfcb0fe 100644 --- a/drivers/net/phy/mdio_bus.c +++ b/drivers/net/phy/mdio_bus.c @@ -187,6 +187,50 @@ struct mii_bus *of_mdio_find_bus(struct device_node *mdio_bus_np) return d ? to_mii_bus(d) : NULL; } EXPORT_SYMBOL(of_mdio_find_bus); + +/* Walk the list of subnodes of a mdio bus and look for a node that matches the + * phy's address with its 'reg' property. If found, set the of_node pointer for + * the phy. This allows auto-probed pyh devices to be supplied with information + * passed in via DT. + */ +static void of_mdiobus_link_phydev(struct mii_bus *mdio, + struct phy_device *phydev) +{ + struct device *dev = &phydev->dev; + struct device_node *child; + + if (dev->of_node || !mdio->dev.of_node) + return; + + for_each_available_child_of_node(mdio->dev.of_node, child) { + int addr; + int ret; + + ret = of_property_read_u32(child, "reg", &addr); + if (ret < 0) { + dev_err(dev, "%s has invalid PHY address\n", + child->full_name); + continue; + } + + /* A PHY must have a reg property in the range [0-31] */ + if (addr >= PHY_MAX_ADDR) { + dev_err(dev, "%s PHY address %i is too large\n", + child->full_name, addr); + continue; + } + + if (addr == phydev->addr) { + dev->of_node = child; + return; + } + } +} +#else /* !IS_ENABLED(CONFIG_OF_MDIO) */ +static inline void of_mdiobus_link_phydev(struct mii_bus *mdio, + struct phy_device *phydev) +{ +} #endif /** diff --git a/drivers/of/of_mdio.c b/drivers/of/of_mdio.c index a3bf2122a8d5..401b2453da45 100644 --- a/drivers/of/of_mdio.c +++ b/drivers/of/of_mdio.c @@ -182,40 +182,6 @@ int of_mdiobus_register(struct mii_bus *mdio, struct device_node *np) } EXPORT_SYMBOL(of_mdiobus_register); -/** - * of_mdiobus_link_phydev - Find a device node for a phy - * @mdio: pointer to mii_bus structure - * @phydev: phydev for which the of_node pointer should be set - * - * Walk the list of subnodes of a mdio bus and look for a node that matches the - * phy's address with its 'reg' property. If found, set the of_node pointer for - * the phy. This allows auto-probed pyh devices to be supplied with information - * passed in via DT. - */ -void of_mdiobus_link_phydev(struct mii_bus *mdio, - struct phy_device *phydev) -{ - struct device *dev = &phydev->dev; - struct device_node *child; - - if (dev->of_node || !mdio->dev.of_node) - return; - - for_each_available_child_of_node(mdio->dev.of_node, child) { - int addr; - - addr = of_mdio_parse_addr(&mdio->dev, child); - if (addr < 0) - continue; - - if (addr == phydev->addr) { - dev->of_node = child; - return; - } - } -} -EXPORT_SYMBOL(of_mdiobus_link_phydev); - /* Helper function for of_phy_find_device */ static int of_phy_match(struct device *dev, void *phy_np) { diff --git a/include/linux/of_mdio.h b/include/linux/of_mdio.h index a70c9493d55a..d449018d0726 100644 --- a/include/linux/of_mdio.h +++ b/include/linux/of_mdio.h @@ -25,9 +25,6 @@ struct phy_device *of_phy_attach(struct net_device *dev, extern struct mii_bus *of_mdio_find_bus(struct device_node *mdio_np); -extern void of_mdiobus_link_phydev(struct mii_bus *mdio, - struct phy_device *phydev); - #else /* CONFIG_OF */ static inline int of_mdiobus_register(struct mii_bus *mdio, struct device_node *np) { @@ -63,11 +60,6 @@ static inline struct mii_bus *of_mdio_find_bus(struct device_node *mdio_np) { return NULL; } - -static inline void of_mdiobus_link_phydev(struct mii_bus *mdio, - struct phy_device *phydev) -{ -} #endif /* CONFIG_OF */ #if defined(CONFIG_OF) && defined(CONFIG_FIXED_PHY) From a48e5fafecfb9c0c807d7e7284b5ff884dfb7a3a Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 2 Jul 2014 02:25:15 -0700 Subject: [PATCH 067/455] vlan: free percpu stats in device destructor Madalin-Cristian reported crashs happening after a recent commit (5a4ae5f6e7d4 "vlan: unnecessary to check if vlan_pcpu_stats is NULL") ----------------------------------------------------------------------- root@p5040ds:~# vconfig add eth8 1 root@p5040ds:~# vconfig rem eth8.1 Unable to handle kernel paging request for data at address 0x2bc88028 Faulting instruction address: 0xc058e950 Oops: Kernel access of bad area, sig: 11 [#1] SMP NR_CPUS=8 CoreNet Generic Modules linked in: CPU: 3 PID: 2167 Comm: vconfig Tainted: G W 3.16.0-rc3-00346-g65e85bf #2 task: e7264d90 ti: e2c2c000 task.ti: e2c2c000 NIP: c058e950 LR: c058ea30 CTR: c058e900 REGS: e2c2db20 TRAP: 0300 Tainted: G W (3.16.0-rc3-00346-g65e85bf) MSR: 00029002 CR: 48000428 XER: 20000000 DEAR: 2bc88028 ESR: 00000000 GPR00: c047299c e2c2dbd0 e7264d90 00000000 2bc88000 00000000 ffffffff 00000000 GPR08: 0000000f 00000000 000000ff 00000000 28000422 10121928 10100000 10100000 GPR16: 10100000 00000000 c07c5968 00000000 00000000 00000000 e2c2dc48 e7838000 GPR24: c07c5bac c07c58a8 e77290cc c07b0000 00000000 c05de6c0 e7838000 e2c2dc48 NIP [c058e950] vlan_dev_get_stats64+0x50/0x170 LR [c058ea30] vlan_dev_get_stats64+0x130/0x170 Call Trace: [e2c2dbd0] [ffffffea] 0xffffffea (unreliable) [e2c2dc20] [c047299c] dev_get_stats+0x4c/0x140 [e2c2dc40] [c0488ca8] rtnl_fill_ifinfo+0x3d8/0x960 [e2c2dd70] [c0489f4c] rtmsg_ifinfo+0x6c/0x110 [e2c2dd90] [c04731d4] rollback_registered_many+0x344/0x3b0 [e2c2ddd0] [c047332c] rollback_registered+0x2c/0x50 [e2c2ddf0] [c0476058] unregister_netdevice_queue+0x78/0xf0 [e2c2de00] [c058d800] unregister_vlan_dev+0xc0/0x160 [e2c2de20] [c058e360] vlan_ioctl_handler+0x1c0/0x550 [e2c2de90] [c045d11c] sock_ioctl+0x28c/0x2f0 [e2c2deb0] [c010d070] do_vfs_ioctl+0x90/0x7b0 [e2c2df20] [c010d7d0] SyS_ioctl+0x40/0x80 [e2c2df40] [c000f924] ret_from_syscall+0x0/0x3c Fix this problem by freeing percpu stats from dev->destructor() instead of ndo_uninit() Reported-by: Madalin-Cristian Bucur Signed-off-by: Eric Dumazet Tested-by: Madalin-Cristian Bucur Fixes: 5a4ae5f6e7d4 ("vlan: unnecessary to check if vlan_pcpu_stats is NULL") Cc: Li RongQing Signed-off-by: David S. Miller --- net/8021q/vlan_dev.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c index ad2ac3c00398..dd11f612e03e 100644 --- a/net/8021q/vlan_dev.c +++ b/net/8021q/vlan_dev.c @@ -627,8 +627,6 @@ static void vlan_dev_uninit(struct net_device *dev) struct vlan_dev_priv *vlan = vlan_dev_priv(dev); int i; - free_percpu(vlan->vlan_pcpu_stats); - vlan->vlan_pcpu_stats = NULL; for (i = 0; i < ARRAY_SIZE(vlan->egress_priority_map); i++) { while ((pm = vlan->egress_priority_map[i]) != NULL) { vlan->egress_priority_map[i] = pm->next; @@ -785,6 +783,15 @@ static const struct net_device_ops vlan_netdev_ops = { .ndo_get_lock_subclass = vlan_dev_get_lock_subclass, }; +static void vlan_dev_free(struct net_device *dev) +{ + struct vlan_dev_priv *vlan = vlan_dev_priv(dev); + + free_percpu(vlan->vlan_pcpu_stats); + vlan->vlan_pcpu_stats = NULL; + free_netdev(dev); +} + void vlan_setup(struct net_device *dev) { ether_setup(dev); @@ -794,7 +801,7 @@ void vlan_setup(struct net_device *dev) dev->tx_queue_len = 0; dev->netdev_ops = &vlan_netdev_ops; - dev->destructor = free_netdev; + dev->destructor = vlan_dev_free; dev->ethtool_ops = &vlan_ethtool_ops; memset(dev->broadcast, 0, ETH_ALEN); From 5925a0555bdaf0b396a84318cbc21ba085f6c0d3 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 2 Jul 2014 02:39:38 -0700 Subject: [PATCH 068/455] net: fix sparse warning in sk_dst_set() sk_dst_cache has __rcu annotation, so we need a cast to avoid following sparse error : include/net/sock.h:1774:19: warning: incorrect type in initializer (different address spaces) include/net/sock.h:1774:19: expected struct dst_entry [noderef] *__ret include/net/sock.h:1774:19: got struct dst_entry *dst Signed-off-by: Eric Dumazet Reported-by: kbuild test robot Fixes: 7f502361531e ("ipv4: irq safe sk_dst_[re]set() and ipv4_sk_update_pmtu() fix") Signed-off-by: David S. Miller --- include/net/sock.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/net/sock.h b/include/net/sock.h index c556fd9b05ac..156350745700 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1771,7 +1771,7 @@ sk_dst_set(struct sock *sk, struct dst_entry *dst) struct dst_entry *old_dst; sk_tx_queue_clear(sk); - old_dst = xchg(&sk->sk_dst_cache, dst); + old_dst = xchg((__force struct dst_entry **)&sk->sk_dst_cache, dst); dst_release(old_dst); } From 5924f17a8a30c2ae18d034a86ee7581b34accef6 Mon Sep 17 00:00:00 2001 From: Christoph Paasch Date: Sat, 28 Jun 2014 18:26:37 +0200 Subject: [PATCH 069/455] tcp: Fix divide by zero when pushing during tcp-repair When in repair-mode and TCP_RECV_QUEUE is set, we end up calling tcp_push with mss_now being 0. If data is in the send-queue and tcp_set_skb_tso_segs gets called, we crash because it will divide by mss_now: [ 347.151939] divide error: 0000 [#1] SMP [ 347.152907] Modules linked in: [ 347.152907] CPU: 1 PID: 1123 Comm: packetdrill Not tainted 3.16.0-rc2 #4 [ 347.152907] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 [ 347.152907] task: f5b88540 ti: f3c82000 task.ti: f3c82000 [ 347.152907] EIP: 0060:[] EFLAGS: 00210246 CPU: 1 [ 347.152907] EIP is at tcp_set_skb_tso_segs+0x49/0xa0 [ 347.152907] EAX: 00000b67 EBX: f5acd080 ECX: 00000000 EDX: 00000000 [ 347.152907] ESI: f5a28f40 EDI: f3c88f00 EBP: f3c83d10 ESP: f3c83d00 [ 347.152907] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 [ 347.152907] CR0: 80050033 CR2: 083158b0 CR3: 35146000 CR4: 000006b0 [ 347.152907] Stack: [ 347.152907] c167f9d9 f5acd080 000005b4 00000002 f3c83d20 c16013e6 f3c88f00 f5acd080 [ 347.152907] f3c83da0 c1603b5a f3c83d38 c10a0188 00000000 00000000 f3c83d84 c10acc85 [ 347.152907] c1ad5ec0 00000000 00000000 c1ad679c 010003e0 00000000 00000000 f3c88fc8 [ 347.152907] Call Trace: [ 347.152907] [] ? apic_timer_interrupt+0x2d/0x34 [ 347.152907] [] tcp_init_tso_segs+0x36/0x50 [ 347.152907] [] tcp_write_xmit+0x7a/0xbf0 [ 347.152907] [] ? up+0x28/0x40 [ 347.152907] [] ? console_unlock+0x295/0x480 [ 347.152907] [] ? vprintk_emit+0x1ef/0x4b0 [ 347.152907] [] __tcp_push_pending_frames+0x36/0xd0 [ 347.152907] [] tcp_push+0xf0/0x120 [ 347.152907] [] tcp_sendmsg+0xf1/0xbf0 [ 347.152907] [] ? kmem_cache_free+0xf0/0x120 [ 347.152907] [] ? __sigqueue_free+0x32/0x40 [ 347.152907] [] ? __sigqueue_free+0x32/0x40 [ 347.152907] [] ? do_wp_page+0x3e0/0x850 [ 347.152907] [] inet_sendmsg+0x4a/0xb0 [ 347.152907] [] ? handle_mm_fault+0x709/0xfb0 [ 347.152907] [] sock_aio_write+0xbb/0xd0 [ 347.152907] [] do_sync_write+0x69/0xa0 [ 347.152907] [] vfs_write+0x123/0x160 [ 347.152907] [] SyS_write+0x55/0xb0 [ 347.152907] [] sysenter_do_call+0x12/0x28 This can easily be reproduced with the following packetdrill-script (the "magic" with netem, sk_pacing and limit_output_bytes is done to prevent the kernel from pushing all segments, because hitting the limit without doing this is not so easy with packetdrill): 0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 +0 < S 0:0(0) win 32792 +0 > S. 0:0(0) ack 1 +0.1 < . 1:1(0) ack 1 win 65000 +0 accept(3, ..., ...) = 4 // This forces that not all segments of the snd-queue will be pushed +0 `tc qdisc add dev tun0 root netem delay 10ms` +0 `sysctl -w net.ipv4.tcp_limit_output_bytes=2` +0 setsockopt(4, SOL_SOCKET, 47, [2], 4) = 0 +0 write(4,...,10000) = 10000 +0 write(4,...,10000) = 10000 // Set tcp-repair stuff, particularly TCP_RECV_QUEUE +0 setsockopt(4, SOL_TCP, 19, [1], 4) = 0 +0 setsockopt(4, SOL_TCP, 20, [1], 4) = 0 // This now will make the write push the remaining segments +0 setsockopt(4, SOL_SOCKET, 47, [20000], 4) = 0 +0 `sysctl -w net.ipv4.tcp_limit_output_bytes=130000` // Now we will crash +0 write(4,...,1000) = 1000 This happens since ec3423257508 (tcp: fix retransmission in repair mode). Prior to that, the call to tcp_push was prevented by a check for tp->repair. The patch fixes it, by adding the new goto-label out_nopush. When exiting tcp_sendmsg and a push is not required, which is the case for tp->repair, we go to this label. When repairing and calling send() with TCP_RECV_QUEUE, the data is actually put in the receive-queue. So, no push is required because no data has been added to the send-queue. Cc: Andrew Vagin Cc: Pavel Emelyanov Fixes: ec3423257508 (tcp: fix retransmission in repair mode) Signed-off-by: Christoph Paasch Acked-by: Andrew Vagin Acked-by: Pavel Emelyanov Signed-off-by: David S. Miller --- net/ipv4/tcp.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index eb1dde37e678..9d2118e5fbc7 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -1108,7 +1108,7 @@ int tcp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg, if (unlikely(tp->repair)) { if (tp->repair_queue == TCP_RECV_QUEUE) { copied = tcp_send_rcvq(sk, msg, size); - goto out; + goto out_nopush; } err = -EINVAL; @@ -1282,6 +1282,7 @@ wait_for_memory: out: if (copied) tcp_push(sk, flags, mss_now, tp->nonagle, size_goal); +out_nopush: release_sock(sk); return copied + copied_syn; From f46d53d0e9151ea9553361d2bf044ba555350e5f Mon Sep 17 00:00:00 2001 From: "Maciej W. Rozycki" Date: Sun, 29 Jun 2014 01:45:53 +0100 Subject: [PATCH 070/455] defxx: Remove an incorrectly inverted preprocessor conditional The RX handler of the driver has two paths switched between, depending on the size of the frame received, as determined by SKBUFF_RX_COPYBREAK. When a small frame is received, a new skb allocated has data space large enough to hold the incoming frame only, and data is copied there from the original skb whose buffer is returned to the DMA RX ring; in that case `rx_in_place' is 0. When a large frame is received, a new skb allocated has data space large enough to hold the largest frame possible, including the overhead for alignment, the receive status and padding, over 4.5kiB overall, and its buffer is placed on the DMA RX ring while the original buffer is passed up to the network stack avoiding the need to copy data; in that case `rx_in_place' is 1. However the latter scenario is only possible when dynamic buffers are used, as determined by DYNAMIC_BUFFERS, because otherwise the buffers used for the DMA RX ring are fixed at the time the interface is brought up. That leads to an observation that the preprocessor conditional around the `rx_in_place' check is inverted, the check only really matters when dynamic buffers are in use. It has gone unnoticed for many years since support for using dynamic buffers on the DMA RX ring was introduced in 2.1.40 -- because the only problem that results is in the case where `rx_in_place' is 1 frame data received is unnecessarily copied to the newly-allocated buffer, before the buffer placed on the the DMA receive RX and its contents ignored. Therefore the only symptom is some performance loss. Rather than flipping the condition though I decided to discard the conditional altogether -- in the case of static buffers `rx_in_place' is always 0 so GCC will optimise the C conditional away instead. Tested on a few DEFPA and DEFTA boards successfully using both small and large frames, both with DYNAMIC_BUFFERS defined and with the macro undefined. Signed-off-by: Maciej W. Rozycki Signed-off-by: David S. Miller --- drivers/net/fddi/defxx.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/drivers/net/fddi/defxx.c b/drivers/net/fddi/defxx.c index eb78203cd58e..052927be074b 100644 --- a/drivers/net/fddi/defxx.c +++ b/drivers/net/fddi/defxx.c @@ -3074,10 +3074,7 @@ static void dfx_rcv_queue_process( break; } else { -#ifndef DYNAMIC_BUFFERS - if (! rx_in_place) -#endif - { + if (!rx_in_place) { /* Receive buffer allocated, pass receive packet up */ skb_copy_to_linear_data(skb, From 1b037474d0c0b5ceb65bc809e3d8ac4497ee041b Mon Sep 17 00:00:00 2001 From: "Maciej W. Rozycki" Date: Sun, 29 Jun 2014 02:09:19 +0100 Subject: [PATCH 071/455] defxx: Fix !DYNAMIC_BUFFERS compilation warnings This fixes compilation warnings: drivers/net/fddi/defxx.c:294: warning: 'dfx_rcv_flush' declared inline after being called drivers/net/fddi/defxx.c:294: warning: previous declaration of 'dfx_rcv_flush' was here drivers/net/fddi/defxx.c:2854: warning: 'my_skb_align' defined but not used triggered when the driver is built with DYNAMIC_BUFFERS undefined. Code tested to work just fine with these changes and a few DEFPA and DEFTA boards. Signed-off-by: Maciej W. Rozycki Signed-off-by: David S. Miller --- drivers/net/fddi/defxx.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/net/fddi/defxx.c b/drivers/net/fddi/defxx.c index 052927be074b..2aa57270838f 100644 --- a/drivers/net/fddi/defxx.c +++ b/drivers/net/fddi/defxx.c @@ -291,7 +291,11 @@ static int dfx_hw_dma_uninit(DFX_board_t *bp, PI_UINT32 type); static int dfx_rcv_init(DFX_board_t *bp, int get_buffers); static void dfx_rcv_queue_process(DFX_board_t *bp); +#ifdef DYNAMIC_BUFFERS static void dfx_rcv_flush(DFX_board_t *bp); +#else +static inline void dfx_rcv_flush(DFX_board_t *bp) {} +#endif static netdev_tx_t dfx_xmt_queue_pkt(struct sk_buff *skb, struct net_device *dev); @@ -2849,7 +2853,7 @@ static int dfx_hw_dma_uninit(DFX_board_t *bp, PI_UINT32 type) * Align an sk_buff to a boundary power of 2 * */ - +#ifdef DYNAMIC_BUFFERS static void my_skb_align(struct sk_buff *skb, int n) { unsigned long x = (unsigned long)skb->data; @@ -2859,7 +2863,7 @@ static void my_skb_align(struct sk_buff *skb, int n) skb_reserve(skb, v - x); } - +#endif /* * ================ @@ -3450,10 +3454,6 @@ static void dfx_rcv_flush( DFX_board_t *bp ) } } -#else -static inline void dfx_rcv_flush( DFX_board_t *bp ) -{ -} #endif /* DYNAMIC_BUFFERS */ /* From 35f6f45368632f21bd27559c44dbb1cab51d8947 Mon Sep 17 00:00:00 2001 From: Amir Vadai Date: Sun, 29 Jun 2014 11:54:55 +0300 Subject: [PATCH 072/455] net/mlx4_en: Don't use irq_affinity_notifier to track changes in IRQ affinity map IRQ affinity notifier can only have a single notifier - cpu_rmap notifier. Can't use it to track changes in IRQ affinity map. Detect IRQ affinity changes by comparing CPU to current IRQ affinity map during NAPI poll thread. CC: Thomas Gleixner CC: Ben Hutchings Fixes: 2eacc23 ("net/mlx4_core: Enforce irq affinity changes immediatly") Signed-off-by: Amir Vadai Signed-off-by: David S. Miller --- drivers/net/ethernet/mellanox/mlx4/cq.c | 2 - drivers/net/ethernet/mellanox/mlx4/en_cq.c | 4 ++ drivers/net/ethernet/mellanox/mlx4/en_rx.c | 16 ++++- drivers/net/ethernet/mellanox/mlx4/en_tx.c | 6 -- drivers/net/ethernet/mellanox/mlx4/eq.c | 69 +++----------------- drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 1 + include/linux/mlx4/device.h | 4 +- 7 files changed, 28 insertions(+), 74 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx4/cq.c b/drivers/net/ethernet/mellanox/mlx4/cq.c index 80f725228f5b..56022d647837 100644 --- a/drivers/net/ethernet/mellanox/mlx4/cq.c +++ b/drivers/net/ethernet/mellanox/mlx4/cq.c @@ -294,8 +294,6 @@ int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, init_completion(&cq->free); cq->irq = priv->eq_table.eq[cq->vector].irq; - cq->irq_affinity_change = false; - return 0; err_radix: diff --git a/drivers/net/ethernet/mellanox/mlx4/en_cq.c b/drivers/net/ethernet/mellanox/mlx4/en_cq.c index 4b2130760eed..1213cc71348c 100644 --- a/drivers/net/ethernet/mellanox/mlx4/en_cq.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_cq.c @@ -128,6 +128,10 @@ int mlx4_en_activate_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq, mlx4_warn(mdev, "Failed assigning an EQ to %s, falling back to legacy EQ's\n", name); } + + cq->irq_desc = + irq_to_desc(mlx4_eq_get_irq(mdev->dev, + cq->vector)); } } else { cq->vector = (cq->ring + 1 + priv->port) % diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c index d2d415732d99..96724170308a 100644 --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c @@ -40,6 +40,7 @@ #include #include #include +#include #include "mlx4_en.h" @@ -896,16 +897,25 @@ int mlx4_en_poll_rx_cq(struct napi_struct *napi, int budget) /* If we used up all the quota - we're probably not done yet... */ if (done == budget) { + int cpu_curr; + const struct cpumask *aff; + INC_PERF_COUNTER(priv->pstats.napi_quota); - if (unlikely(cq->mcq.irq_affinity_change)) { - cq->mcq.irq_affinity_change = false; + + cpu_curr = smp_processor_id(); + aff = irq_desc_get_irq_data(cq->irq_desc)->affinity; + + if (unlikely(!cpumask_test_cpu(cpu_curr, aff))) { + /* Current cpu is not according to smp_irq_affinity - + * probably affinity changed. need to stop this NAPI + * poll, and restart it on the right CPU + */ napi_complete(napi); mlx4_en_arm_cq(priv, cq); return 0; } } else { /* Done for now */ - cq->mcq.irq_affinity_change = false; napi_complete(napi); mlx4_en_arm_cq(priv, cq); } diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c index 8be7483f8236..ac3dead3792c 100644 --- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c @@ -474,15 +474,9 @@ int mlx4_en_poll_tx_cq(struct napi_struct *napi, int budget) /* If we used up all the quota - we're probably not done yet... */ if (done < budget) { /* Done for now */ - cq->mcq.irq_affinity_change = false; napi_complete(napi); mlx4_en_arm_cq(priv, cq); return done; - } else if (unlikely(cq->mcq.irq_affinity_change)) { - cq->mcq.irq_affinity_change = false; - napi_complete(napi); - mlx4_en_arm_cq(priv, cq); - return 0; } return budget; } diff --git a/drivers/net/ethernet/mellanox/mlx4/eq.c b/drivers/net/ethernet/mellanox/mlx4/eq.c index d954ec1eac17..2a004b347e1d 100644 --- a/drivers/net/ethernet/mellanox/mlx4/eq.c +++ b/drivers/net/ethernet/mellanox/mlx4/eq.c @@ -53,11 +53,6 @@ enum { MLX4_EQ_ENTRY_SIZE = 0x20 }; -struct mlx4_irq_notify { - void *arg; - struct irq_affinity_notify notify; -}; - #define MLX4_EQ_STATUS_OK ( 0 << 28) #define MLX4_EQ_STATUS_WRITE_FAIL (10 << 28) #define MLX4_EQ_OWNER_SW ( 0 << 24) @@ -1088,57 +1083,6 @@ static void mlx4_unmap_clr_int(struct mlx4_dev *dev) iounmap(priv->clr_base); } -static void mlx4_irq_notifier_notify(struct irq_affinity_notify *notify, - const cpumask_t *mask) -{ - struct mlx4_irq_notify *n = container_of(notify, - struct mlx4_irq_notify, - notify); - struct mlx4_priv *priv = (struct mlx4_priv *)n->arg; - struct radix_tree_iter iter; - void **slot; - - radix_tree_for_each_slot(slot, &priv->cq_table.tree, &iter, 0) { - struct mlx4_cq *cq = (struct mlx4_cq *)(*slot); - - if (cq->irq == notify->irq) - cq->irq_affinity_change = true; - } -} - -static void mlx4_release_irq_notifier(struct kref *ref) -{ - struct mlx4_irq_notify *n = container_of(ref, struct mlx4_irq_notify, - notify.kref); - kfree(n); -} - -static void mlx4_assign_irq_notifier(struct mlx4_priv *priv, - struct mlx4_dev *dev, int irq) -{ - struct mlx4_irq_notify *irq_notifier = NULL; - int err = 0; - - irq_notifier = kzalloc(sizeof(*irq_notifier), GFP_KERNEL); - if (!irq_notifier) { - mlx4_warn(dev, "Failed to allocate irq notifier. irq %d\n", - irq); - return; - } - - irq_notifier->notify.irq = irq; - irq_notifier->notify.notify = mlx4_irq_notifier_notify; - irq_notifier->notify.release = mlx4_release_irq_notifier; - irq_notifier->arg = priv; - err = irq_set_affinity_notifier(irq, &irq_notifier->notify); - if (err) { - kfree(irq_notifier); - irq_notifier = NULL; - mlx4_warn(dev, "Failed to set irq notifier. irq %d\n", irq); - } -} - - int mlx4_alloc_eq_table(struct mlx4_dev *dev) { struct mlx4_priv *priv = mlx4_priv(dev); @@ -1409,8 +1353,6 @@ int mlx4_assign_eq(struct mlx4_dev *dev, char *name, struct cpu_rmap *rmap, continue; /*we dont want to break here*/ } - mlx4_assign_irq_notifier(priv, dev, - priv->eq_table.eq[vec].irq); eq_set_ci(&priv->eq_table.eq[vec], 1); } @@ -1427,6 +1369,14 @@ int mlx4_assign_eq(struct mlx4_dev *dev, char *name, struct cpu_rmap *rmap, } EXPORT_SYMBOL(mlx4_assign_eq); +int mlx4_eq_get_irq(struct mlx4_dev *dev, int vec) +{ + struct mlx4_priv *priv = mlx4_priv(dev); + + return priv->eq_table.eq[vec].irq; +} +EXPORT_SYMBOL(mlx4_eq_get_irq); + void mlx4_release_eq(struct mlx4_dev *dev, int vec) { struct mlx4_priv *priv = mlx4_priv(dev); @@ -1438,9 +1388,6 @@ void mlx4_release_eq(struct mlx4_dev *dev, int vec) Belonging to a legacy EQ*/ mutex_lock(&priv->msix_ctl.pool_lock); if (priv->msix_ctl.pool_bm & 1ULL << i) { - irq_set_affinity_notifier( - priv->eq_table.eq[vec].irq, - NULL); free_irq(priv->eq_table.eq[vec].irq, &priv->eq_table.eq[vec]); priv->msix_ctl.pool_bm &= ~(1ULL << i); diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h index 0e15295bedd6..624e1939e9ee 100644 --- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h +++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h @@ -343,6 +343,7 @@ struct mlx4_en_cq { #define CQ_USER_PEND (MLX4_EN_CQ_STATE_POLL | MLX4_EN_CQ_STATE_POLL_YIELD) spinlock_t poll_lock; /* protects from LLS/napi conflicts */ #endif /* CONFIG_NET_RX_BUSY_POLL */ + struct irq_desc *irq_desc; }; struct mlx4_en_port_profile { diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index b12f4bbd064c..35b51e7af886 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -578,8 +578,6 @@ struct mlx4_cq { u32 cons_index; u16 irq; - bool irq_affinity_change; - __be32 *set_ci_db; __be32 *arm_db; int arm_sn; @@ -1167,6 +1165,8 @@ int mlx4_assign_eq(struct mlx4_dev *dev, char *name, struct cpu_rmap *rmap, int *vector); void mlx4_release_eq(struct mlx4_dev *dev, int vec); +int mlx4_eq_get_irq(struct mlx4_dev *dev, int vec); + int mlx4_get_phys_port_id(struct mlx4_dev *dev); int mlx4_wol_read(struct mlx4_dev *dev, u64 *config, int port); int mlx4_wol_write(struct mlx4_dev *dev, u64 config, int port); From 143b5ba21b2bd5091cd8dcd92de7ba1ed1d1c83c Mon Sep 17 00:00:00 2001 From: Amir Vadai Date: Sun, 29 Jun 2014 11:54:56 +0300 Subject: [PATCH 073/455] lib/cpumask: cpumask_set_cpu_local_first to use all cores when numa node is not defined When device is non numa aware (numa_node == -1), use all online cpu's. Signed-off-by: Amir Vadai Signed-off-by: David S. Miller --- lib/cpumask.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/cpumask.c b/lib/cpumask.c index c101230658eb..b6513a9f2892 100644 --- a/lib/cpumask.c +++ b/lib/cpumask.c @@ -191,7 +191,7 @@ int cpumask_set_cpu_local_first(int i, int numa_node, cpumask_t *dstp) i %= num_online_cpus(); - if (!cpumask_of_node(numa_node)) { + if (numa_node == -1 || !cpumask_of_node(numa_node)) { /* Use all online cpu's for non numa aware system */ cpumask_copy(mask, cpu_online_mask); } else { From bb273617a65b9ed75f8cf9417206cbfcfb41fc48 Mon Sep 17 00:00:00 2001 From: Amir Vadai Date: Sun, 29 Jun 2014 11:54:57 +0300 Subject: [PATCH 074/455] net/mlx4_en: IRQ affinity hint is not cleared on port down Need to remove affinity hint at mlx4_en_deactivate_cq() and not at mlx4_en_destroy_cq() - since affinity_mask might be free'd while still being used by procfs. Signed-off-by: Amir Vadai Signed-off-by: David S. Miller --- drivers/net/ethernet/mellanox/mlx4/en_cq.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx4/en_cq.c b/drivers/net/ethernet/mellanox/mlx4/en_cq.c index 1213cc71348c..14c00048bbec 100644 --- a/drivers/net/ethernet/mellanox/mlx4/en_cq.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_cq.c @@ -191,8 +191,6 @@ void mlx4_en_destroy_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq **pcq) mlx4_en_unmap_buffer(&cq->wqres.buf); mlx4_free_hwq_res(mdev->dev, &cq->wqres, cq->buf_size); if (priv->mdev->dev->caps.comp_pool && cq->vector) { - if (!cq->is_tx) - irq_set_affinity_hint(cq->mcq.irq, NULL); mlx4_release_eq(priv->mdev->dev, cq->vector); } cq->vector = 0; @@ -208,6 +206,7 @@ void mlx4_en_deactivate_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq) if (!cq->is_tx) { napi_hash_del(&cq->napi); synchronize_rcu(); + irq_set_affinity_hint(cq->mcq.irq, NULL); } netif_napi_del(&cq->napi); From 48bc03433cfdcac7a3bbb233d5c9f95297a0f5ab Mon Sep 17 00:00:00 2001 From: Alexander Aring Date: Sun, 29 Jun 2014 13:10:18 +0200 Subject: [PATCH 075/455] ieee802154: reassembly: fix possible buffer overflow The max_dsize attribute in ctl_table for lowpan_frags_ns_ctl_table is configured with integer accessing methods. This patch change the max_dsize attribute to int to avoid a possible buffer overflow. Signed-off-by: Alexander Aring Signed-off-by: David S. Miller --- include/net/netns/ieee802154_6lowpan.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/net/netns/ieee802154_6lowpan.h b/include/net/netns/ieee802154_6lowpan.h index 079030c853d8..e2070960bac0 100644 --- a/include/net/netns/ieee802154_6lowpan.h +++ b/include/net/netns/ieee802154_6lowpan.h @@ -16,7 +16,7 @@ struct netns_sysctl_lowpan { struct netns_ieee802154_lowpan { struct netns_sysctl_lowpan sysctl; struct netns_frags frags; - u16 max_dsize; + int max_dsize; }; #endif From 0acf16768740776feffac506ce93b1c06c059ac6 Mon Sep 17 00:00:00 2001 From: Vince Bridgers Date: Sun, 29 Jun 2014 20:34:51 -0500 Subject: [PATCH 076/455] net: stmmac: add platform init/exit for Altera's ARM socfpga This patch adds platform init/exit functions and modifications to support suspend/resume for the Altera Cyclone 5 SOC Ethernet controller. The platform exit function puts the controller into reset using the socfpga reset controller driver. The platform init function sets up the Synopsys mac by first making sure the Ethernet controller is held in reset, programming the phy mode through external support logic, then deasserts reset through the socfpga reset manager driver. Signed-off-by: Vince Bridgers Signed-off-by: David S. Miller --- .../ethernet/stmicro/stmmac/dwmac-socfpga.c | 69 +++++++++++++++++++ .../net/ethernet/stmicro/stmmac/stmmac_main.c | 4 ++ 2 files changed, 73 insertions(+) diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c index fd8a217556a1..ec632e666c56 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c @@ -20,7 +20,9 @@ #include #include #include +#include #include +#include "stmmac.h" #define SYSMGR_EMACGRP_CTRL_PHYSEL_ENUM_GMII_MII 0x0 #define SYSMGR_EMACGRP_CTRL_PHYSEL_ENUM_RGMII 0x1 @@ -34,6 +36,7 @@ struct socfpga_dwmac { u32 reg_shift; struct device *dev; struct regmap *sys_mgr_base_addr; + struct reset_control *stmmac_rst; }; static int socfpga_dwmac_parse_data(struct socfpga_dwmac *dwmac, struct device *dev) @@ -43,6 +46,13 @@ static int socfpga_dwmac_parse_data(struct socfpga_dwmac *dwmac, struct device * u32 reg_offset, reg_shift; int ret; + dwmac->stmmac_rst = devm_reset_control_get(dev, + STMMAC_RESOURCE_NAME); + if (IS_ERR(dwmac->stmmac_rst)) { + dev_info(dev, "Could not get reset control!\n"); + return -EINVAL; + } + dwmac->interface = of_get_phy_mode(np); sys_mgr_base_addr = syscon_regmap_lookup_by_phandle(np, "altr,sysmgr-syscon"); @@ -125,6 +135,65 @@ static void *socfpga_dwmac_probe(struct platform_device *pdev) return dwmac; } +static void socfpga_dwmac_exit(struct platform_device *pdev, void *priv) +{ + struct socfpga_dwmac *dwmac = priv; + + /* On socfpga platform exit, assert and hold reset to the + * enet controller - the default state after a hard reset. + */ + if (dwmac->stmmac_rst) + reset_control_assert(dwmac->stmmac_rst); +} + +static int socfpga_dwmac_init(struct platform_device *pdev, void *priv) +{ + struct socfpga_dwmac *dwmac = priv; + struct net_device *ndev = platform_get_drvdata(pdev); + struct stmmac_priv *stpriv = NULL; + int ret = 0; + + if (ndev) + stpriv = netdev_priv(ndev); + + /* Assert reset to the enet controller before changing the phy mode */ + if (dwmac->stmmac_rst) + reset_control_assert(dwmac->stmmac_rst); + + /* Setup the phy mode in the system manager registers according to + * devicetree configuration + */ + ret = socfpga_dwmac_setup(dwmac); + + /* Deassert reset for the phy configuration to be sampled by + * the enet controller, and operation to start in requested mode + */ + if (dwmac->stmmac_rst) + reset_control_deassert(dwmac->stmmac_rst); + + /* Before the enet controller is suspended, the phy is suspended. + * This causes the phy clock to be gated. The enet controller is + * resumed before the phy, so the clock is still gated "off" when + * the enet controller is resumed. This code makes sure the phy + * is "resumed" before reinitializing the enet controller since + * the enet controller depends on an active phy clock to complete + * a DMA reset. A DMA reset will "time out" if executed + * with no phy clock input on the Synopsys enet controller. + * Verified through Synopsys Case #8000711656. + * + * Note that the phy clock is also gated when the phy is isolated. + * Phy "suspend" and "isolate" controls are located in phy basic + * control register 0, and can be modified by the phy driver + * framework. + */ + if (stpriv && stpriv->phydev) + phy_resume(stpriv->phydev); + + return ret; +} + const struct stmmac_of_data socfpga_gmac_data = { .setup = socfpga_dwmac_probe, + .init = socfpga_dwmac_init, + .exit = socfpga_dwmac_exit, }; diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index 057a1208e594..18315f34938b 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -2878,6 +2878,10 @@ int stmmac_suspend(struct net_device *ndev) clk_disable_unprepare(priv->stmmac_clk); } spin_unlock_irqrestore(&priv->lock, flags); + + priv->oldlink = 0; + priv->speed = 0; + priv->oldduplex = -1; return 0; } From 43d24e48940d04f587818fadc5305b109f5cb8cf Mon Sep 17 00:00:00 2001 From: Vince Bridgers Date: Sun, 29 Jun 2014 20:34:52 -0500 Subject: [PATCH 077/455] net: stmmac: Correct duplicate if/then/else case found by cppcheck Cppcheck found a duplicate if/then/else case where a receive descriptor was being processed. This patch corrects that issue. cppcheck --force --enable=all --inline-suppr . ... Checking enh_desc.c... [enh_desc.c:148] -> [enh_desc.c:144]: (style) Found duplicate if expressions. ... Signed-off-by: Vince Bridgers Signed-off-by: David S. Miller --- drivers/net/ethernet/stmicro/stmmac/enh_desc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/enh_desc.c b/drivers/net/ethernet/stmicro/stmmac/enh_desc.c index 7e6628a91514..1e2bcf5f89e1 100644 --- a/drivers/net/ethernet/stmicro/stmmac/enh_desc.c +++ b/drivers/net/ethernet/stmicro/stmmac/enh_desc.c @@ -145,7 +145,7 @@ static void enh_desc_get_ext_status(void *data, struct stmmac_extra_stats *x, x->rx_msg_type_delay_req++; else if (p->des4.erx.msg_type == RDES_EXT_DELAY_RESP) x->rx_msg_type_delay_resp++; - else if (p->des4.erx.msg_type == RDES_EXT_DELAY_REQ) + else if (p->des4.erx.msg_type == RDES_EXT_PDELAY_REQ) x->rx_msg_type_pdelay_req++; else if (p->des4.erx.msg_type == RDES_EXT_PDELAY_RESP) x->rx_msg_type_pdelay_resp++; From c8df8ce3ee5ff24993bba9033e7d13b16fa3809c Mon Sep 17 00:00:00 2001 From: Vince Bridgers Date: Sun, 29 Jun 2014 20:34:53 -0500 Subject: [PATCH 078/455] net: stmmac: Remove unneeded I/O read caught by cppcheck Cppcheck found a case where a local variable was being assigned a value, but not used. There seems to be no reason to read this register before assigning a new value, so addressing thie issue. cppcheck --force --enable=all --inline-suppr . shows ... Variable 'value' is reassigned a value before the old one has been used. Signed-off-by: Vince Bridgers Signed-off-by: David S. Miller --- drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c index b3e148ef5683..9d3748361a1e 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c @@ -320,11 +320,8 @@ static void dwmac1000_set_eee_timer(void __iomem *ioaddr, int ls, int tw) static void dwmac1000_ctrl_ane(void __iomem *ioaddr, bool restart) { - u32 value; - - value = readl(ioaddr + GMAC_AN_CTRL); /* auto negotiation enable and External Loopback enable */ - value = GMAC_AN_CTRL_ANE | GMAC_AN_CTRL_ELE; + u32 value = GMAC_AN_CTRL_ANE | GMAC_AN_CTRL_ELE; if (restart) value |= GMAC_AN_CTRL_RAN; From 1b6478231c6f5f844185acb32045cf195028cfce Mon Sep 17 00:00:00 2001 From: David Vrabel Date: Wed, 2 Jul 2014 17:25:23 +0100 Subject: [PATCH 079/455] xen/manage: fix potential deadlock when resuming the console Calling xen_console_resume() in xen_suspend() causes a warning because it locks irq_mapping_update_lock (a mutex) and this may sleep. If a userspace process is using the evtchn device then this mutex may be locked at the point of the stop_machine() call and xen_console_resume() would then deadlock. Resuming the console after stop_machine() returns avoids this deadlock. Signed-off-by: David Vrabel Reviewed-by: Boris Ostrovsky Cc: --- drivers/xen/manage.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/xen/manage.c b/drivers/xen/manage.c index c3667b202f2f..5f1e1f3cd186 100644 --- a/drivers/xen/manage.c +++ b/drivers/xen/manage.c @@ -88,7 +88,6 @@ static int xen_suspend(void *data) if (!si->cancelled) { xen_irq_resume(); - xen_console_resume(); xen_timer_resume(); } @@ -135,6 +134,10 @@ static void do_suspend(void) err = stop_machine(xen_suspend, &si, cpumask_of(0)); + /* Resume console as early as possible. */ + if (!si.cancelled) + xen_console_resume(); + raw_notifier_call_chain(&xen_resume_notifier, 0, NULL); dpm_resume_start(si.cancelled ? PMSG_THAW : PMSG_RESTORE); From 43d826ca5979927131685cc2092c7ce862cb91cd Mon Sep 17 00:00:00 2001 From: Emmanuel Grumbach Date: Wed, 25 Jun 2014 09:12:30 +0300 Subject: [PATCH 080/455] iwlwifi: dvm: don't enable CTS to self We should always prefer to use full RTS protection. Using CTS to self gives a meaningless improvement, but this flow is much harder for the firmware which is likely to have issues with it. CC: Signed-off-by: Emmanuel Grumbach --- drivers/net/wireless/iwlwifi/dvm/rxon.c | 12 ------------ 1 file changed, 12 deletions(-) diff --git a/drivers/net/wireless/iwlwifi/dvm/rxon.c b/drivers/net/wireless/iwlwifi/dvm/rxon.c index ed50de6362ed..6dc5dd3ced44 100644 --- a/drivers/net/wireless/iwlwifi/dvm/rxon.c +++ b/drivers/net/wireless/iwlwifi/dvm/rxon.c @@ -1068,13 +1068,6 @@ int iwlagn_commit_rxon(struct iwl_priv *priv, struct iwl_rxon_context *ctx) /* recalculate basic rates */ iwl_calc_basic_rates(priv, ctx); - /* - * force CTS-to-self frames protection if RTS-CTS is not preferred - * one aggregation protection method - */ - if (!priv->hw_params.use_rts_for_aggregation) - ctx->staging.flags |= RXON_FLG_SELF_CTS_EN; - if ((ctx->vif && ctx->vif->bss_conf.use_short_slot) || !(ctx->staging.flags & RXON_FLG_BAND_24G_MSK)) ctx->staging.flags |= RXON_FLG_SHORT_SLOT_MSK; @@ -1480,11 +1473,6 @@ void iwlagn_bss_info_changed(struct ieee80211_hw *hw, else ctx->staging.flags &= ~RXON_FLG_TGG_PROTECT_MSK; - if (bss_conf->use_cts_prot) - ctx->staging.flags |= RXON_FLG_SELF_CTS_EN; - else - ctx->staging.flags &= ~RXON_FLG_SELF_CTS_EN; - memcpy(ctx->staging.bssid_addr, bss_conf->bssid, ETH_ALEN); if (vif->type == NL80211_IFTYPE_AP || From dc271ee0d04d12d6bfabacbec803289a7072fbd9 Mon Sep 17 00:00:00 2001 From: Emmanuel Grumbach Date: Thu, 3 Jul 2014 20:46:35 +0300 Subject: [PATCH 081/455] iwlwifi: mvm: disable CTS to Self Firmware folks seem say that this flag can make trouble. Drop it. The advantage of CTS to self is that it slightly reduces the cost of the protection, but make the protection less reliable. Cc: [3.13+] Signed-off-by: Emmanuel Grumbach --- drivers/net/wireless/iwlwifi/mvm/mac-ctxt.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/net/wireless/iwlwifi/mvm/mac-ctxt.c b/drivers/net/wireless/iwlwifi/mvm/mac-ctxt.c index 8b5302777632..725ba49576bf 100644 --- a/drivers/net/wireless/iwlwifi/mvm/mac-ctxt.c +++ b/drivers/net/wireless/iwlwifi/mvm/mac-ctxt.c @@ -667,10 +667,9 @@ static void iwl_mvm_mac_ctxt_cmd_common(struct iwl_mvm *mvm, if (vif->bss_conf.qos) cmd->qos_flags |= cpu_to_le32(MAC_QOS_FLG_UPDATE_EDCA); - if (vif->bss_conf.use_cts_prot) { + if (vif->bss_conf.use_cts_prot) cmd->protection_flags |= cpu_to_le32(MAC_PROT_FLG_TGG_PROTECT); - cmd->protection_flags |= cpu_to_le32(MAC_PROT_FLG_SELF_CTS_EN); - } + IWL_DEBUG_RATE(mvm, "use_cts_prot %d, ht_operation_mode %d\n", vif->bss_conf.use_cts_prot, vif->bss_conf.ht_operation_mode); From 701a9e619347ac06d59481db14bbc4df99763270 Mon Sep 17 00:00:00 2001 From: Amitkumar Karwar Date: Tue, 1 Jul 2014 14:39:04 -0700 Subject: [PATCH 082/455] mwifiex: initialize Tx/Rx info of a packet correctly There are few places at the begining of Tx/Rx paths where tx_info/rx_info is not correctly initialized. This patch takes care of it. Signed-off-by: Amitkumar Karwar Signed-off-by: Avinash Patil Signed-off-by: Bing Zhao Signed-off-by: John W. Linville --- drivers/net/wireless/mwifiex/11n_aggr.c | 1 + drivers/net/wireless/mwifiex/cfg80211.c | 1 + drivers/net/wireless/mwifiex/cmdevt.c | 1 + drivers/net/wireless/mwifiex/sta_tx.c | 1 + drivers/net/wireless/mwifiex/tdls.c | 2 ++ drivers/net/wireless/mwifiex/txrx.c | 1 + drivers/net/wireless/mwifiex/uap_txrx.c | 1 + 7 files changed, 8 insertions(+) diff --git a/drivers/net/wireless/mwifiex/11n_aggr.c b/drivers/net/wireless/mwifiex/11n_aggr.c index 5b32106182f8..fe0f66f73507 100644 --- a/drivers/net/wireless/mwifiex/11n_aggr.c +++ b/drivers/net/wireless/mwifiex/11n_aggr.c @@ -185,6 +185,7 @@ mwifiex_11n_aggregate_pkt(struct mwifiex_private *priv, skb_reserve(skb_aggr, headroom + sizeof(struct txpd)); tx_info_aggr = MWIFIEX_SKB_TXCB(skb_aggr); + memset(tx_info_aggr, 0, sizeof(*tx_info_aggr)); tx_info_aggr->bss_type = tx_info_src->bss_type; tx_info_aggr->bss_num = tx_info_src->bss_num; diff --git a/drivers/net/wireless/mwifiex/cfg80211.c b/drivers/net/wireless/mwifiex/cfg80211.c index e95dec91a561..b511613bba2d 100644 --- a/drivers/net/wireless/mwifiex/cfg80211.c +++ b/drivers/net/wireless/mwifiex/cfg80211.c @@ -220,6 +220,7 @@ mwifiex_cfg80211_mgmt_tx(struct wiphy *wiphy, struct wireless_dev *wdev, } tx_info = MWIFIEX_SKB_TXCB(skb); + memset(tx_info, 0, sizeof(*tx_info)); tx_info->bss_num = priv->bss_num; tx_info->bss_type = priv->bss_type; tx_info->pkt_len = pkt_len; diff --git a/drivers/net/wireless/mwifiex/cmdevt.c b/drivers/net/wireless/mwifiex/cmdevt.c index 8dee6c86f4f1..c161141f6c39 100644 --- a/drivers/net/wireless/mwifiex/cmdevt.c +++ b/drivers/net/wireless/mwifiex/cmdevt.c @@ -453,6 +453,7 @@ int mwifiex_process_event(struct mwifiex_adapter *adapter) if (skb) { rx_info = MWIFIEX_SKB_RXCB(skb); + memset(rx_info, 0, sizeof(*rx_info)); rx_info->bss_num = priv->bss_num; rx_info->bss_type = priv->bss_type; } diff --git a/drivers/net/wireless/mwifiex/sta_tx.c b/drivers/net/wireless/mwifiex/sta_tx.c index 5fce7e78a36e..70eb863c7249 100644 --- a/drivers/net/wireless/mwifiex/sta_tx.c +++ b/drivers/net/wireless/mwifiex/sta_tx.c @@ -150,6 +150,7 @@ int mwifiex_send_null_packet(struct mwifiex_private *priv, u8 flags) return -1; tx_info = MWIFIEX_SKB_TXCB(skb); + memset(tx_info, 0, sizeof(*tx_info)); tx_info->bss_num = priv->bss_num; tx_info->bss_type = priv->bss_type; tx_info->pkt_len = data_len - (sizeof(struct txpd) + INTF_HEADER_LEN); diff --git a/drivers/net/wireless/mwifiex/tdls.c b/drivers/net/wireless/mwifiex/tdls.c index e73034fbbde9..0e88364e0c67 100644 --- a/drivers/net/wireless/mwifiex/tdls.c +++ b/drivers/net/wireless/mwifiex/tdls.c @@ -605,6 +605,7 @@ int mwifiex_send_tdls_data_frame(struct mwifiex_private *priv, const u8 *peer, } tx_info = MWIFIEX_SKB_TXCB(skb); + memset(tx_info, 0, sizeof(*tx_info)); tx_info->bss_num = priv->bss_num; tx_info->bss_type = priv->bss_type; @@ -760,6 +761,7 @@ int mwifiex_send_tdls_action_frame(struct mwifiex_private *priv, const u8 *peer, skb->priority = MWIFIEX_PRIO_VI; tx_info = MWIFIEX_SKB_TXCB(skb); + memset(tx_info, 0, sizeof(*tx_info)); tx_info->bss_num = priv->bss_num; tx_info->bss_type = priv->bss_type; tx_info->flags |= MWIFIEX_BUF_FLAG_TDLS_PKT; diff --git a/drivers/net/wireless/mwifiex/txrx.c b/drivers/net/wireless/mwifiex/txrx.c index 37f26afd4314..fd7e5b9b4581 100644 --- a/drivers/net/wireless/mwifiex/txrx.c +++ b/drivers/net/wireless/mwifiex/txrx.c @@ -55,6 +55,7 @@ int mwifiex_handle_rx_packet(struct mwifiex_adapter *adapter, return -1; } + memset(rx_info, 0, sizeof(*rx_info)); rx_info->bss_num = priv->bss_num; rx_info->bss_type = priv->bss_type; diff --git a/drivers/net/wireless/mwifiex/uap_txrx.c b/drivers/net/wireless/mwifiex/uap_txrx.c index 9a56bc61cb1d..b0601b91cc4f 100644 --- a/drivers/net/wireless/mwifiex/uap_txrx.c +++ b/drivers/net/wireless/mwifiex/uap_txrx.c @@ -175,6 +175,7 @@ static void mwifiex_uap_queue_bridged_pkt(struct mwifiex_private *priv, } tx_info = MWIFIEX_SKB_TXCB(skb); + memset(tx_info, 0, sizeof(*tx_info)); tx_info->bss_num = priv->bss_num; tx_info->bss_type = priv->bss_type; tx_info->flags |= MWIFIEX_BUF_FLAG_BRIDGED_PKT; From fb9a0c443691ceaab3daba966bbbd9f5ff3aa26f Mon Sep 17 00:00:00 2001 From: David Vrabel Date: Fri, 27 Jun 2014 10:42:03 +0100 Subject: [PATCH 083/455] xen/balloon: set ballooned out pages as invalid in p2m Since cd9151e26d31048b2b5e00fd02e110e07d2200c9 (xen/balloon: set a mapping for ballooned out pages), a ballooned out page had its entry in the p2m set to the MFN of one of the scratch pages. This means that the p2m will contain many entries pointing to the same MFN. During a domain save, these many-to-one entries are not identified as such and the scratch page is saved multiple times. On restore the ballooned pages are populated with new frames and the domain may use up its allocation before all pages can be restored. Since the original fix only needed to keep a mapping for the ballooned page it is safe to set ballooned out pages as INVALID_P2M_ENTRY in the p2m (as they were before). Thus preventing them from being saved and re-populated on restore. Signed-off-by: David Vrabel Reported-by: Marek Marczykowski Tested-by: Marek Marczykowski Acked-by: Stefano Stabellini Cc: --- drivers/xen/balloon.c | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-) diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c index b7a506f2bb14..5c660c77f03b 100644 --- a/drivers/xen/balloon.c +++ b/drivers/xen/balloon.c @@ -426,20 +426,18 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp) * p2m are consistent. */ if (!xen_feature(XENFEAT_auto_translated_physmap)) { - unsigned long p; - struct page *scratch_page = get_balloon_scratch_page(); - if (!PageHighMem(page)) { + struct page *scratch_page = get_balloon_scratch_page(); + ret = HYPERVISOR_update_va_mapping( (unsigned long)__va(pfn << PAGE_SHIFT), pfn_pte(page_to_pfn(scratch_page), PAGE_KERNEL_RO), 0); BUG_ON(ret); - } - p = page_to_pfn(scratch_page); - __set_phys_to_machine(pfn, pfn_to_mfn(p)); - put_balloon_scratch_page(); + put_balloon_scratch_page(); + } + __set_phys_to_machine(pfn, INVALID_P2M_ENTRY); } #endif From 1cbbf90d0406913ad4b44194b07f4f41bde84e54 Mon Sep 17 00:00:00 2001 From: Antti Palosaari Date: Tue, 24 Jun 2014 10:03:59 -0300 Subject: [PATCH 084/455] [media] af9035: override tuner id when bad value set into eeprom Tuner ID set into EEPROM is wrong in some cases, which causes driver to select wrong tuner profile. That leads device non-working. Fix issue by overriding known bad tuner IDs with suitable default value. Thanks to MX-NET Telekomunikace s.r.o. for providing non-working DTV stick, that I could fix the bug! Cc: stable@vger.kernel.org # v3.15+ Signed-off-by: Antti Palosaari Signed-off-by: Mauro Carvalho Chehab --- drivers/media/usb/dvb-usb-v2/af9035.c | 42 ++++++++++++++++++++++----- 1 file changed, 34 insertions(+), 8 deletions(-) diff --git a/drivers/media/usb/dvb-usb-v2/af9035.c b/drivers/media/usb/dvb-usb-v2/af9035.c index 021e4d35e4d7..7b9b75f60774 100644 --- a/drivers/media/usb/dvb-usb-v2/af9035.c +++ b/drivers/media/usb/dvb-usb-v2/af9035.c @@ -704,15 +704,41 @@ static int af9035_read_config(struct dvb_usb_device *d) if (ret < 0) goto err; - if (tmp == 0x00) - dev_dbg(&d->udev->dev, - "%s: [%d]tuner not set, using default\n", - __func__, i); - else - state->af9033_config[i].tuner = tmp; - dev_dbg(&d->udev->dev, "%s: [%d]tuner=%02x\n", - __func__, i, state->af9033_config[i].tuner); + __func__, i, tmp); + + /* tuner sanity check */ + if (state->chip_type == 0x9135) { + if (state->chip_version == 0x02) { + /* IT9135 BX (v2) */ + switch (tmp) { + case AF9033_TUNER_IT9135_60: + case AF9033_TUNER_IT9135_61: + case AF9033_TUNER_IT9135_62: + state->af9033_config[i].tuner = tmp; + break; + } + } else { + /* IT9135 AX (v1) */ + switch (tmp) { + case AF9033_TUNER_IT9135_38: + case AF9033_TUNER_IT9135_51: + case AF9033_TUNER_IT9135_52: + state->af9033_config[i].tuner = tmp; + break; + } + } + } else { + /* AF9035 */ + state->af9033_config[i].tuner = tmp; + } + + if (state->af9033_config[i].tuner != tmp) { + dev_info(&d->udev->dev, + "%s: [%d] overriding tuner from %02x to %02x\n", + KBUILD_MODNAME, i, tmp, + state->af9033_config[i].tuner); + } switch (state->af9033_config[i].tuner) { case AF9033_TUNER_TUA9001: From 9249196867803227dfb6b5f01f7683ce9b5572fd Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Thu, 12 Jun 2014 04:01:45 -0300 Subject: [PATCH 085/455] [media] davinci: vpif: missing unlocks on error We recently changed some locking around so we need some new unlocks on the error paths. Signed-off-by: Dan Carpenter Signed-off-by: Hans Verkuil Signed-off-by: Mauro Carvalho Chehab --- drivers/media/platform/davinci/vpif_capture.c | 1 + drivers/media/platform/davinci/vpif_display.c | 1 + 2 files changed, 2 insertions(+) diff --git a/drivers/media/platform/davinci/vpif_capture.c b/drivers/media/platform/davinci/vpif_capture.c index a7ed16497903..1e4ec697fb10 100644 --- a/drivers/media/platform/davinci/vpif_capture.c +++ b/drivers/media/platform/davinci/vpif_capture.c @@ -269,6 +269,7 @@ err: list_del(&buf->list); vb2_buffer_done(&buf->vb, VB2_BUF_STATE_QUEUED); } + spin_unlock_irqrestore(&common->irqlock, flags); return ret; } diff --git a/drivers/media/platform/davinci/vpif_display.c b/drivers/media/platform/davinci/vpif_display.c index 5bb085b19bcb..b431b58f39e3 100644 --- a/drivers/media/platform/davinci/vpif_display.c +++ b/drivers/media/platform/davinci/vpif_display.c @@ -233,6 +233,7 @@ err: list_del(&buf->list); vb2_buffer_done(&buf->vb, VB2_BUF_STATE_QUEUED); } + spin_unlock_irqrestore(&common->irqlock, flags); return ret; } From 3445857b22eafb70a6ac258979e955b116bfd2c6 Mon Sep 17 00:00:00 2001 From: Hans Verkuil Date: Mon, 16 Jun 2014 09:08:29 -0300 Subject: [PATCH 086/455] [media] hdpvr: fix two audio bugs When the audio encoding is changed the driver calls hdpvr_set_audio with the current opt->audio_input value. However, that should have been opt->audio_input + 1. So changing the audio encoding inadvertently changes the input as well. This bug has always been there. The second bug was introduced in kernel 3.10 and that broke the default_audio_input module option handling: the audio encoding was never switched to AC3 if default_audio_input was set to 2 (SPDIF input). In addition, since starting with 3.10 the audio encoding is always set at the start the first bug now always happens when the driver is loaded. In the past this bug would only surface if the user would change the audio encoding after the driver was loaded. Also fixes a small trivial typo (bufffer -> buffer). Signed-off-by: Hans Verkuil Reported-by: Scott Doty Cc: stable@vger.kernel.org # for v3.10 and up Signed-off-by: Mauro Carvalho Chehab --- drivers/media/usb/hdpvr/hdpvr-video.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/media/usb/hdpvr/hdpvr-video.c b/drivers/media/usb/hdpvr/hdpvr-video.c index 0500c4175d5f..6bce01a674f9 100644 --- a/drivers/media/usb/hdpvr/hdpvr-video.c +++ b/drivers/media/usb/hdpvr/hdpvr-video.c @@ -82,7 +82,7 @@ static void hdpvr_read_bulk_callback(struct urb *urb) } /*=========================================================================*/ -/* bufffer bits */ +/* buffer bits */ /* function expects dev->io_mutex to be hold by caller */ int hdpvr_cancel_queue(struct hdpvr_device *dev) @@ -926,7 +926,7 @@ static int hdpvr_s_ctrl(struct v4l2_ctrl *ctrl) case V4L2_CID_MPEG_AUDIO_ENCODING: if (dev->flags & HDPVR_FLAG_AC3_CAP) { opt->audio_codec = ctrl->val; - return hdpvr_set_audio(dev, opt->audio_input, + return hdpvr_set_audio(dev, opt->audio_input + 1, opt->audio_codec); } return 0; @@ -1198,7 +1198,7 @@ int hdpvr_register_videodev(struct hdpvr_device *dev, struct device *parent, v4l2_ctrl_new_std_menu(hdl, &hdpvr_ctrl_ops, V4L2_CID_MPEG_AUDIO_ENCODING, ac3 ? V4L2_MPEG_AUDIO_ENCODING_AC3 : V4L2_MPEG_AUDIO_ENCODING_AAC, - 0x7, V4L2_MPEG_AUDIO_ENCODING_AAC); + 0x7, ac3 ? dev->options.audio_codec : V4L2_MPEG_AUDIO_ENCODING_AAC); v4l2_ctrl_new_std_menu(hdl, &hdpvr_ctrl_ops, V4L2_CID_MPEG_VIDEO_ENCODING, V4L2_MPEG_VIDEO_ENCODING_MPEG_4_AVC, 0x3, From ef8290ac727b1718417a781412fb6c25e97b5389 Mon Sep 17 00:00:00 2001 From: Michael Welling Date: Tue, 10 Jun 2014 13:46:34 -0500 Subject: [PATCH 087/455] gpio: mcp23s08: Eliminates redundant checking. Unnecessary checking was added during the merge of the gpio branch. This patch removes the extra unnecessary checking. Signed-off-by: Michael Welling Signed-off-by: Linus Walleij --- drivers/gpio/gpio-mcp23s08.c | 6 ------ 1 file changed, 6 deletions(-) diff --git a/drivers/gpio/gpio-mcp23s08.c b/drivers/gpio/gpio-mcp23s08.c index fe7c0e211f9a..57adbc90fdad 100644 --- a/drivers/gpio/gpio-mcp23s08.c +++ b/drivers/gpio/gpio-mcp23s08.c @@ -900,8 +900,6 @@ static int mcp23s08_probe(struct spi_device *spi) if (spi_present_mask & (1 << addr)) chips++; } - if (!chips) - return -ENODEV; } else { type = spi_get_device_id(spi)->driver_data; pdata = dev_get_platdata(&spi->dev); @@ -940,10 +938,6 @@ static int mcp23s08_probe(struct spi_device *spi) if (!(spi_present_mask & (1 << addr))) continue; chips--; - if (chips < 0) { - dev_err(&spi->dev, "FATAL: invalid negative chip id\n"); - goto fail; - } data->mcp[addr] = &data->chip[chips]; status = mcp23s08_probe_one(data->mcp[addr], &spi->dev, spi, 0x40 | (addr << 1), type, base, From 6938ad40cb97a52d88a763008935340729a4acc7 Mon Sep 17 00:00:00 2001 From: Ted Juan Date: Fri, 20 Jun 2014 17:32:05 +0800 Subject: [PATCH 088/455] mtd: devices: elm: fix elm_context_save() and elm_context_restore() functions These two function's switch case lack the 'break' that make them always return error. Signed-off-by: Ted Juan Acked-by: Pekon Gupta Cc: # 3.12.x+ Signed-off-by: Brian Norris --- drivers/mtd/devices/elm.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/mtd/devices/elm.c b/drivers/mtd/devices/elm.c index 7df86948e6d4..b4f61c7fc161 100644 --- a/drivers/mtd/devices/elm.c +++ b/drivers/mtd/devices/elm.c @@ -475,6 +475,7 @@ static int elm_context_save(struct elm_info *info) ELM_SYNDROME_FRAGMENT_1 + offset); regs->elm_syndrome_fragment_0[i] = elm_read_reg(info, ELM_SYNDROME_FRAGMENT_0 + offset); + break; default: return -EINVAL; } @@ -520,6 +521,7 @@ static int elm_context_restore(struct elm_info *info) regs->elm_syndrome_fragment_1[i]); elm_write_reg(info, ELM_SYNDROME_FRAGMENT_0 + offset, regs->elm_syndrome_fragment_0[i]); + break; default: return -EINVAL; } From c6a21ff319947983446e99f90191401241ce9945 Mon Sep 17 00:00:00 2001 From: Emmanuel Grumbach Date: Sun, 6 Jul 2014 12:51:22 +0300 Subject: [PATCH 089/455] iwlwifi: mvm: fix merge damage Signed-off-by: Emmanuel Grumbach --- drivers/net/wireless/iwlwifi/mvm/mac80211.c | 7 ------- 1 file changed, 7 deletions(-) diff --git a/drivers/net/wireless/iwlwifi/mvm/mac80211.c b/drivers/net/wireless/iwlwifi/mvm/mac80211.c index 9bfb90680cdc..98556d03c1ed 100644 --- a/drivers/net/wireless/iwlwifi/mvm/mac80211.c +++ b/drivers/net/wireless/iwlwifi/mvm/mac80211.c @@ -303,13 +303,6 @@ int iwl_mvm_mac_setup_register(struct iwl_mvm *mvm) hw->uapsd_max_sp_len = IWL_UAPSD_MAX_SP; } - if (mvm->fw->ucode_capa.flags & IWL_UCODE_TLV_FLAGS_UAPSD_SUPPORT && - !iwlwifi_mod_params.uapsd_disable) { - hw->flags |= IEEE80211_HW_SUPPORTS_UAPSD; - hw->uapsd_queues = IWL_UAPSD_AC_INFO; - hw->uapsd_max_sp_len = IWL_UAPSD_MAX_SP; - } - hw->sta_data_size = sizeof(struct iwl_mvm_sta); hw->vif_data_size = sizeof(struct iwl_mvm_vif); hw->chanctx_data_size = sizeof(u16); From a55c072dfe520f8fa03cf11b07b9268a8a17820a Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Fri, 13 Jun 2014 13:11:51 +0200 Subject: [PATCH 090/455] efi/arm64: efistub: remove local copy of linux_banner The shared efistub code for ARM and arm64 contains a local copy of linux_banner, allowing it to be referenced from separate executables such as the ARM decompressor. However, this introduces a dependency on generated header files, causing unnecessary rebuilds of the stub itself and, in case of arm64, vmlinux which contains it. On arm64, the copy is not actually needed since we can reference the original symbol directly, and as it turns out, there may be better ways to deal with this for ARM as well, so let's remove it from the shared code. If it still needs to be reintroduced for ARM later, it should live under arch/arm anyway and not in shared code. Signed-off-by: Ard Biesheuvel Acked-by: Will Deacon Signed-off-by: Matt Fleming --- arch/arm64/kernel/efi-stub.c | 2 -- drivers/firmware/efi/fdt.c | 10 ---------- 2 files changed, 12 deletions(-) diff --git a/arch/arm64/kernel/efi-stub.c b/arch/arm64/kernel/efi-stub.c index 60e98a639ac5..e786e6cdc400 100644 --- a/arch/arm64/kernel/efi-stub.c +++ b/arch/arm64/kernel/efi-stub.c @@ -12,8 +12,6 @@ #include #include #include -#include -#include /* * AArch64 requires the DTB to be 8-byte aligned in the first 512MiB from diff --git a/drivers/firmware/efi/fdt.c b/drivers/firmware/efi/fdt.c index 82d774161cc9..507a3df46a5d 100644 --- a/drivers/firmware/efi/fdt.c +++ b/drivers/firmware/efi/fdt.c @@ -23,16 +23,6 @@ static efi_status_t update_fdt(efi_system_table_t *sys_table, void *orig_fdt, u32 fdt_val32; u64 fdt_val64; - /* - * Copy definition of linux_banner here. Since this code is - * built as part of the decompressor for ARM v7, pulling - * in version.c where linux_banner is defined for the - * kernel brings other kernel dependencies with it. - */ - const char linux_banner[] = - "Linux version " UTS_RELEASE " (" LINUX_COMPILE_BY "@" - LINUX_COMPILE_HOST ") (" LINUX_COMPILER ") " UTS_VERSION "\n"; - /* Do some checks on provided FDT, if it exists*/ if (orig_fdt) { if (fdt_check_header(orig_fdt)) { From d033f48f3a4a9279c7475891fbb060d4881c22da Mon Sep 17 00:00:00 2001 From: Varun Sethi Date: Tue, 24 Jun 2014 19:27:15 +0530 Subject: [PATCH 091/455] iommu/fsl: Fix PAMU window size check. is_power_of_2 requires an unsigned long parameter which would lead to truncation of 64 bit values on 32 bit architectures. __ffs also expects an unsigned long parameter thus won't work for 64 bit values on 32 bit architectures. Signed-off-by: Varun Sethi Tested-by: Emil Medve Signed-off-by: Joerg Roedel --- drivers/iommu/fsl_pamu.c | 8 ++++---- drivers/iommu/fsl_pamu_domain.c | 2 +- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/iommu/fsl_pamu.c b/drivers/iommu/fsl_pamu.c index b99dd88e31b9..bb446d742a2d 100644 --- a/drivers/iommu/fsl_pamu.c +++ b/drivers/iommu/fsl_pamu.c @@ -170,10 +170,10 @@ int pamu_disable_liodn(int liodn) static unsigned int map_addrspace_size_to_wse(phys_addr_t addrspace_size) { /* Bug if not a power of 2 */ - BUG_ON(!is_power_of_2(addrspace_size)); + BUG_ON((addrspace_size & (addrspace_size - 1))); /* window size is 2^(WSE+1) bytes */ - return __ffs(addrspace_size) - 1; + return fls64(addrspace_size) - 2; } /* Derive the PAACE window count encoding for the subwindow count */ @@ -351,7 +351,7 @@ int pamu_config_ppaace(int liodn, phys_addr_t win_addr, phys_addr_t win_size, struct paace *ppaace; unsigned long fspi; - if (!is_power_of_2(win_size) || win_size < PAMU_PAGE_SIZE) { + if ((win_size & (win_size - 1)) || win_size < PAMU_PAGE_SIZE) { pr_debug("window size too small or not a power of two %llx\n", win_size); return -EINVAL; } @@ -464,7 +464,7 @@ int pamu_config_spaace(int liodn, u32 subwin_cnt, u32 subwin, return -ENOENT; } - if (!is_power_of_2(subwin_size) || subwin_size < PAMU_PAGE_SIZE) { + if ((subwin_size & (subwin_size - 1)) || subwin_size < PAMU_PAGE_SIZE) { pr_debug("subwindow size out of range, or not a power of 2\n"); return -EINVAL; } diff --git a/drivers/iommu/fsl_pamu_domain.c b/drivers/iommu/fsl_pamu_domain.c index 93072ba44b1d..3dd0b8edd429 100644 --- a/drivers/iommu/fsl_pamu_domain.c +++ b/drivers/iommu/fsl_pamu_domain.c @@ -301,7 +301,7 @@ static int check_size(u64 size, dma_addr_t iova) * Size must be a power of two and at least be equal * to PAMU page size. */ - if (!is_power_of_2(size) || size < PAMU_PAGE_SIZE) { + if ((size & (size - 1)) || size < PAMU_PAGE_SIZE) { pr_debug("%s: size too small or not a power of two\n", __func__); return -EINVAL; } From 75f0e4615dc328e67634eccd86bc71597da9f8a3 Mon Sep 17 00:00:00 2001 From: Varun Sethi Date: Tue, 24 Jun 2014 19:27:16 +0530 Subject: [PATCH 092/455] iommu/fsl: Fix the device domain attach condition. Store the domain information for the device, only if it's not already attached to a domain. Signed-off-by: Varun Sethi Signed-off-by: Joerg Roedel --- drivers/iommu/fsl_pamu_domain.c | 9 ++------- 1 file changed, 2 insertions(+), 7 deletions(-) diff --git a/drivers/iommu/fsl_pamu_domain.c b/drivers/iommu/fsl_pamu_domain.c index 3dd0b8edd429..54060d16dec8 100644 --- a/drivers/iommu/fsl_pamu_domain.c +++ b/drivers/iommu/fsl_pamu_domain.c @@ -335,11 +335,6 @@ static struct fsl_dma_domain *iommu_alloc_dma_domain(void) return domain; } -static inline struct device_domain_info *find_domain(struct device *dev) -{ - return dev->archdata.iommu_domain; -} - static void remove_device_ref(struct device_domain_info *info, u32 win_cnt) { unsigned long flags; @@ -380,7 +375,7 @@ static void attach_device(struct fsl_dma_domain *dma_domain, int liodn, struct d * Check here if the device is already attached to domain or not. * If the device is already attached to a domain detach it. */ - old_domain_info = find_domain(dev); + old_domain_info = dev->archdata.iommu_domain; if (old_domain_info && old_domain_info->domain != dma_domain) { spin_unlock_irqrestore(&device_domain_lock, flags); detach_device(dev, old_domain_info->domain); @@ -399,7 +394,7 @@ static void attach_device(struct fsl_dma_domain *dma_domain, int liodn, struct d * the info for the first LIODN as all * LIODNs share the same domain */ - if (!old_domain_info) + if (!dev->archdata.iommu_domain) dev->archdata.iommu_domain = info; spin_unlock_irqrestore(&device_domain_lock, flags); From 3170447c1f264d51b8d1f3898bf2588588a64fdc Mon Sep 17 00:00:00 2001 From: Varun Sethi Date: Tue, 24 Jun 2014 19:27:17 +0530 Subject: [PATCH 093/455] iommu/fsl: Fix the error condition during iommu group Earlier PTR_ERR was being returned even if group was set to null. Now, we explicitly set an ERR_PTR value in case the group pointer is NULL. Signed-off-by: Varun Sethi Signed-off-by: Joerg Roedel --- drivers/iommu/fsl_pamu_domain.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/fsl_pamu_domain.c b/drivers/iommu/fsl_pamu_domain.c index 54060d16dec8..af47648301a9 100644 --- a/drivers/iommu/fsl_pamu_domain.c +++ b/drivers/iommu/fsl_pamu_domain.c @@ -1037,12 +1037,15 @@ root_bus: group = get_shared_pci_device_group(pdev); } + if (!group) + group = ERR_PTR(-ENODEV); + return group; } static int fsl_pamu_add_device(struct device *dev) { - struct iommu_group *group = NULL; + struct iommu_group *group = ERR_PTR(-ENODEV); struct pci_dev *pdev; const u32 *prop; int ret, len; @@ -1065,7 +1068,7 @@ static int fsl_pamu_add_device(struct device *dev) group = get_device_iommu_group(dev); } - if (!group || IS_ERR(group)) + if (IS_ERR(group)) return PTR_ERR(group); ret = iommu_group_add_device(group, dev); From 6ed179b67ca1a05034728ab160905900416b1835 Mon Sep 17 00:00:00 2001 From: Anton Blanchard Date: Thu, 12 Jun 2014 18:16:53 +1000 Subject: [PATCH 094/455] KVM: PPC: Assembly functions exported to modules need _GLOBAL_TOC() Both kvmppc_hv_entry_trampoline and kvmppc_entry_trampoline are assembly functions that are exported to modules and also require a valid r2. As such we need to use _GLOBAL_TOC so we provide a global entry point that establishes the TOC (r2). Signed-off-by: Anton Blanchard Signed-off-by: Alexander Graf --- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 2 +- arch/powerpc/kvm/book3s_rmhandlers.S | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S index 77356fd25ccc..8d9c5d21179d 100644 --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S @@ -48,7 +48,7 @@ * * LR = return address to continue at after eventually re-enabling MMU */ -_GLOBAL(kvmppc_hv_entry_trampoline) +_GLOBAL_TOC(kvmppc_hv_entry_trampoline) mflr r0 std r0, PPC_LR_STKOFF(r1) stdu r1, -112(r1) diff --git a/arch/powerpc/kvm/book3s_rmhandlers.S b/arch/powerpc/kvm/book3s_rmhandlers.S index 9eec675220e6..4850a224e5bf 100644 --- a/arch/powerpc/kvm/book3s_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_rmhandlers.S @@ -146,7 +146,7 @@ kvmppc_handler_skip_ins: * On entry, r4 contains the guest shadow MSR * MSR.EE has to be 0 when calling this function */ -_GLOBAL(kvmppc_entry_trampoline) +_GLOBAL_TOC(kvmppc_entry_trampoline) mfmsr r5 LOAD_REG_ADDR(r7, kvmppc_handler_trampoline_enter) toreal(r7) From 55ab169b7b9276e6e1e01a88531bcf34803dcde2 Mon Sep 17 00:00:00 2001 From: Alexander Graf Date: Mon, 16 Jun 2014 14:37:53 +0200 Subject: [PATCH 095/455] KVM: PPC: Book3S PR: Fix ABIv2 on LE We switched to ABIv2 on Little Endian systems now which gets rid of the dotted function names. Branch to the actual functions when we see such a system. Signed-off-by: Alexander Graf --- arch/powerpc/kvm/book3s_interrupts.S | 4 ++++ arch/powerpc/kvm/book3s_rmhandlers.S | 4 ++++ 2 files changed, 8 insertions(+) diff --git a/arch/powerpc/kvm/book3s_interrupts.S b/arch/powerpc/kvm/book3s_interrupts.S index e2c29e381dc7..d044b8b7c69d 100644 --- a/arch/powerpc/kvm/book3s_interrupts.S +++ b/arch/powerpc/kvm/book3s_interrupts.S @@ -25,7 +25,11 @@ #include #if defined(CONFIG_PPC_BOOK3S_64) +#if defined(_CALL_ELF) && _CALL_ELF == 2 +#define FUNC(name) name +#else #define FUNC(name) GLUE(.,name) +#endif #define GET_SHADOW_VCPU(reg) addi reg, r13, PACA_SVCPU #elif defined(CONFIG_PPC_BOOK3S_32) diff --git a/arch/powerpc/kvm/book3s_rmhandlers.S b/arch/powerpc/kvm/book3s_rmhandlers.S index 4850a224e5bf..16c4d88ba27d 100644 --- a/arch/powerpc/kvm/book3s_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_rmhandlers.S @@ -36,7 +36,11 @@ #if defined(CONFIG_PPC_BOOK3S_64) +#if defined(_CALL_ELF) && _CALL_ELF == 2 +#define FUNC(name) name +#else #define FUNC(name) GLUE(.,name) +#endif #elif defined(CONFIG_PPC_BOOK3S_32) From 08b9939997df30e42a228e1ecb97f99e9c8ea84e Mon Sep 17 00:00:00 2001 From: Johannes Berg Date: Mon, 7 Jul 2014 12:01:11 +0200 Subject: [PATCH 096/455] Revert "mac80211: move "bufferable MMPDU" check to fix AP mode scan" This reverts commit 277d916fc2e959c3f106904116bb4f7b1148d47a as it was at least breaking iwlwifi by setting the IEEE80211_TX_CTL_NO_PS_BUFFER flag in all kinds of interface modes, not only for AP mode where it is appropriate. To avoid reintroducing the original problem, explicitly check for probe request frames in the multicast buffering code. Cc: stable@vger.kernel.org Fixes: 277d916fc2e9 ("mac80211: move "bufferable MMPDU" check to fix AP mode scan") Signed-off-by: Johannes Berg --- net/mac80211/tx.c | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c index 5214686d9fd1..1a252c606ad0 100644 --- a/net/mac80211/tx.c +++ b/net/mac80211/tx.c @@ -414,6 +414,9 @@ ieee80211_tx_h_multicast_ps_buf(struct ieee80211_tx_data *tx) if (ieee80211_has_order(hdr->frame_control)) return TX_CONTINUE; + if (ieee80211_is_probe_req(hdr->frame_control)) + return TX_CONTINUE; + if (tx->local->hw.flags & IEEE80211_HW_QUEUE_CONTROL) info->hw_queue = tx->sdata->vif.cab_queue; @@ -463,6 +466,7 @@ ieee80211_tx_h_unicast_ps_buf(struct ieee80211_tx_data *tx) { struct sta_info *sta = tx->sta; struct ieee80211_tx_info *info = IEEE80211_SKB_CB(tx->skb); + struct ieee80211_hdr *hdr = (struct ieee80211_hdr *)tx->skb->data; struct ieee80211_local *local = tx->local; if (unlikely(!sta)) @@ -473,6 +477,12 @@ ieee80211_tx_h_unicast_ps_buf(struct ieee80211_tx_data *tx) !(info->flags & IEEE80211_TX_CTL_NO_PS_BUFFER))) { int ac = skb_get_queue_mapping(tx->skb); + if (ieee80211_is_mgmt(hdr->frame_control) && + !ieee80211_is_bufferable_mmpdu(hdr->frame_control)) { + info->flags |= IEEE80211_TX_CTL_NO_PS_BUFFER; + return TX_CONTINUE; + } + ps_dbg(sta->sdata, "STA %pM aid %d: PS buffer for AC %d\n", sta->sta.addr, sta->sta.aid, ac); if (tx->local->total_ps_buffered >= TOTAL_MAX_TX_BUFFER) @@ -531,19 +541,9 @@ ieee80211_tx_h_unicast_ps_buf(struct ieee80211_tx_data *tx) static ieee80211_tx_result debug_noinline ieee80211_tx_h_ps_buf(struct ieee80211_tx_data *tx) { - struct ieee80211_tx_info *info = IEEE80211_SKB_CB(tx->skb); - struct ieee80211_hdr *hdr = (struct ieee80211_hdr *)tx->skb->data; - if (unlikely(tx->flags & IEEE80211_TX_PS_BUFFERED)) return TX_CONTINUE; - if (ieee80211_is_mgmt(hdr->frame_control) && - !ieee80211_is_bufferable_mmpdu(hdr->frame_control)) { - if (tx->flags & IEEE80211_TX_UNICAST) - info->flags |= IEEE80211_TX_CTL_NO_PS_BUFFER; - return TX_CONTINUE; - } - if (tx->flags & IEEE80211_TX_UNICAST) return ieee80211_tx_h_unicast_ps_buf(tx); else From 525549d7342fca7fca9fc11298b5ab3617b6f730 Mon Sep 17 00:00:00 2001 From: Thierry Reding Date: Mon, 7 Jul 2014 15:13:12 +0200 Subject: [PATCH 097/455] ALSA: hda: Fix build warning The hda_tegra_disable_clocks() function is only used by the suspend and resume code, so it needs to be included in the #ifdef CONFIG_PM_SLEEP block to prevent the following warning: CC sound/pci/hda/hda_tegra.o sound/pci/hda/hda_tegra.c:238:13: warning: 'hda_tegra_disable_clocks' defined but not used [-Wunused-function] static void hda_tegra_disable_clocks(struct hda_tegra *data) ^ Signed-off-by: Thierry Reding Signed-off-by: Takashi Iwai --- sound/pci/hda/hda_tegra.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sound/pci/hda/hda_tegra.c b/sound/pci/hda/hda_tegra.c index a366ba9293a8..358414da6418 100644 --- a/sound/pci/hda/hda_tegra.c +++ b/sound/pci/hda/hda_tegra.c @@ -236,6 +236,7 @@ disable_hda: return rc; } +#ifdef CONFIG_PM_SLEEP static void hda_tegra_disable_clocks(struct hda_tegra *data) { clk_disable_unprepare(data->hda2hdmi_clk); @@ -243,7 +244,6 @@ static void hda_tegra_disable_clocks(struct hda_tegra *data) clk_disable_unprepare(data->hda_clk); } -#ifdef CONFIG_PM_SLEEP /* * power management */ From 126b9d4365b110c157bc4cbc32540dfa66c9c85a Mon Sep 17 00:00:00 2001 From: Miklos Szeredi Date: Mon, 7 Jul 2014 15:28:50 +0200 Subject: [PATCH 098/455] fuse: timeout comparison fix As suggested by checkpatch.pl, use time_before64() instead of direct comparison of jiffies64 values. Signed-off-by: Miklos Szeredi Cc: --- fs/fuse/dir.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c index 42198359fa1b..225176203a8c 100644 --- a/fs/fuse/dir.c +++ b/fs/fuse/dir.c @@ -198,7 +198,7 @@ static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags) inode = ACCESS_ONCE(entry->d_inode); if (inode && is_bad_inode(inode)) goto invalid; - else if (fuse_dentry_time(entry) < get_jiffies_64()) { + else if (time_before64(fuse_dentry_time(entry), get_jiffies_64())) { int err; struct fuse_entry_out outarg; struct fuse_req *req; @@ -985,7 +985,7 @@ int fuse_update_attributes(struct inode *inode, struct kstat *stat, int err; bool r; - if (fi->i_time < get_jiffies_64()) { + if (time_before64(fi->i_time, get_jiffies_64())) { r = true; err = fuse_do_getattr(inode, stat, file); } else { @@ -1171,7 +1171,7 @@ static int fuse_permission(struct inode *inode, int mask) ((mask & MAY_EXEC) && S_ISREG(inode->i_mode))) { struct fuse_inode *fi = get_fuse_inode(inode); - if (fi->i_time < get_jiffies_64()) { + if (time_before64(fi->i_time, get_jiffies_64())) { refreshed = true; err = fuse_perm_getattr(inode, mask); From 154210ccb3a871e631bf39fdeb7a8731d98af87b Mon Sep 17 00:00:00 2001 From: Anand Avati Date: Thu, 26 Jun 2014 20:21:57 -0400 Subject: [PATCH 099/455] fuse: ignore entry-timeout on LOOKUP_REVAL The following test case demonstrates the bug: sh# mount -t glusterfs localhost:meta-test /mnt/one sh# mount -t glusterfs localhost:meta-test /mnt/two sh# echo stuff > /mnt/one/file; rm -f /mnt/two/file; echo stuff > /mnt/one/file bash: /mnt/one/file: Stale file handle sh# echo stuff > /mnt/one/file; rm -f /mnt/two/file; sleep 1; echo stuff > /mnt/one/file On the second open() on /mnt/one, FUSE would have used the old nodeid (file handle) trying to re-open it. Gluster is returning -ESTALE. The ESTALE propagates back to namei.c:filename_lookup() where lookup is re-attempted with LOOKUP_REVAL. The right behavior now, would be for FUSE to ignore the entry-timeout and and do the up-call revalidation. Instead FUSE is ignoring LOOKUP_REVAL, succeeding the revalidation (because entry-timeout has not passed), and open() is again retried on the old file handle and finally the ESTALE is going back to the application. Fix: if revalidation is happening with LOOKUP_REVAL, then ignore entry-timeout and always do the up-call. Signed-off-by: Anand Avati Reviewed-by: Niels de Vos Signed-off-by: Miklos Szeredi Cc: stable@vger.kernel.org --- fs/fuse/dir.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c index 225176203a8c..202a9721be93 100644 --- a/fs/fuse/dir.c +++ b/fs/fuse/dir.c @@ -198,7 +198,8 @@ static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags) inode = ACCESS_ONCE(entry->d_inode); if (inode && is_bad_inode(inode)) goto invalid; - else if (time_before64(fuse_dentry_time(entry), get_jiffies_64())) { + else if (time_before64(fuse_dentry_time(entry), get_jiffies_64()) || + (flags & LOOKUP_REVAL)) { int err; struct fuse_entry_out outarg; struct fuse_req *req; From 7b3d8bf7718a8de8ac6a25ed7332c1c129839400 Mon Sep 17 00:00:00 2001 From: Himangi Saraogi Date: Thu, 26 Jun 2014 23:06:52 +0530 Subject: [PATCH 100/455] fuse: inode: drop cast This patch removes the cast on data of type void * as it is not needed. The following Coccinelle semantic patch was used for making the change: @r@ expression x; void* e; type T; identifier f; @@ ( *((T *)e) | ((T *)x)[...] | ((T *)x)->f | - (T *) e ) Signed-off-by: Himangi Saraogi Acked-by: Julia Lawall Signed-off-by: Miklos Szeredi --- fs/fuse/inode.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index 754dcf23de8a..7b8fc7d33def 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -1006,7 +1006,7 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent) sb->s_flags &= ~(MS_NOSEC | MS_I_VERSION); - if (!parse_fuse_opt((char *) data, &d, is_bdev)) + if (!parse_fuse_opt(data, &d, is_bdev)) goto err; if (is_bdev) { From 233a01fa9c4c7c41238537e8db8434667ff28a2f Mon Sep 17 00:00:00 2001 From: Miklos Szeredi Date: Mon, 7 Jul 2014 15:28:51 +0200 Subject: [PATCH 101/455] fuse: handle large user and group ID If the number in "user_id=N" or "group_id=N" mount options was larger than INT_MAX then fuse returned EINVAL. Fix this to handle all valid uid/gid values. Signed-off-by: Miklos Szeredi Cc: stable@vger.kernel.org --- fs/fuse/inode.c | 20 ++++++++++++++++---- 1 file changed, 16 insertions(+), 4 deletions(-) diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index 7b8fc7d33def..8474028d7848 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -478,6 +478,17 @@ static const match_table_t tokens = { {OPT_ERR, NULL} }; +static int fuse_match_uint(substring_t *s, unsigned int *res) +{ + int err = -ENOMEM; + char *buf = match_strdup(s); + if (buf) { + err = kstrtouint(buf, 10, res); + kfree(buf); + } + return err; +} + static int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev) { char *p; @@ -488,6 +499,7 @@ static int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev) while ((p = strsep(&opt, ",")) != NULL) { int token; int value; + unsigned uv; substring_t args[MAX_OPT_ARGS]; if (!*p) continue; @@ -511,18 +523,18 @@ static int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev) break; case OPT_USER_ID: - if (match_int(&args[0], &value)) + if (fuse_match_uint(&args[0], &uv)) return 0; - d->user_id = make_kuid(current_user_ns(), value); + d->user_id = make_kuid(current_user_ns(), uv); if (!uid_valid(d->user_id)) return 0; d->user_id_present = 1; break; case OPT_GROUP_ID: - if (match_int(&args[0], &value)) + if (fuse_match_uint(&args[0], &uv)) return 0; - d->group_id = make_kgid(current_user_ns(), value); + d->group_id = make_kgid(current_user_ns(), uv); if (!gid_valid(d->group_id)) return 0; d->group_id_present = 1; From c55a01d360afafcd52bc405c044a6ebf5de436d5 Mon Sep 17 00:00:00 2001 From: Miklos Szeredi Date: Mon, 7 Jul 2014 15:28:51 +0200 Subject: [PATCH 102/455] fuse: avoid scheduling while atomic As reported by Richard Sharpe, an attempt to use fuse_notify_inval_entry() triggers complains about scheduling while atomic: BUG: scheduling while atomic: fuse.hf/13976/0x10000001 This happens because fuse_notify_inval_entry() attempts to allocate memory with GFP_KERNEL, holding "struct fuse_copy_state" mapped by kmap_atomic(). Introduced by commit 58bda1da4b3c "fuse/dev: use atomic maps" Fix by moving the map/unmap to just cover the actual memcpy operation. Original patch from Maxim Patlasov Reported-by: Richard Sharpe Signed-off-by: Miklos Szeredi Cc: # v3.15+ --- fs/fuse/dev.c | 51 +++++++++++++++++++++++---------------------------- 1 file changed, 23 insertions(+), 28 deletions(-) diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index aac71ce373e4..75fa055012b2 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -643,9 +643,8 @@ struct fuse_copy_state { unsigned long seglen; unsigned long addr; struct page *pg; - void *mapaddr; - void *buf; unsigned len; + unsigned offset; unsigned move_pages:1; }; @@ -666,23 +665,17 @@ static void fuse_copy_finish(struct fuse_copy_state *cs) if (cs->currbuf) { struct pipe_buffer *buf = cs->currbuf; - if (!cs->write) { - kunmap_atomic(cs->mapaddr); - } else { - kunmap_atomic(cs->mapaddr); + if (cs->write) buf->len = PAGE_SIZE - cs->len; - } cs->currbuf = NULL; - cs->mapaddr = NULL; - } else if (cs->mapaddr) { - kunmap_atomic(cs->mapaddr); + } else if (cs->pg) { if (cs->write) { flush_dcache_page(cs->pg); set_page_dirty_lock(cs->pg); } put_page(cs->pg); - cs->mapaddr = NULL; } + cs->pg = NULL; } /* @@ -691,7 +684,7 @@ static void fuse_copy_finish(struct fuse_copy_state *cs) */ static int fuse_copy_fill(struct fuse_copy_state *cs) { - unsigned long offset; + struct page *page; int err; unlock_request(cs->fc, cs->req); @@ -706,14 +699,12 @@ static int fuse_copy_fill(struct fuse_copy_state *cs) BUG_ON(!cs->nr_segs); cs->currbuf = buf; - cs->mapaddr = kmap_atomic(buf->page); + cs->pg = buf->page; + cs->offset = buf->offset; cs->len = buf->len; - cs->buf = cs->mapaddr + buf->offset; cs->pipebufs++; cs->nr_segs--; } else { - struct page *page; - if (cs->nr_segs == cs->pipe->buffers) return -EIO; @@ -726,8 +717,8 @@ static int fuse_copy_fill(struct fuse_copy_state *cs) buf->len = 0; cs->currbuf = buf; - cs->mapaddr = kmap_atomic(page); - cs->buf = cs->mapaddr; + cs->pg = page; + cs->offset = 0; cs->len = PAGE_SIZE; cs->pipebufs++; cs->nr_segs++; @@ -740,14 +731,13 @@ static int fuse_copy_fill(struct fuse_copy_state *cs) cs->iov++; cs->nr_segs--; } - err = get_user_pages_fast(cs->addr, 1, cs->write, &cs->pg); + err = get_user_pages_fast(cs->addr, 1, cs->write, &page); if (err < 0) return err; BUG_ON(err != 1); - offset = cs->addr % PAGE_SIZE; - cs->mapaddr = kmap_atomic(cs->pg); - cs->buf = cs->mapaddr + offset; - cs->len = min(PAGE_SIZE - offset, cs->seglen); + cs->pg = page; + cs->offset = cs->addr % PAGE_SIZE; + cs->len = min(PAGE_SIZE - cs->offset, cs->seglen); cs->seglen -= cs->len; cs->addr += cs->len; } @@ -760,15 +750,20 @@ static int fuse_copy_do(struct fuse_copy_state *cs, void **val, unsigned *size) { unsigned ncpy = min(*size, cs->len); if (val) { + void *pgaddr = kmap_atomic(cs->pg); + void *buf = pgaddr + cs->offset; + if (cs->write) - memcpy(cs->buf, *val, ncpy); + memcpy(buf, *val, ncpy); else - memcpy(*val, cs->buf, ncpy); + memcpy(*val, buf, ncpy); + + kunmap_atomic(pgaddr); *val += ncpy; } *size -= ncpy; cs->len -= ncpy; - cs->buf += ncpy; + cs->offset += ncpy; return ncpy; } @@ -874,8 +869,8 @@ static int fuse_try_move_page(struct fuse_copy_state *cs, struct page **pagep) out_fallback_unlock: unlock_page(newpage); out_fallback: - cs->mapaddr = kmap_atomic(buf->page); - cs->buf = cs->mapaddr + buf->offset; + cs->pg = buf->page; + cs->offset = buf->offset; err = lock_request(cs->fc, cs->req); if (err) From 6c642e442e99af1ca026af55a16f23b5f8ee612a Mon Sep 17 00:00:00 2001 From: zhangdianfang Date: Fri, 30 May 2014 08:37:28 +0800 Subject: [PATCH 103/455] tools/liblockdep: Fix comparison of a boolean value with a value of 2 Comparison of a boolean value (!__init_state) with a value of 2 (done) as currently happens in the code is unlikely to succeed and causes repeated initialization of the pthread function pointers. Instead, remove boolean comparison so that we would initialize said function pointers only once. Ref: https://bugzilla.kernel.org/show_bug.cgi?id=76741 Cc: Jean Delvare Reported-by: David Binderman Signed-off-by: Dianfang Zhang Signed-off-by: Sasha Levin --- tools/lib/lockdep/preload.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/lib/lockdep/preload.c b/tools/lib/lockdep/preload.c index 23bd69cb5ade..b5e52af6ddf3 100644 --- a/tools/lib/lockdep/preload.c +++ b/tools/lib/lockdep/preload.c @@ -92,7 +92,7 @@ enum { none, prepare, done, } __init_state; static void init_preload(void); static void try_init_preload(void) { - if (!__init_state != done) + if (__init_state != done) init_preload(); } From 0c37c686b336aeead2f0b982f24eafaeb19435b1 Mon Sep 17 00:00:00 2001 From: Sasha Levin Date: Mon, 7 Jul 2014 12:14:27 -0400 Subject: [PATCH 104/455] tools/liblockdep: Remove debug print left over from development Remove a debug print in init_preload() which was left over from development and isn't usefull at all currently. It was also causing false positive test results. Reported-by: S. Lockwood-Childs Signed-off-by: Sasha Levin --- tools/lib/lockdep/preload.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/tools/lib/lockdep/preload.c b/tools/lib/lockdep/preload.c index b5e52af6ddf3..51ec2ec8371b 100644 --- a/tools/lib/lockdep/preload.c +++ b/tools/lib/lockdep/preload.c @@ -439,8 +439,6 @@ __attribute__((constructor)) static void init_preload(void) ll_pthread_rwlock_unlock = dlsym(RTLD_NEXT, "pthread_rwlock_unlock"); #endif - printf("%p\n", ll_pthread_mutex_trylock);fflush(stdout); - lockdep_init(); __init_state = done; From b10827814e9c81c5a14fb73c5a6e06bd85df3f94 Mon Sep 17 00:00:00 2001 From: "S. Lockwood-Childs" Date: Mon, 7 Jul 2014 00:17:33 -0700 Subject: [PATCH 105/455] tools/liblockdep: Account for bitfield changes in lockdeps lock_acquire Commit fb9edbe984 shortened held_lock->check from a 2-bit field to a 1-bit field. Make liblockdep compatible with the new definition by passing check=1 to lock_acquire() calls, rather than the old value check=2 (which inadvertently disabled checks by overflowing to 0). Without this fix, several of the test cases in liblockdep run_tests.sh were failing. Signed-off-by: S. Lockwood-Childs Signed-off-by: Sasha Levin --- tools/lib/lockdep/include/liblockdep/mutex.h | 4 ++-- tools/lib/lockdep/include/liblockdep/rwlock.h | 8 ++++---- tools/lib/lockdep/preload.c | 16 ++++++++-------- 3 files changed, 14 insertions(+), 14 deletions(-) diff --git a/tools/lib/lockdep/include/liblockdep/mutex.h b/tools/lib/lockdep/include/liblockdep/mutex.h index c342f7087147..ee53a42818ca 100644 --- a/tools/lib/lockdep/include/liblockdep/mutex.h +++ b/tools/lib/lockdep/include/liblockdep/mutex.h @@ -35,7 +35,7 @@ static inline int __mutex_init(liblockdep_pthread_mutex_t *lock, static inline int liblockdep_pthread_mutex_lock(liblockdep_pthread_mutex_t *lock) { - lock_acquire(&lock->dep_map, 0, 0, 0, 2, NULL, (unsigned long)_RET_IP_); + lock_acquire(&lock->dep_map, 0, 0, 0, 1, NULL, (unsigned long)_RET_IP_); return pthread_mutex_lock(&lock->mutex); } @@ -47,7 +47,7 @@ static inline int liblockdep_pthread_mutex_unlock(liblockdep_pthread_mutex_t *lo static inline int liblockdep_pthread_mutex_trylock(liblockdep_pthread_mutex_t *lock) { - lock_acquire(&lock->dep_map, 0, 1, 0, 2, NULL, (unsigned long)_RET_IP_); + lock_acquire(&lock->dep_map, 0, 1, 0, 1, NULL, (unsigned long)_RET_IP_); return pthread_mutex_trylock(&lock->mutex) == 0 ? 1 : 0; } diff --git a/tools/lib/lockdep/include/liblockdep/rwlock.h b/tools/lib/lockdep/include/liblockdep/rwlock.h index a680ab8c2e36..4ec03f861551 100644 --- a/tools/lib/lockdep/include/liblockdep/rwlock.h +++ b/tools/lib/lockdep/include/liblockdep/rwlock.h @@ -36,7 +36,7 @@ static inline int __rwlock_init(liblockdep_pthread_rwlock_t *lock, static inline int liblockdep_pthread_rwlock_rdlock(liblockdep_pthread_rwlock_t *lock) { - lock_acquire(&lock->dep_map, 0, 0, 2, 2, NULL, (unsigned long)_RET_IP_); + lock_acquire(&lock->dep_map, 0, 0, 2, 1, NULL, (unsigned long)_RET_IP_); return pthread_rwlock_rdlock(&lock->rwlock); } @@ -49,19 +49,19 @@ static inline int liblockdep_pthread_rwlock_unlock(liblockdep_pthread_rwlock_t * static inline int liblockdep_pthread_rwlock_wrlock(liblockdep_pthread_rwlock_t *lock) { - lock_acquire(&lock->dep_map, 0, 0, 0, 2, NULL, (unsigned long)_RET_IP_); + lock_acquire(&lock->dep_map, 0, 0, 0, 1, NULL, (unsigned long)_RET_IP_); return pthread_rwlock_wrlock(&lock->rwlock); } static inline int liblockdep_pthread_rwlock_tryrdlock(liblockdep_pthread_rwlock_t *lock) { - lock_acquire(&lock->dep_map, 0, 1, 2, 2, NULL, (unsigned long)_RET_IP_); + lock_acquire(&lock->dep_map, 0, 1, 2, 1, NULL, (unsigned long)_RET_IP_); return pthread_rwlock_tryrdlock(&lock->rwlock) == 0 ? 1 : 0; } static inline int liblockdep_pthread_rwlock_trywlock(liblockdep_pthread_rwlock_t *lock) { - lock_acquire(&lock->dep_map, 0, 1, 0, 2, NULL, (unsigned long)_RET_IP_); + lock_acquire(&lock->dep_map, 0, 1, 0, 1, NULL, (unsigned long)_RET_IP_); return pthread_rwlock_trywlock(&lock->rwlock) == 0 ? 1 : 0; } diff --git a/tools/lib/lockdep/preload.c b/tools/lib/lockdep/preload.c index 51ec2ec8371b..6f803609e498 100644 --- a/tools/lib/lockdep/preload.c +++ b/tools/lib/lockdep/preload.c @@ -252,7 +252,7 @@ int pthread_mutex_lock(pthread_mutex_t *mutex) try_init_preload(); - lock_acquire(&__get_lock(mutex)->dep_map, 0, 0, 0, 2, NULL, + lock_acquire(&__get_lock(mutex)->dep_map, 0, 0, 0, 1, NULL, (unsigned long)_RET_IP_); /* * Here's the thing with pthread mutexes: unlike the kernel variant, @@ -281,7 +281,7 @@ int pthread_mutex_trylock(pthread_mutex_t *mutex) try_init_preload(); - lock_acquire(&__get_lock(mutex)->dep_map, 0, 1, 0, 2, NULL, (unsigned long)_RET_IP_); + lock_acquire(&__get_lock(mutex)->dep_map, 0, 1, 0, 1, NULL, (unsigned long)_RET_IP_); r = ll_pthread_mutex_trylock(mutex); if (r) lock_release(&__get_lock(mutex)->dep_map, 0, (unsigned long)_RET_IP_); @@ -303,7 +303,7 @@ int pthread_mutex_unlock(pthread_mutex_t *mutex) */ r = ll_pthread_mutex_unlock(mutex); if (r) - lock_acquire(&__get_lock(mutex)->dep_map, 0, 0, 0, 2, NULL, (unsigned long)_RET_IP_); + lock_acquire(&__get_lock(mutex)->dep_map, 0, 0, 0, 1, NULL, (unsigned long)_RET_IP_); return r; } @@ -352,7 +352,7 @@ int pthread_rwlock_rdlock(pthread_rwlock_t *rwlock) init_preload(); - lock_acquire(&__get_lock(rwlock)->dep_map, 0, 0, 2, 2, NULL, (unsigned long)_RET_IP_); + lock_acquire(&__get_lock(rwlock)->dep_map, 0, 0, 2, 1, NULL, (unsigned long)_RET_IP_); r = ll_pthread_rwlock_rdlock(rwlock); if (r) lock_release(&__get_lock(rwlock)->dep_map, 0, (unsigned long)_RET_IP_); @@ -366,7 +366,7 @@ int pthread_rwlock_tryrdlock(pthread_rwlock_t *rwlock) init_preload(); - lock_acquire(&__get_lock(rwlock)->dep_map, 0, 1, 2, 2, NULL, (unsigned long)_RET_IP_); + lock_acquire(&__get_lock(rwlock)->dep_map, 0, 1, 2, 1, NULL, (unsigned long)_RET_IP_); r = ll_pthread_rwlock_tryrdlock(rwlock); if (r) lock_release(&__get_lock(rwlock)->dep_map, 0, (unsigned long)_RET_IP_); @@ -380,7 +380,7 @@ int pthread_rwlock_trywrlock(pthread_rwlock_t *rwlock) init_preload(); - lock_acquire(&__get_lock(rwlock)->dep_map, 0, 1, 0, 2, NULL, (unsigned long)_RET_IP_); + lock_acquire(&__get_lock(rwlock)->dep_map, 0, 1, 0, 1, NULL, (unsigned long)_RET_IP_); r = ll_pthread_rwlock_trywrlock(rwlock); if (r) lock_release(&__get_lock(rwlock)->dep_map, 0, (unsigned long)_RET_IP_); @@ -394,7 +394,7 @@ int pthread_rwlock_wrlock(pthread_rwlock_t *rwlock) init_preload(); - lock_acquire(&__get_lock(rwlock)->dep_map, 0, 0, 0, 2, NULL, (unsigned long)_RET_IP_); + lock_acquire(&__get_lock(rwlock)->dep_map, 0, 0, 0, 1, NULL, (unsigned long)_RET_IP_); r = ll_pthread_rwlock_wrlock(rwlock); if (r) lock_release(&__get_lock(rwlock)->dep_map, 0, (unsigned long)_RET_IP_); @@ -411,7 +411,7 @@ int pthread_rwlock_unlock(pthread_rwlock_t *rwlock) lock_release(&__get_lock(rwlock)->dep_map, 0, (unsigned long)_RET_IP_); r = ll_pthread_rwlock_unlock(rwlock); if (r) - lock_acquire(&__get_lock(rwlock)->dep_map, 0, 0, 0, 2, NULL, (unsigned long)_RET_IP_); + lock_acquire(&__get_lock(rwlock)->dep_map, 0, 0, 0, 1, NULL, (unsigned long)_RET_IP_); return r; } From 2c4db12ec469b9fcdad9f6bfd6fa20e65a563ac5 Mon Sep 17 00:00:00 2001 From: Andrea Merello Date: Sat, 5 Jul 2014 21:07:17 +0200 Subject: [PATCH 106/455] rt2800usb: Don't perform DMA from stack Function rt2800usb_autorun_detect() passes the address of a variable allocated onto the stack to be used for DMA by the USB layer. This has been caught by my debugging-enabled kernel. This patch change things in order to allocate that variable via kmalloc, and it adjusts things to handle the kmalloc failure case, propagating the error. [ 7363.238852] ------------[ cut here ]------------ [ 7363.243529] WARNING: CPU: 1 PID: 5235 at lib/dma-debug.c:1153 check_for_stack+0xa4/0xf0() [ 7363.251759] ehci-pci 0000:00:04.1: DMA-API: device driver maps memory fromstack [addr=ffff88006b81bad4] [ 7363.261210] Modules linked in: rt2800usb(O+) rt2800lib(O) rt2x00usb(O) rt2x00lib(O) rtl818x_pci(O) rtl8187 led_class eeprom_93cx6 mac80211 cfg80211 [last unloaded: rt2x00lib] [ 7363.277143] CPU: 1 PID: 5235 Comm: systemd-udevd Tainted: G O 3.16.0-rc3-wl+ #31 [ 7363.285546] Hardware name: System manufacturer System Product Name/M3N78 PRO, BIOS ASUS M3N78 PRO ACPI BIOS Revision 1402 12/04/2009 [ 7363.297511] 0000000000000009 ffff88006b81b710 ffffffff8175dcad ffff88006b81b758 [ 7363.305062] ffff88006b81b748 ffffffff8106d372 ffff88006cf10098 ffff88006cead6a0 [ 7363.312622] ffff88006b81bad4 ffffffff81c1e7c0 ffff88006cf10098 ffff88006b81b7a8 [ 7363.320161] Call Trace: [ 7363.322661] [] dump_stack+0x4d/0x6f [ 7363.327847] [] warn_slowpath_common+0x82/0xb0 [ 7363.333893] [] warn_slowpath_fmt+0x47/0x50 [ 7363.339686] [] check_for_stack+0xa4/0xf0 [ 7363.345298] [] debug_dma_map_page+0x10c/0x150 [ 7363.351367] [] usb_hcd_map_urb_for_dma+0x229/0x720 [ 7363.357890] [] usb_hcd_submit_urb+0x2fd/0x930 [ 7363.363929] [] ? irq_work_queue+0x71/0xd0 [ 7363.369617] [] ? wake_up_klogd+0x37/0x50 [ 7363.375219] [] ? console_unlock+0x1e5/0x420 [ 7363.381081] [] ? vprintk_emit+0x245/0x530 [ 7363.386773] [] usb_submit_urb+0x30c/0x580 [ 7363.392462] [] usb_start_wait_urb+0x65/0xf0 [ 7363.398325] [] usb_control_msg+0xcd/0x110 [ 7363.404014] [] rt2x00usb_vendor_request+0xbd/0x170 [rt2x00usb] [ 7363.411544] [] rt2800usb_autorun_detect+0x32/0x50 [rt2800usb] [ 7363.418986] [] rt2800usb_read_eeprom+0x11/0x70 [rt2800usb] [ 7363.426168] [] rt2800_probe_hw+0x11d/0xf90 [rt2800lib] [ 7363.432989] [] rt2800usb_probe_hw+0xd/0x50 [rt2800usb] [ 7363.439808] [] rt2x00lib_probe_dev+0x238/0x7c0 [rt2x00lib] [ 7363.446992] [] ? ieee80211_led_names+0xb8/0x100 [mac80211] [ 7363.454156] [] rt2x00usb_probe+0x156/0x1f0 [rt2x00usb] [ 7363.460971] [] rt2800usb_probe+0x10/0x20 [rt2800usb] [ 7363.467616] [] usb_probe_interface+0xce/0x1c0 [ 7363.473651] [] really_probe+0x70/0x240 [ 7363.479079] [] __driver_attach+0xa1/0xb0 [ 7363.484682] [] ? __device_attach+0x70/0x70 [ 7363.490461] [] bus_for_each_dev+0x63/0xa0 [ 7363.496146] [] driver_attach+0x19/0x20 [ 7363.501570] [] bus_add_driver+0x178/0x220 [ 7363.507270] [] driver_register+0x5b/0xe0 [ 7363.512874] [] usb_register_driver+0xa0/0x170 [ 7363.518905] [] ? 0xffffffffa0079fff [ 7363.524074] [] rt2800usb_driver_init+0x1e/0x20 [rt2800usb] [ 7363.531247] [] do_one_initcall+0x84/0x1b0 [ 7363.536932] [] ? kfree+0xd0/0x110 [ 7363.541931] [] ? __vunmap+0xaa/0xf0 [ 7363.547538] [] load_module+0x1aee/0x2040 [ 7363.553141] [] ? store_uevent+0x50/0x50 [ 7363.558676] [] SyS_init_module+0x9e/0xc0 [ 7363.564285] [] system_call_fastpath+0x16/0x1b [ 7363.570338] ---[ end trace 01ef5f822bea9882 ]--- Signed-off-by: Andrea Merello Acked-by: Helmut Schaa Signed-off-by: John W. Linville --- drivers/net/wireless/rt2x00/rt2800usb.c | 28 +++++++++++++++++++------ 1 file changed, 22 insertions(+), 6 deletions(-) diff --git a/drivers/net/wireless/rt2x00/rt2800usb.c b/drivers/net/wireless/rt2x00/rt2800usb.c index e11dab2216c6..832006b5aab1 100644 --- a/drivers/net/wireless/rt2x00/rt2800usb.c +++ b/drivers/net/wireless/rt2x00/rt2800usb.c @@ -231,9 +231,12 @@ static enum hrtimer_restart rt2800usb_tx_sta_fifo_timeout(struct hrtimer *timer) */ static int rt2800usb_autorun_detect(struct rt2x00_dev *rt2x00dev) { - __le32 reg; + __le32 *reg; u32 fw_mode; + reg = kmalloc(sizeof(*reg), GFP_KERNEL); + if (reg == NULL) + return -ENOMEM; /* cannot use rt2x00usb_register_read here as it uses different * mode (MULTI_READ vs. DEVICE_MODE) and does not pass the * magic value USB_MODE_AUTORUN (0x11) to the device, thus the @@ -241,8 +244,9 @@ static int rt2800usb_autorun_detect(struct rt2x00_dev *rt2x00dev) */ rt2x00usb_vendor_request(rt2x00dev, USB_DEVICE_MODE, USB_VENDOR_REQUEST_IN, 0, USB_MODE_AUTORUN, - ®, sizeof(reg), REGISTER_TIMEOUT_FIRMWARE); - fw_mode = le32_to_cpu(reg); + reg, sizeof(*reg), REGISTER_TIMEOUT_FIRMWARE); + fw_mode = le32_to_cpu(*reg); + kfree(reg); if ((fw_mode & 0x00000003) == 2) return 1; @@ -261,6 +265,7 @@ static int rt2800usb_write_firmware(struct rt2x00_dev *rt2x00dev, int status; u32 offset; u32 length; + int retval; /* * Check which section of the firmware we need. @@ -278,7 +283,10 @@ static int rt2800usb_write_firmware(struct rt2x00_dev *rt2x00dev, /* * Write firmware to device. */ - if (rt2800usb_autorun_detect(rt2x00dev)) { + retval = rt2800usb_autorun_detect(rt2x00dev); + if (retval < 0) + return retval; + if (retval) { rt2x00_info(rt2x00dev, "Firmware loading not required - NIC in AutoRun mode\n"); } else { @@ -763,7 +771,12 @@ static void rt2800usb_fill_rxdone(struct queue_entry *entry, */ static int rt2800usb_efuse_detect(struct rt2x00_dev *rt2x00dev) { - if (rt2800usb_autorun_detect(rt2x00dev)) + int retval; + + retval = rt2800usb_autorun_detect(rt2x00dev); + if (retval < 0) + return retval; + if (retval) return 1; return rt2800_efuse_detect(rt2x00dev); } @@ -772,7 +785,10 @@ static int rt2800usb_read_eeprom(struct rt2x00_dev *rt2x00dev) { int retval; - if (rt2800usb_efuse_detect(rt2x00dev)) + retval = rt2800usb_efuse_detect(rt2x00dev); + if (retval < 0) + return retval; + if (retval) retval = rt2800_read_eeprom_efuse(rt2x00dev); else retval = rt2x00usb_eeprom_read(rt2x00dev, rt2x00dev->eeprom, From 19a44ecff52fd67d77d49fb4d43b289c53cdc392 Mon Sep 17 00:00:00 2001 From: Alexander Graf Date: Mon, 7 Jul 2014 21:05:33 +0200 Subject: [PATCH 107/455] KVM: PPC: RTAS: Do byte swaps explicitly In commit b59d9d26b we introduced implicit byte swaps for RTAS calls. Unfortunately we messed up and didn't swizzle return values properly. Also the old approach wasn't "sparse" compatible - we were randomly reading __be32 values on an LE system. Let's just do all of the swizzling explicitly with byte swaps right where values get used. That way we can at least catch bugs using sparse. This patch fixes XICS RTAS emulation on little endian hosts for me. Signed-off-by: Alexander Graf --- arch/powerpc/kvm/book3s_rtas.c | 65 ++++++++++------------------------ 1 file changed, 18 insertions(+), 47 deletions(-) diff --git a/arch/powerpc/kvm/book3s_rtas.c b/arch/powerpc/kvm/book3s_rtas.c index edb14ba992b3..ef27fbd5d9c5 100644 --- a/arch/powerpc/kvm/book3s_rtas.c +++ b/arch/powerpc/kvm/book3s_rtas.c @@ -23,20 +23,20 @@ static void kvm_rtas_set_xive(struct kvm_vcpu *vcpu, struct rtas_args *args) u32 irq, server, priority; int rc; - if (args->nargs != 3 || args->nret != 1) { + if (be32_to_cpu(args->nargs) != 3 || be32_to_cpu(args->nret) != 1) { rc = -3; goto out; } - irq = args->args[0]; - server = args->args[1]; - priority = args->args[2]; + irq = be32_to_cpu(args->args[0]); + server = be32_to_cpu(args->args[1]); + priority = be32_to_cpu(args->args[2]); rc = kvmppc_xics_set_xive(vcpu->kvm, irq, server, priority); if (rc) rc = -3; out: - args->rets[0] = rc; + args->rets[0] = cpu_to_be32(rc); } static void kvm_rtas_get_xive(struct kvm_vcpu *vcpu, struct rtas_args *args) @@ -44,12 +44,12 @@ static void kvm_rtas_get_xive(struct kvm_vcpu *vcpu, struct rtas_args *args) u32 irq, server, priority; int rc; - if (args->nargs != 1 || args->nret != 3) { + if (be32_to_cpu(args->nargs) != 1 || be32_to_cpu(args->nret) != 3) { rc = -3; goto out; } - irq = args->args[0]; + irq = be32_to_cpu(args->args[0]); server = priority = 0; rc = kvmppc_xics_get_xive(vcpu->kvm, irq, &server, &priority); @@ -58,10 +58,10 @@ static void kvm_rtas_get_xive(struct kvm_vcpu *vcpu, struct rtas_args *args) goto out; } - args->rets[1] = server; - args->rets[2] = priority; + args->rets[1] = cpu_to_be32(server); + args->rets[2] = cpu_to_be32(priority); out: - args->rets[0] = rc; + args->rets[0] = cpu_to_be32(rc); } static void kvm_rtas_int_off(struct kvm_vcpu *vcpu, struct rtas_args *args) @@ -69,18 +69,18 @@ static void kvm_rtas_int_off(struct kvm_vcpu *vcpu, struct rtas_args *args) u32 irq; int rc; - if (args->nargs != 1 || args->nret != 1) { + if (be32_to_cpu(args->nargs) != 1 || be32_to_cpu(args->nret) != 1) { rc = -3; goto out; } - irq = args->args[0]; + irq = be32_to_cpu(args->args[0]); rc = kvmppc_xics_int_off(vcpu->kvm, irq); if (rc) rc = -3; out: - args->rets[0] = rc; + args->rets[0] = cpu_to_be32(rc); } static void kvm_rtas_int_on(struct kvm_vcpu *vcpu, struct rtas_args *args) @@ -88,18 +88,18 @@ static void kvm_rtas_int_on(struct kvm_vcpu *vcpu, struct rtas_args *args) u32 irq; int rc; - if (args->nargs != 1 || args->nret != 1) { + if (be32_to_cpu(args->nargs) != 1 || be32_to_cpu(args->nret) != 1) { rc = -3; goto out; } - irq = args->args[0]; + irq = be32_to_cpu(args->args[0]); rc = kvmppc_xics_int_on(vcpu->kvm, irq); if (rc) rc = -3; out: - args->rets[0] = rc; + args->rets[0] = cpu_to_be32(rc); } #endif /* CONFIG_KVM_XICS */ @@ -205,32 +205,6 @@ int kvm_vm_ioctl_rtas_define_token(struct kvm *kvm, void __user *argp) return rc; } -static void kvmppc_rtas_swap_endian_in(struct rtas_args *args) -{ -#ifdef __LITTLE_ENDIAN__ - int i; - - args->token = be32_to_cpu(args->token); - args->nargs = be32_to_cpu(args->nargs); - args->nret = be32_to_cpu(args->nret); - for (i = 0; i < args->nargs; i++) - args->args[i] = be32_to_cpu(args->args[i]); -#endif -} - -static void kvmppc_rtas_swap_endian_out(struct rtas_args *args) -{ -#ifdef __LITTLE_ENDIAN__ - int i; - - for (i = 0; i < args->nret; i++) - args->args[i] = cpu_to_be32(args->args[i]); - args->token = cpu_to_be32(args->token); - args->nargs = cpu_to_be32(args->nargs); - args->nret = cpu_to_be32(args->nret); -#endif -} - int kvmppc_rtas_hcall(struct kvm_vcpu *vcpu) { struct rtas_token_definition *d; @@ -249,8 +223,6 @@ int kvmppc_rtas_hcall(struct kvm_vcpu *vcpu) if (rc) goto fail; - kvmppc_rtas_swap_endian_in(&args); - /* * args->rets is a pointer into args->args. Now that we've * copied args we need to fix it up to point into our copy, @@ -258,13 +230,13 @@ int kvmppc_rtas_hcall(struct kvm_vcpu *vcpu) * value so we can restore it on the way out. */ orig_rets = args.rets; - args.rets = &args.args[args.nargs]; + args.rets = &args.args[be32_to_cpu(args.nargs)]; mutex_lock(&vcpu->kvm->lock); rc = -ENOENT; list_for_each_entry(d, &vcpu->kvm->arch.rtas_tokens, list) { - if (d->token == args.token) { + if (d->token == be32_to_cpu(args.token)) { d->handler->handler(vcpu, &args); rc = 0; break; @@ -275,7 +247,6 @@ int kvmppc_rtas_hcall(struct kvm_vcpu *vcpu) if (rc == 0) { args.rets = orig_rets; - kvmppc_rtas_swap_endian_out(&args); rc = kvm_write_guest(vcpu->kvm, args_phys, &args, sizeof(args)); if (rc) goto fail; From 68b7107b62983f2cff0948292429d5f5999df096 Mon Sep 17 00:00:00 2001 From: Edward Allcutt Date: Mon, 30 Jun 2014 16:16:02 +0100 Subject: [PATCH 108/455] ipv4: icmp: Fix pMTU handling for rare case Some older router implementations still send Fragmentation Needed errors with the Next-Hop MTU field set to zero. This is explicitly described as an eventuality that hosts must deal with by the standard (RFC 1191) since older standards specified that those bits must be zero. Linux had a generic (for all of IPv4) implementation of the algorithm described in the RFC for searching a list of MTU plateaus for a good value. Commit 46517008e116 ("ipv4: Kill ip_rt_frag_needed().") removed this as part of the changes to remove the routing cache. Subsequently any Fragmentation Needed packet with a zero Next-Hop MTU has been discarded without being passed to the per-protocol handlers or notifying userspace for raw sockets. When there is a router which does not implement RFC 1191 on an MTU limited path then this results in stalled connections since large packets are discarded and the local protocols are not notified so they never attempt to lower the pMTU. One example I have seen is an OpenBSD router terminating IPSec tunnels. It's worth pointing out that this case is distinct from the BSD 4.2 bug which incorrectly calculated the Next-Hop MTU since the commit in question dismissed that as a valid concern. All of the per-protocols handlers implement the simple approach from RFC 1191 of immediately falling back to the minimum value. Although this is sub-optimal it is vastly preferable to connections hanging indefinitely. Remove the Next-Hop MTU != 0 check and allow such packets to follow the normal path. Fixes: 46517008e116 ("ipv4: Kill ip_rt_frag_needed().") Signed-off-by: Edward Allcutt Signed-off-by: David S. Miller --- net/ipv4/icmp.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c index 79c3d947a481..42b7bcf8045b 100644 --- a/net/ipv4/icmp.c +++ b/net/ipv4/icmp.c @@ -739,8 +739,6 @@ static void icmp_unreach(struct sk_buff *skb) /* fall through */ case 0: info = ntohs(icmph->un.frag.mtu); - if (!info) - goto out; } break; case ICMP_SR_FAILED: From 11ef7a8996d5d433c9cd75d80651297eccbf6d42 Mon Sep 17 00:00:00 2001 From: Tom Herbert Date: Mon, 30 Jun 2014 09:50:40 -0700 Subject: [PATCH 109/455] net: Performance fix for process_backlog In process_backlog the input_pkt_queue is only checked once for new packets and quota is artificially reduced to reflect precisely the number of packets on the input_pkt_queue so that the loop exits appropriately. This patches changes the behavior to be more straightforward and less convoluted. Packets are processed until either the quota is met or there are no more packets to process. This patch seems to provide a small, but noticeable performance improvement. The performance improvement is a result of staying in the process_backlog loop longer which can reduce number of IPI's. Performance data using super_netperf TCP_RR with 200 flows: Before fix: 88.06% CPU utilization 125/190/309 90/95/99% latencies 1.46808e+06 tps 1145382 intrs.sec. With fix: 87.73% CPU utilization 122/183/296 90/95/99% latencies 1.4921e+06 tps 1021674.30 intrs./sec. Signed-off-by: Tom Herbert Acked-by: Eric Dumazet Signed-off-by: David S. Miller --- net/core/dev.c | 21 ++++++++++----------- 1 file changed, 10 insertions(+), 11 deletions(-) diff --git a/net/core/dev.c b/net/core/dev.c index 30eedf677913..77c19c7bb490 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4227,9 +4227,8 @@ static int process_backlog(struct napi_struct *napi, int quota) #endif napi->weight = weight_p; local_irq_disable(); - while (work < quota) { + while (1) { struct sk_buff *skb; - unsigned int qlen; while ((skb = __skb_dequeue(&sd->process_queue))) { local_irq_enable(); @@ -4243,24 +4242,24 @@ static int process_backlog(struct napi_struct *napi, int quota) } rps_lock(sd); - qlen = skb_queue_len(&sd->input_pkt_queue); - if (qlen) - skb_queue_splice_tail_init(&sd->input_pkt_queue, - &sd->process_queue); - - if (qlen < quota - work) { + if (skb_queue_empty(&sd->input_pkt_queue)) { /* * Inline a custom version of __napi_complete(). * only current cpu owns and manipulates this napi, - * and NAPI_STATE_SCHED is the only possible flag set on backlog. - * we can use a plain write instead of clear_bit(), + * and NAPI_STATE_SCHED is the only possible flag set + * on backlog. + * We can use a plain write instead of clear_bit(), * and we dont need an smp_mb() memory barrier. */ list_del(&napi->poll_list); napi->state = 0; + rps_unlock(sd); - quota = work + qlen; + break; } + + skb_queue_splice_tail_init(&sd->input_pkt_queue, + &sd->process_queue); rps_unlock(sd); } local_irq_enable(); From 8844a00626c78c30e8cd757f2a048038f4983c53 Mon Sep 17 00:00:00 2001 From: Zhao Qiang Date: Tue, 1 Jul 2014 09:21:34 +0800 Subject: [PATCH 110/455] powerpc/ucc_geth: deal with a compile warning deal with a compile warning: comparison between 'enum qe_fltr_largest_external_tbl_lookup_key_size' and 'enum qe_fltr_tbl_lookup_key_size' the code: "if (ug_info->largestexternallookupkeysize == QE_FLTR_TABLE_LOOKUP_KEY_SIZE_8_BYTES)" is warned because different enum, so modify it. "enum qe_fltr_largest_external_tbl_lookup_key_size largestexternallookupkeysize; enum qe_fltr_tbl_lookup_key_size { QE_FLTR_TABLE_LOOKUP_KEY_SIZE_8_BYTES = 0x3f, /* LookupKey parsed by the Generate LookupKey CMD is truncated to 8 bytes */ QE_FLTR_TABLE_LOOKUP_KEY_SIZE_16_BYTES = 0x5f, /* LookupKey parsed by the Generate LookupKey CMD is truncated to 16 bytes */ }; /* QE FLTR extended filtering Largest External Table Lookup Key Size */ enum qe_fltr_largest_external_tbl_lookup_key_size { QE_FLTR_LARGEST_EXTERNAL_TABLE_LOOKUP_KEY_SIZE_NONE = 0x0,/* not used */ QE_FLTR_LARGEST_EXTERNAL_TABLE_LOOKUP_KEY_SIZE_8_BYTES = QE_FLTR_TABLE_LOOKUP_KEY_SIZE_8_BYTES, /* 8 bytes */ QE_FLTR_LARGEST_EXTERNAL_TABLE_LOOKUP_KEY_SIZE_16_BYTES = QE_FLTR_TABLE_LOOKUP_KEY_SIZE_16_BYTES, /* 16 bytes */ };" Signed-off-by: Zhao Qiang Signed-off-by: David S. Miller --- drivers/net/ethernet/freescale/ucc_geth.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/freescale/ucc_geth.c b/drivers/net/ethernet/freescale/ucc_geth.c index fab39e295441..36fc429298e3 100644 --- a/drivers/net/ethernet/freescale/ucc_geth.c +++ b/drivers/net/ethernet/freescale/ucc_geth.c @@ -2990,11 +2990,11 @@ static int ucc_geth_startup(struct ucc_geth_private *ugeth) if (ug_info->rxExtendedFiltering) { size += THREAD_RX_PRAM_ADDITIONAL_FOR_EXTENDED_FILTERING; if (ug_info->largestexternallookupkeysize == - QE_FLTR_TABLE_LOOKUP_KEY_SIZE_8_BYTES) + QE_FLTR_LARGEST_EXTERNAL_TABLE_LOOKUP_KEY_SIZE_8_BYTES) size += THREAD_RX_PRAM_ADDITIONAL_FOR_EXTENDED_FILTERING_8; if (ug_info->largestexternallookupkeysize == - QE_FLTR_TABLE_LOOKUP_KEY_SIZE_16_BYTES) + QE_FLTR_LARGEST_EXTERNAL_TABLE_LOOKUP_KEY_SIZE_16_BYTES) size += THREAD_RX_PRAM_ADDITIONAL_FOR_EXTENDED_FILTERING_16; } From 26a9ebca982d52b96c921f7344b7c01b1e2d9672 Mon Sep 17 00:00:00 2001 From: "David S. Miller" Date: Mon, 7 Jul 2014 19:53:45 -0700 Subject: [PATCH 111/455] Revert "net: stmmac: add platform init/exit for Altera's ARM socfpga" This reverts commit 0acf16768740776feffac506ce93b1c06c059ac6. Breaks the build due to missing reference to phy_resume in the resulting dwmac-socfpga.o object. Signed-off-by: David S. Miller --- .../ethernet/stmicro/stmmac/dwmac-socfpga.c | 69 ------------------- .../net/ethernet/stmicro/stmmac/stmmac_main.c | 4 -- 2 files changed, 73 deletions(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c index ec632e666c56..fd8a217556a1 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c @@ -20,9 +20,7 @@ #include #include #include -#include #include -#include "stmmac.h" #define SYSMGR_EMACGRP_CTRL_PHYSEL_ENUM_GMII_MII 0x0 #define SYSMGR_EMACGRP_CTRL_PHYSEL_ENUM_RGMII 0x1 @@ -36,7 +34,6 @@ struct socfpga_dwmac { u32 reg_shift; struct device *dev; struct regmap *sys_mgr_base_addr; - struct reset_control *stmmac_rst; }; static int socfpga_dwmac_parse_data(struct socfpga_dwmac *dwmac, struct device *dev) @@ -46,13 +43,6 @@ static int socfpga_dwmac_parse_data(struct socfpga_dwmac *dwmac, struct device * u32 reg_offset, reg_shift; int ret; - dwmac->stmmac_rst = devm_reset_control_get(dev, - STMMAC_RESOURCE_NAME); - if (IS_ERR(dwmac->stmmac_rst)) { - dev_info(dev, "Could not get reset control!\n"); - return -EINVAL; - } - dwmac->interface = of_get_phy_mode(np); sys_mgr_base_addr = syscon_regmap_lookup_by_phandle(np, "altr,sysmgr-syscon"); @@ -135,65 +125,6 @@ static void *socfpga_dwmac_probe(struct platform_device *pdev) return dwmac; } -static void socfpga_dwmac_exit(struct platform_device *pdev, void *priv) -{ - struct socfpga_dwmac *dwmac = priv; - - /* On socfpga platform exit, assert and hold reset to the - * enet controller - the default state after a hard reset. - */ - if (dwmac->stmmac_rst) - reset_control_assert(dwmac->stmmac_rst); -} - -static int socfpga_dwmac_init(struct platform_device *pdev, void *priv) -{ - struct socfpga_dwmac *dwmac = priv; - struct net_device *ndev = platform_get_drvdata(pdev); - struct stmmac_priv *stpriv = NULL; - int ret = 0; - - if (ndev) - stpriv = netdev_priv(ndev); - - /* Assert reset to the enet controller before changing the phy mode */ - if (dwmac->stmmac_rst) - reset_control_assert(dwmac->stmmac_rst); - - /* Setup the phy mode in the system manager registers according to - * devicetree configuration - */ - ret = socfpga_dwmac_setup(dwmac); - - /* Deassert reset for the phy configuration to be sampled by - * the enet controller, and operation to start in requested mode - */ - if (dwmac->stmmac_rst) - reset_control_deassert(dwmac->stmmac_rst); - - /* Before the enet controller is suspended, the phy is suspended. - * This causes the phy clock to be gated. The enet controller is - * resumed before the phy, so the clock is still gated "off" when - * the enet controller is resumed. This code makes sure the phy - * is "resumed" before reinitializing the enet controller since - * the enet controller depends on an active phy clock to complete - * a DMA reset. A DMA reset will "time out" if executed - * with no phy clock input on the Synopsys enet controller. - * Verified through Synopsys Case #8000711656. - * - * Note that the phy clock is also gated when the phy is isolated. - * Phy "suspend" and "isolate" controls are located in phy basic - * control register 0, and can be modified by the phy driver - * framework. - */ - if (stpriv && stpriv->phydev) - phy_resume(stpriv->phydev); - - return ret; -} - const struct stmmac_of_data socfpga_gmac_data = { .setup = socfpga_dwmac_probe, - .init = socfpga_dwmac_init, - .exit = socfpga_dwmac_exit, }; diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index 18315f34938b..057a1208e594 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -2878,10 +2878,6 @@ int stmmac_suspend(struct net_device *ndev) clk_disable_unprepare(priv->stmmac_clk); } spin_unlock_irqrestore(&priv->lock, flags); - - priv->oldlink = 0; - priv->speed = 0; - priv->oldduplex = -1; return 0; } From 8dcb4b1526747d8431f9895e153dd478c9d16186 Mon Sep 17 00:00:00 2001 From: Bernd Wachter Date: Tue, 1 Jul 2014 22:01:09 +0300 Subject: [PATCH 112/455] net: qmi_wwan: Add ID for Telewell TW-LTE 4G v2 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit There's a new version of the Telewell 4G modem working with, but not recognized by this driver. Signed-off-by: Bernd Wachter Acked-by: Bjørn Mork Signed-off-by: David S. Miller --- drivers/net/usb/qmi_wwan.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c index cf62d7e8329f..c4638c67f6b9 100644 --- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -741,6 +741,7 @@ static const struct usb_device_id products[] = { {QMI_FIXED_INTF(0x19d2, 0x1424, 2)}, {QMI_FIXED_INTF(0x19d2, 0x1425, 2)}, {QMI_FIXED_INTF(0x19d2, 0x1426, 2)}, /* ZTE MF91 */ + {QMI_FIXED_INTF(0x19d2, 0x1428, 2)}, /* Telewell TW-LTE 4G v2 */ {QMI_FIXED_INTF(0x19d2, 0x2002, 4)}, /* ZTE (Vodafone) K3765-Z */ {QMI_FIXED_INTF(0x0f3d, 0x68a2, 8)}, /* Sierra Wireless MC7700 */ {QMI_FIXED_INTF(0x114f, 0x68a2, 8)}, /* Sierra Wireless MC7750 */ From 54951194656e4853e441266fd095f880bc0398f3 Mon Sep 17 00:00:00 2001 From: Loic Prylli Date: Tue, 1 Jul 2014 21:39:43 -0700 Subject: [PATCH 113/455] net: Fix NETDEV_CHANGE notifier usage causing spurious arp flush A bug was introduced in NETDEV_CHANGE notifier sequence causing the arp table to be sometimes spuriously cleared (including manual arp entries marked permanent), upon network link carrier changes. The changed argument for the notifier was applied only to a single caller of NETDEV_CHANGE, missing among others netdev_state_change(). So upon net_carrier events induced by the network, which are triggering a call to netdev_state_change(), arp_netdev_event() would decide whether to clear or not arp cache based on random/junk stack values (a kind of read buffer overflow). Fixes: be9efd365328 ("net: pass changed flags along with NETDEV_CHANGE event") Fixes: 6c8b4e3ff81b ("arp: flush arp cache on IFF_NOARP change") Signed-off-by: Loic Prylli Signed-off-by: David S. Miller --- net/core/dev.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/net/core/dev.c b/net/core/dev.c index 77c19c7bb490..7990984ca364 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -148,6 +148,9 @@ struct list_head ptype_all __read_mostly; /* Taps */ static struct list_head offload_base __read_mostly; static int netif_rx_internal(struct sk_buff *skb); +static int call_netdevice_notifiers_info(unsigned long val, + struct net_device *dev, + struct netdev_notifier_info *info); /* * The @dev_base_head list is protected by @dev_base_lock and the rtnl @@ -1207,7 +1210,11 @@ EXPORT_SYMBOL(netdev_features_change); void netdev_state_change(struct net_device *dev) { if (dev->flags & IFF_UP) { - call_netdevice_notifiers(NETDEV_CHANGE, dev); + struct netdev_notifier_change_info change_info; + + change_info.flags_changed = 0; + call_netdevice_notifiers_info(NETDEV_CHANGE, dev, + &change_info.info); rtmsg_ifinfo(RTM_NEWLINK, dev, 0, GFP_KERNEL); } } From 52ad353a5344f1f700c5b777175bdfa41d3cd65a Mon Sep 17 00:00:00 2001 From: dingtianhong Date: Wed, 2 Jul 2014 13:50:48 +0800 Subject: [PATCH 114/455] igmp: fix the problem when mc leave group The problem was triggered by these steps: 1) create socket, bind and then setsockopt for add mc group. mreq.imr_multiaddr.s_addr = inet_addr("255.0.0.37"); mreq.imr_interface.s_addr = inet_addr("192.168.1.2"); setsockopt(sockfd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq)); 2) drop the mc group for this socket. mreq.imr_multiaddr.s_addr = inet_addr("255.0.0.37"); mreq.imr_interface.s_addr = inet_addr("0.0.0.0"); setsockopt(sockfd, IPPROTO_IP, IP_DROP_MEMBERSHIP, &mreq, sizeof(mreq)); 3) and then drop the socket, I found the mc group was still used by the dev: netstat -g Interface RefCnt Group --------------- ------ --------------------- eth2 1 255.0.0.37 Normally even though the IP_DROP_MEMBERSHIP return error, the mc group still need to be released for the netdev when drop the socket, but this process was broken when route default is NULL, the reason is that: The ip_mc_leave_group() will choose the in_dev by the imr_interface.s_addr, if input addr is NULL, the default route dev will be chosen, then the ifindex is got from the dev, then polling the inet->mc_list and return -ENODEV, but if the default route dev is NULL, the in_dev and ifIndex is both NULL, when polling the inet->mc_list, the mc group will be released from the mc_list, but the dev didn't dec the refcnt for this mc group, so when dropping the socket, the mc_list is NULL and the dev still keep this group. v1->v2: According Hideaki's suggestion, we should align with IPv6 (RFC3493) and BSDs, so I add the checking for the in_dev before polling the mc_list, make sure when we remove the mc group, dec the refcnt to the real dev which was using the mc address. The problem would never happened again. Signed-off-by: Ding Tianhong Signed-off-by: David S. Miller --- net/ipv4/igmp.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c index 6748d420f714..db710b059bab 100644 --- a/net/ipv4/igmp.c +++ b/net/ipv4/igmp.c @@ -1944,6 +1944,10 @@ int ip_mc_leave_group(struct sock *sk, struct ip_mreqn *imr) rtnl_lock(); in_dev = ip_mc_find_dev(net, imr); + if (!in_dev) { + ret = -ENODEV; + goto out; + } ifindex = imr->imr_ifindex; for (imlp = &inet->mc_list; (iml = rtnl_dereference(*imlp)) != NULL; @@ -1961,16 +1965,14 @@ int ip_mc_leave_group(struct sock *sk, struct ip_mreqn *imr) *imlp = iml->next_rcu; - if (in_dev) - ip_mc_dec_group(in_dev, group); + ip_mc_dec_group(in_dev, group); rtnl_unlock(); /* decrease mem now to avoid the memleak warning */ atomic_sub(sizeof(*iml), &sk->sk_omem_alloc); kfree_rcu(iml, rcu); return 0; } - if (!in_dev) - ret = -ENODEV; +out: rtnl_unlock(); return ret; } From e326f2f13b209d56782609e833b87cb497e64b3b Mon Sep 17 00:00:00 2001 From: Or Gerlitz Date: Wed, 2 Jul 2014 17:36:23 +0300 Subject: [PATCH 115/455] net/mlx4_en: Don't configure the HW vxlan parser when vxlan offloading isn't set The add_vxlan_port ndo driver code was wrongly testing whether HW vxlan offloads are supported by the device instead of checking if they are currently enabled. This causes the driver to configure the HW parser to conduct matching for vxlan packets but since no steering rules were set, vxlan packets are dropped on RX. Fix that by doing the right test, as done in the del_vxlan_port ndo handler. Fixes: 1b136de ('net/mlx4: Implement vxlan ndo calls') Signed-off-by: Or Gerlitz Signed-off-by: David S. Miller --- drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c index 7d4fb7bf2593..e94d96f0ef55 100644 --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c @@ -2336,7 +2336,7 @@ static void mlx4_en_add_vxlan_port(struct net_device *dev, struct mlx4_en_priv *priv = netdev_priv(dev); __be16 current_port; - if (!(priv->mdev->dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_VXLAN_OFFLOADS)) + if (priv->mdev->dev->caps.tunnel_offload_mode != MLX4_TUNNEL_OFFLOAD_MODE_VXLAN) return; if (sa_family == AF_INET6) From 6e08d5e3c8236e7484229e46fdf92006e1dd4c49 Mon Sep 17 00:00:00 2001 From: Yuchung Cheng Date: Wed, 2 Jul 2014 12:07:16 -0700 Subject: [PATCH 116/455] tcp: fix false undo corner cases The undo code assumes that, upon entering loss recovery, TCP 1) always retransmit something 2) the retransmission never fails locally (e.g., qdisc drop) so undo_marker is set in tcp_enter_recovery() and undo_retrans is incremented only when tcp_retransmit_skb() is successful. When the assumption is broken because TCP's cwnd is too small to retransmit or the retransmit fails locally. The next (DUP)ACK would incorrectly revert the cwnd and the congestion state in tcp_try_undo_dsack() or tcp_may_undo(). Subsequent (DUP)ACKs may enter the recovery state. The sender repeatedly enter and (incorrectly) exit recovery states if the retransmits continue to fail locally while receiving (DUP)ACKs. The fix is to initialize undo_retrans to -1 and start counting on the first retransmission. Always increment undo_retrans even if the retransmissions fail locally because they couldn't cause DSACKs to undo the cwnd reduction. Signed-off-by: Yuchung Cheng Signed-off-by: Neal Cardwell Signed-off-by: David S. Miller --- net/ipv4/tcp_input.c | 8 ++++---- net/ipv4/tcp_output.c | 6 ++++-- 2 files changed, 8 insertions(+), 6 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index b5c23756965a..40639c288dc2 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -1106,7 +1106,7 @@ static bool tcp_check_dsack(struct sock *sk, const struct sk_buff *ack_skb, } /* D-SACK for already forgotten data... Do dumb counting. */ - if (dup_sack && tp->undo_marker && tp->undo_retrans && + if (dup_sack && tp->undo_marker && tp->undo_retrans > 0 && !after(end_seq_0, prior_snd_una) && after(end_seq_0, tp->undo_marker)) tp->undo_retrans--; @@ -1187,7 +1187,7 @@ static u8 tcp_sacktag_one(struct sock *sk, /* Account D-SACK for retransmitted packet. */ if (dup_sack && (sacked & TCPCB_RETRANS)) { - if (tp->undo_marker && tp->undo_retrans && + if (tp->undo_marker && tp->undo_retrans > 0 && after(end_seq, tp->undo_marker)) tp->undo_retrans--; if (sacked & TCPCB_SACKED_ACKED) @@ -1893,7 +1893,7 @@ static void tcp_clear_retrans_partial(struct tcp_sock *tp) tp->lost_out = 0; tp->undo_marker = 0; - tp->undo_retrans = 0; + tp->undo_retrans = -1; } void tcp_clear_retrans(struct tcp_sock *tp) @@ -2665,7 +2665,7 @@ static void tcp_enter_recovery(struct sock *sk, bool ece_ack) tp->prior_ssthresh = 0; tp->undo_marker = tp->snd_una; - tp->undo_retrans = tp->retrans_out; + tp->undo_retrans = tp->retrans_out ? : -1; if (inet_csk(sk)->icsk_ca_state < TCP_CA_CWR) { if (!ece_ack) diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index d92bce0ea24e..179b51e6bda3 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -2525,8 +2525,6 @@ int tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb) if (!tp->retrans_stamp) tp->retrans_stamp = TCP_SKB_CB(skb)->when; - tp->undo_retrans += tcp_skb_pcount(skb); - /* snd_nxt is stored to detect loss of retransmitted segment, * see tcp_input.c tcp_sacktag_write_queue(). */ @@ -2534,6 +2532,10 @@ int tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb) } else if (err != -EBUSY) { NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPRETRANSFAIL); } + + if (tp->undo_retrans < 0) + tp->undo_retrans = 0; + tp->undo_retrans += tcp_skb_pcount(skb); return err; } From 9242b5b60df8b13b469bc6b7be08ff6ebb551ad3 Mon Sep 17 00:00:00 2001 From: Bandan Das Date: Tue, 8 Jul 2014 00:30:23 -0400 Subject: [PATCH 117/455] KVM: x86: Check for nested events if there is an injectable interrupt With commit b6b8a1451fc40412c57d1 that introduced vmx_check_nested_events, checks for injectable interrupts happen at different points in time for L1 and L2 that could potentially cause a race. The regression occurs because KVM_REQ_EVENT is always set when nested_run_pending is set even if there's no pending interrupt. Consequently, there could be a small window when check_nested_events returns without exiting to L1, but an interrupt comes through soon after and it incorrectly, gets injected to L2 by inject_pending_event Fix this by adding a call to check for nested events too when a check for injectable interrupt returns true Signed-off-by: Bandan Das Signed-off-by: Paolo Bonzini --- arch/x86/kvm/x86.c | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index f6449334ec45..ef432f891d30 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -5887,6 +5887,18 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool req_int_win) kvm_x86_ops->set_nmi(vcpu); } } else if (kvm_cpu_has_injectable_intr(vcpu)) { + /* + * Because interrupts can be injected asynchronously, we are + * calling check_nested_events again here to avoid a race condition. + * See https://lkml.org/lkml/2014/7/2/60 for discussion about this + * proposal and current concerns. Perhaps we should be setting + * KVM_REQ_EVENT only on certain events and not unconditionally? + */ + if (is_guest_mode(vcpu) && kvm_x86_ops->check_nested_events) { + r = kvm_x86_ops->check_nested_events(vcpu, req_int_win); + if (r != 0) + return r; + } if (kvm_x86_ops->interrupt_allowed(vcpu)) { kvm_queue_interrupt(vcpu, kvm_cpu_get_interrupt(vcpu), false); From 16927776ae757d0d132bdbfabbfe2c498342bd59 Mon Sep 17 00:00:00 2001 From: John Stultz Date: Mon, 7 Jul 2014 14:06:11 -0700 Subject: [PATCH 118/455] alarmtimer: Fix bug where relative alarm timers were treated as absolute Sharvil noticed with the posix timer_settime interface, using the CLOCK_REALTIME_ALARM or CLOCK_BOOTTIME_ALARM clockid, if the users tried to specify a relative time timer, it would incorrectly be treated as absolute regardless of the state of the flags argument. This patch corrects this, properly checking the absolute/relative flag, as well as adds further error checking that no invalid flag bits are set. Reported-by: Sharvil Nanavati Signed-off-by: John Stultz Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Prarit Bhargava Cc: Sharvil Nanavati Cc: stable #3.0+ Link: http://lkml.kernel.org/r/1404767171-6902-1-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner --- kernel/time/alarmtimer.c | 20 ++++++++++++++++++-- 1 file changed, 18 insertions(+), 2 deletions(-) diff --git a/kernel/time/alarmtimer.c b/kernel/time/alarmtimer.c index 88c9c65a430d..fe75444ae7ec 100644 --- a/kernel/time/alarmtimer.c +++ b/kernel/time/alarmtimer.c @@ -585,9 +585,14 @@ static int alarm_timer_set(struct k_itimer *timr, int flags, struct itimerspec *new_setting, struct itimerspec *old_setting) { + ktime_t exp; + if (!rtcdev) return -ENOTSUPP; + if (flags & ~TIMER_ABSTIME) + return -EINVAL; + if (old_setting) alarm_timer_get(timr, old_setting); @@ -597,8 +602,16 @@ static int alarm_timer_set(struct k_itimer *timr, int flags, /* start the timer */ timr->it.alarm.interval = timespec_to_ktime(new_setting->it_interval); - alarm_start(&timr->it.alarm.alarmtimer, - timespec_to_ktime(new_setting->it_value)); + exp = timespec_to_ktime(new_setting->it_value); + /* Convert (if necessary) to absolute time */ + if (flags != TIMER_ABSTIME) { + ktime_t now; + + now = alarm_bases[timr->it.alarm.alarmtimer.type].gettime(); + exp = ktime_add(now, exp); + } + + alarm_start(&timr->it.alarm.alarmtimer, exp); return 0; } @@ -730,6 +743,9 @@ static int alarm_timer_nsleep(const clockid_t which_clock, int flags, if (!alarmtimer_get_rtcdev()) return -ENOTSUPP; + if (flags & ~TIMER_ABSTIME) + return -EINVAL; + if (!capable(CAP_WAKE_ALARM)) return -EPERM; From 91947d8cc553b3147140334a295218499b77ea92 Mon Sep 17 00:00:00 2001 From: Aaron Plattner Date: Tue, 8 Jul 2014 00:21:38 -0700 Subject: [PATCH 119/455] ALSA: hda - Add new GPU codec ID 0x10de0070 to snd-hda Vendor ID 0x10de0070 is used by a yet-to-be-named GPU chip. Signed-off-by: Aaron Plattner Signed-off-by: Takashi Iwai --- sound/pci/hda/patch_hdmi.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/sound/pci/hda/patch_hdmi.c b/sound/pci/hda/patch_hdmi.c index 4fe876b65fda..ba4ca52072ff 100644 --- a/sound/pci/hda/patch_hdmi.c +++ b/sound/pci/hda/patch_hdmi.c @@ -3337,6 +3337,7 @@ static const struct hda_codec_preset snd_hda_preset_hdmi[] = { { .id = 0x10de0051, .name = "GPU 51 HDMI/DP", .patch = patch_nvhdmi }, { .id = 0x10de0060, .name = "GPU 60 HDMI/DP", .patch = patch_nvhdmi }, { .id = 0x10de0067, .name = "MCP67 HDMI", .patch = patch_nvhdmi_2ch }, +{ .id = 0x10de0070, .name = "GPU 70 HDMI/DP", .patch = patch_nvhdmi }, { .id = 0x10de0071, .name = "GPU 71 HDMI/DP", .patch = patch_nvhdmi }, { .id = 0x10de8001, .name = "MCP73 HDMI", .patch = patch_nvhdmi_2ch }, { .id = 0x11069f80, .name = "VX900 HDMI/DP", .patch = patch_via_hdmi }, @@ -3394,6 +3395,7 @@ MODULE_ALIAS("snd-hda-codec-id:10de0044"); MODULE_ALIAS("snd-hda-codec-id:10de0051"); MODULE_ALIAS("snd-hda-codec-id:10de0060"); MODULE_ALIAS("snd-hda-codec-id:10de0067"); +MODULE_ALIAS("snd-hda-codec-id:10de0070"); MODULE_ALIAS("snd-hda-codec-id:10de0071"); MODULE_ALIAS("snd-hda-codec-id:10de8001"); MODULE_ALIAS("snd-hda-codec-id:11069f80"); From d45b3279a5a2252cafcd665bbf2db8c9b31ef783 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Tue, 8 Jul 2014 12:25:28 +0200 Subject: [PATCH 120/455] block: don't assume last put of shared tags is for the host There is no inherent reason why the last put of a tag structure must be the one for the Scsi_Host, as device model objects can be held for arbitrary periods. Merge blk_free_tags and __blk_free_tags into a single funtion that just release a references and get rid of the BUG() when the host reference wasn't the last. Signed-off-by: Christoph Hellwig Cc: stable@kernel.org Signed-off-by: Jens Axboe --- block/blk-tag.c | 33 +++++++-------------------------- 1 file changed, 7 insertions(+), 26 deletions(-) diff --git a/block/blk-tag.c b/block/blk-tag.c index 3f33d8672268..a185b86741e5 100644 --- a/block/blk-tag.c +++ b/block/blk-tag.c @@ -27,18 +27,15 @@ struct request *blk_queue_find_tag(struct request_queue *q, int tag) EXPORT_SYMBOL(blk_queue_find_tag); /** - * __blk_free_tags - release a given set of tag maintenance info + * blk_free_tags - release a given set of tag maintenance info * @bqt: the tag map to free * - * Tries to free the specified @bqt. Returns true if it was - * actually freed and false if there are still references using it + * Drop the reference count on @bqt and frees it when the last reference + * is dropped. */ -static int __blk_free_tags(struct blk_queue_tag *bqt) +void blk_free_tags(struct blk_queue_tag *bqt) { - int retval; - - retval = atomic_dec_and_test(&bqt->refcnt); - if (retval) { + if (atomic_dec_and_test(&bqt->refcnt)) { BUG_ON(find_first_bit(bqt->tag_map, bqt->max_depth) < bqt->max_depth); @@ -50,9 +47,8 @@ static int __blk_free_tags(struct blk_queue_tag *bqt) kfree(bqt); } - - return retval; } +EXPORT_SYMBOL(blk_free_tags); /** * __blk_queue_free_tags - release tag maintenance info @@ -69,27 +65,12 @@ void __blk_queue_free_tags(struct request_queue *q) if (!bqt) return; - __blk_free_tags(bqt); + blk_free_tags(bqt); q->queue_tags = NULL; queue_flag_clear_unlocked(QUEUE_FLAG_QUEUED, q); } -/** - * blk_free_tags - release a given set of tag maintenance info - * @bqt: the tag map to free - * - * For externally managed @bqt frees the map. Callers of this - * function must guarantee to have released all the queues that - * might have been using this tag map. - */ -void blk_free_tags(struct blk_queue_tag *bqt) -{ - if (unlikely(!__blk_free_tags(bqt))) - BUG(); -} -EXPORT_SYMBOL(blk_free_tags); - /** * blk_queue_free_tags - release tag maintenance info * @q: the request queue for the device From 0d461e1b087048b0cc37c9d7b351649578c507b4 Mon Sep 17 00:00:00 2001 From: Gregory CLEMENT Date: Fri, 4 Jul 2014 16:22:16 +0200 Subject: [PATCH 121/455] ARM: mvebu: Fix the operand list in the inline asm of armada_370_xp_pmsu_idle_enter In the inline asm part of the function armada_370_xp_pmsu_idle_enter() the input operand was used. The intent here was to let the compiler choose this register so it could do the optimization it needed. However an input operand is not supposed to be modified by the inline asm code. This can lead to improper generated instructions. In some case generated instruction the compiler made the choice to reuse the same register to store the return value. But in the assembly part this register was modified, so it can lead to return an wrong value. The fix is to use a clobber. Thanks to this the compiler will know that the value of this register will be modified. Signed-off-by: Gregory CLEMENT Reviewed-by: Thomas Petazzoni Link: https://lkml.kernel.org/r/1404483736-16938-1-git-send-email-gregory.clement@free-electrons.com Signed-off-by: Jason Cooper --- arch/arm/mach-mvebu/pmsu.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/arch/arm/mach-mvebu/pmsu.c b/arch/arm/mach-mvebu/pmsu.c index a1d407c0febe..25aa8237d668 100644 --- a/arch/arm/mach-mvebu/pmsu.c +++ b/arch/arm/mach-mvebu/pmsu.c @@ -201,12 +201,12 @@ static noinline int do_armada_370_xp_cpu_suspend(unsigned long deepidle) /* Test the CR_C bit and set it if it was cleared */ asm volatile( - "mrc p15, 0, %0, c1, c0, 0 \n\t" - "tst %0, #(1 << 2) \n\t" - "orreq %0, %0, #(1 << 2) \n\t" - "mcreq p15, 0, %0, c1, c0, 0 \n\t" + "mrc p15, 0, r0, c1, c0, 0 \n\t" + "tst r0, #(1 << 2) \n\t" + "orreq r0, r0, #(1 << 2) \n\t" + "mcreq p15, 0, r0, c1, c0, 0 \n\t" "isb " - : : "r" (0)); + : : : "r0"); pr_warn("Failed to suspend the system\n"); From a728b977429383b3fe92b6e3bff9e69365609e0f Mon Sep 17 00:00:00 2001 From: Ezequiel Garcia Date: Tue, 8 Jul 2014 10:37:37 -0300 Subject: [PATCH 122/455] ARM: mvebu: Fix coherency bus notifiers by using separate notifiers Currently, the coherency fabric support registers two bus notifiers; one for platform, one for pci bus types, with the same notifier block. However, this is illegal and can cause serious issues: the notifier block is also a link in the notifier list and cannot be inserted twice. This commit fixes this by using different notifier blocks (with the same notifier callback) to set the platform and pci bus types notifiers. Fixes: b0063aad5dd8 ("ARM: mvebu: use hardware I/O coherency also for PCI devices") Reported-by: Paolo Pisati Signed-off-by: Ezequiel Garcia Link: https://lkml.kernel.org/r/1404826657-6977-1-git-send-email-ezequiel.garcia@free-electrons.com Signed-off-by: Jason Cooper --- arch/arm/mach-mvebu/coherency.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/arch/arm/mach-mvebu/coherency.c b/arch/arm/mach-mvebu/coherency.c index 477202fd39cc..2bdc3233abe2 100644 --- a/arch/arm/mach-mvebu/coherency.c +++ b/arch/arm/mach-mvebu/coherency.c @@ -292,6 +292,10 @@ static struct notifier_block mvebu_hwcc_nb = { .notifier_call = mvebu_hwcc_notifier, }; +static struct notifier_block mvebu_hwcc_pci_nb = { + .notifier_call = mvebu_hwcc_notifier, +}; + static void __init armada_370_coherency_init(struct device_node *np) { struct resource res; @@ -427,7 +431,7 @@ static int __init coherency_pci_init(void) { if (coherency_available()) bus_register_notifier(&pci_bus_type, - &mvebu_hwcc_nb); + &mvebu_hwcc_pci_nb); return 0; } From f50b407653f64e76d1c9abda61d0d85cde3ca9ca Mon Sep 17 00:00:00 2001 From: David Vrabel Date: Wed, 2 Jul 2014 16:09:14 +0100 Subject: [PATCH 123/455] xen-netfront: don't nest queue locks in xennet_connect() The nesting of the per-queue rx_lock and tx_lock in xennet_connect() is confusing to both humans and lockdep. The locking is safe because this is the only place where the locks are nested in this way but lockdep still warns. Instead of adding the missing lockdep annotations, refactor the locking to avoid the confusing nesting. This is still safe, because the xenbus connection state changes are all serialized by the xenwatch thread. Signed-off-by: David Vrabel Reported-by: Sander Eikelenboom Signed-off-by: David S. Miller --- drivers/net/xen-netfront.c | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c index 2ccb4a02368b..6a37d62de40b 100644 --- a/drivers/net/xen-netfront.c +++ b/drivers/net/xen-netfront.c @@ -2046,13 +2046,15 @@ static int xennet_connect(struct net_device *dev) /* By now, the queue structures have been set up */ for (j = 0; j < num_queues; ++j) { queue = &np->queues[j]; - spin_lock_bh(&queue->rx_lock); - spin_lock_irq(&queue->tx_lock); /* Step 1: Discard all pending TX packet fragments. */ + spin_lock_irq(&queue->tx_lock); xennet_release_tx_bufs(queue); + spin_unlock_irq(&queue->tx_lock); /* Step 2: Rebuild the RX buffer freelist and the RX ring itself. */ + spin_lock_bh(&queue->rx_lock); + for (requeue_idx = 0, i = 0; i < NET_RX_RING_SIZE; i++) { skb_frag_t *frag; const struct page *page; @@ -2076,6 +2078,8 @@ static int xennet_connect(struct net_device *dev) } queue->rx.req_prod_pvt = requeue_idx; + + spin_unlock_bh(&queue->rx_lock); } /* @@ -2087,13 +2091,17 @@ static int xennet_connect(struct net_device *dev) netif_carrier_on(np->netdev); for (j = 0; j < num_queues; ++j) { queue = &np->queues[j]; + notify_remote_via_irq(queue->tx_irq); if (queue->tx_irq != queue->rx_irq) notify_remote_via_irq(queue->rx_irq); - xennet_tx_buf_gc(queue); - xennet_alloc_rx_buffers(queue); + spin_lock_irq(&queue->tx_lock); + xennet_tx_buf_gc(queue); spin_unlock_irq(&queue->tx_lock); + + spin_lock_bh(&queue->rx_lock); + xennet_alloc_rx_buffers(queue); spin_unlock_bh(&queue->rx_lock); } From f9feb1e6a25f9e197f9e6e6cb04bf04d2cccff93 Mon Sep 17 00:00:00 2001 From: David Vrabel Date: Wed, 2 Jul 2014 16:09:15 +0100 Subject: [PATCH 124/455] xen-netfront: call netif_carrier_off() only once when disconnecting In xennet_disconnect_backend(), netif_carrier_off() was called once per queue when it needs to only be called once. The queue locking around the netif_carrier_off() call looked very odd. I think they were supposed to synchronize any NAPI instances with the expectation that no further NAPI instances would be scheduled because of the carrier being off (see the check in xennet_rx_interrupt()). But I can't easily tell if this works correctly. Instead, add a napi_synchronize() call after disabling the interrupts. This is obviously correct as with no Rx interrupts, no further NAPI instances will be scheduled. Signed-off-by: David Vrabel Signed-off-by: David S. Miller --- drivers/net/xen-netfront.c | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c index 6a37d62de40b..055222bae6e4 100644 --- a/drivers/net/xen-netfront.c +++ b/drivers/net/xen-netfront.c @@ -1439,16 +1439,11 @@ static void xennet_disconnect_backend(struct netfront_info *info) unsigned int i = 0; unsigned int num_queues = info->netdev->real_num_tx_queues; + netif_carrier_off(info->netdev); + for (i = 0; i < num_queues; ++i) { struct netfront_queue *queue = &info->queues[i]; - /* Stop old i/f to prevent errors whilst we rebuild the state. */ - spin_lock_bh(&queue->rx_lock); - spin_lock_irq(&queue->tx_lock); - netif_carrier_off(queue->info->netdev); - spin_unlock_irq(&queue->tx_lock); - spin_unlock_bh(&queue->rx_lock); - if (queue->tx_irq && (queue->tx_irq == queue->rx_irq)) unbind_from_irqhandler(queue->tx_irq, queue); if (queue->tx_irq && (queue->tx_irq != queue->rx_irq)) { @@ -1458,6 +1453,8 @@ static void xennet_disconnect_backend(struct netfront_info *info) queue->tx_evtchn = queue->rx_evtchn = 0; queue->tx_irq = queue->rx_irq = 0; + napi_synchronize(&queue->napi); + /* End access and free the pages */ xennet_end_access(queue->tx_ring_ref, queue->tx.sring); xennet_end_access(queue->rx_ring_ref, queue->rx.sring); From 74adf83f5d7720925499b4938f930591f947b660 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Wed, 18 Jun 2014 11:07:03 +0200 Subject: [PATCH 125/455] nfs: only show Posix ACLs in listxattr if actually present The big ACL switched nfs to use generic_listxattr, which calls all existing ->list handlers. Add a custom .listxattr implementation that only lists the ACLs if they actually are present on the given inode. Signed-off-by: Christoph Hellwig Reported-by: Philippe Troin Tested-by: Philippe Troin Fixes: 013cdf1088d7 (nfs: use generic posix ACL infrastructure ...) Cc: stable@vger.kernel.org # 3.14+ Signed-off-by: Trond Myklebust --- fs/nfs/nfs3acl.c | 43 +++++++++++++++++++++++++++++++++++++++++++ fs/nfs/nfs3proc.c | 4 ++-- 2 files changed, 45 insertions(+), 2 deletions(-) diff --git a/fs/nfs/nfs3acl.c b/fs/nfs/nfs3acl.c index 871d6eda8dba..8f854dde4150 100644 --- a/fs/nfs/nfs3acl.c +++ b/fs/nfs/nfs3acl.c @@ -247,3 +247,46 @@ const struct xattr_handler *nfs3_xattr_handlers[] = { &posix_acl_default_xattr_handler, NULL, }; + +static int +nfs3_list_one_acl(struct inode *inode, int type, const char *name, void *data, + size_t size, ssize_t *result) +{ + struct posix_acl *acl; + char *p = data + *result; + + acl = get_acl(inode, type); + if (!acl) + return 0; + + posix_acl_release(acl); + + *result += strlen(name); + *result += 1; + if (!size) + return 0; + if (*result > size) + return -ERANGE; + + strcpy(p, name); + return 0; +} + +ssize_t +nfs3_listxattr(struct dentry *dentry, char *data, size_t size) +{ + struct inode *inode = dentry->d_inode; + ssize_t result = 0; + int error; + + error = nfs3_list_one_acl(inode, ACL_TYPE_ACCESS, + POSIX_ACL_XATTR_ACCESS, data, size, &result); + if (error) + return error; + + error = nfs3_list_one_acl(inode, ACL_TYPE_DEFAULT, + POSIX_ACL_XATTR_DEFAULT, data, size, &result); + if (error) + return error; + return result; +} diff --git a/fs/nfs/nfs3proc.c b/fs/nfs/nfs3proc.c index e7daa42bbc86..f0afa291fd58 100644 --- a/fs/nfs/nfs3proc.c +++ b/fs/nfs/nfs3proc.c @@ -885,7 +885,7 @@ static const struct inode_operations nfs3_dir_inode_operations = { .getattr = nfs_getattr, .setattr = nfs_setattr, #ifdef CONFIG_NFS_V3_ACL - .listxattr = generic_listxattr, + .listxattr = nfs3_listxattr, .getxattr = generic_getxattr, .setxattr = generic_setxattr, .removexattr = generic_removexattr, @@ -899,7 +899,7 @@ static const struct inode_operations nfs3_file_inode_operations = { .getattr = nfs_getattr, .setattr = nfs_setattr, #ifdef CONFIG_NFS_V3_ACL - .listxattr = generic_listxattr, + .listxattr = nfs3_listxattr, .getxattr = generic_getxattr, .setxattr = generic_setxattr, .removexattr = generic_removexattr, From b6e195fd4f663a7a97183feddb4ec27f0a5a00ec Mon Sep 17 00:00:00 2001 From: Alexander Aring Date: Thu, 3 Jul 2014 23:13:19 +0200 Subject: [PATCH 126/455] MAINTAINERS: change IEEE 802.15.4 maintainer This patch changes the IEEE 802.15.4 subsystem maintainer to Alexander Aring. We discussed this change before via e-mail and I collected the acks from the current maintainers. Signed-off-by: Alexander Aring Acked-by: Dmitry Eremin-Solenikov Acked-by: Alexander Smirnov Signed-off-by: David S. Miller --- MAINTAINERS | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/MAINTAINERS b/MAINTAINERS index ebc812827c63..23863ce06996 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -4476,8 +4476,7 @@ S: Supported F: drivers/idle/i7300_idle.c IEEE 802.15.4 SUBSYSTEM -M: Alexander Smirnov -M: Dmitry Eremin-Solenikov +M: Alexander Aring L: linux-zigbee-devel@lists.sourceforge.net (moderated for non-subscribers) W: http://apps.sourceforge.net/trac/linux-zigbee T: git git://git.kernel.org/pub/scm/linux/kernel/git/lowpan/lowpan.git From a97e8027b1d28eafe6bafe062556c1ec926a49c6 Mon Sep 17 00:00:00 2001 From: Matthias Brugger Date: Thu, 3 Jul 2014 13:58:52 +0200 Subject: [PATCH 127/455] irqchip: gic: Add support for cortex a7 compatible string Patch 0a68214b "ARM: DT: Add binding for GIC virtualization extentions (VGIC)" added the "arm,cortex-a7-gic" compatible string, but the corresponding IRQCHIP_DECLARE was never added to the gic driver. To let real Cortex-A7 SoCs use it, add the necessary declaration to the device driver. Signed-off-by: Matthias Brugger Link: https://lkml.kernel.org/r/1404388732-28890-1-git-send-email-matthias.bgg@gmail.com Fixes: 0a68214b76ca ("ARM: DT: Add binding for GIC virtualization extentions (VGIC)") Cc: # v3.5+ Signed-off-by: Jason Cooper --- drivers/irqchip/irq-gic.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c index 7e11c9d6ae8c..aadd6f5723cb 100644 --- a/drivers/irqchip/irq-gic.c +++ b/drivers/irqchip/irq-gic.c @@ -1073,6 +1073,7 @@ gic_of_init(struct device_node *node, struct device_node *parent) } IRQCHIP_DECLARE(cortex_a15_gic, "arm,cortex-a15-gic", gic_of_init); IRQCHIP_DECLARE(cortex_a9_gic, "arm,cortex-a9-gic", gic_of_init); +IRQCHIP_DECLARE(cortex_a7_gic, "arm,cortex-a7-gic", gic_of_init); IRQCHIP_DECLARE(msm_8660_qgic, "qcom,msm-8660-qgic", gic_of_init); IRQCHIP_DECLARE(msm_qgic2, "qcom,msm-qgic2", gic_of_init); From 29322d0db98e5a84f5cc6a55655bee3dc4ffb5ab Mon Sep 17 00:00:00 2001 From: Jon Paul Maloy Date: Sat, 5 Jul 2014 13:44:13 -0400 Subject: [PATCH 128/455] tipc: fix bug in multicast/broadcast message reassembly Since commit 37e22164a8a3c39bdad45aa463b1e69a1fdf4110 ("tipc: rename and move message reassembly function") reassembly of long broadcast messages has been broken. This is because we test for a non-NULL return value of the *buf parameter as criteria for succesful reassembly. However, this parameter is left defined even after reception of the first fragment, when reassebly is still incomplete. This leads to a kernel crash as soon as a the first fragment of a long broadcast message is received. We fix this with this commit, by implementing a stricter behavior of the function and its return values. This commit should be applied to both net and net-next. Signed-off-by: Jon Maloy Acked-by: Ying Xue Signed-off-by: David S. Miller --- net/tipc/msg.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/net/tipc/msg.c b/net/tipc/msg.c index 8be6e94a1ca9..0a37a472c29f 100644 --- a/net/tipc/msg.c +++ b/net/tipc/msg.c @@ -101,9 +101,11 @@ int tipc_msg_build(struct tipc_msg *hdr, struct iovec const *msg_sect, } /* tipc_buf_append(): Append a buffer to the fragment list of another buffer - * Let first buffer become head buffer - * Returns 1 and sets *buf to headbuf if chain is complete, otherwise 0 - * Leaves headbuf pointer at NULL if failure + * @*headbuf: in: NULL for first frag, otherwise value returned from prev call + * out: set when successful non-complete reassembly, otherwise NULL + * @*buf: in: the buffer to append. Always defined + * out: head buf after sucessful complete reassembly, otherwise NULL + * Returns 1 when reassembly complete, otherwise 0 */ int tipc_buf_append(struct sk_buff **headbuf, struct sk_buff **buf) { @@ -122,6 +124,7 @@ int tipc_buf_append(struct sk_buff **headbuf, struct sk_buff **buf) goto out_free; head = *headbuf = frag; skb_frag_list_init(head); + *buf = NULL; return 0; } if (!head) @@ -150,5 +153,7 @@ int tipc_buf_append(struct sk_buff **headbuf, struct sk_buff **buf) out_free: pr_warn_ratelimited("Unable to build fragment list\n"); kfree_skb(*buf); + kfree_skb(*headbuf); + *buf = *headbuf = NULL; return 0; } From 85b722d760f0de77c4bb371b77202784671f5a54 Mon Sep 17 00:00:00 2001 From: Rickard Strandqvist Date: Sun, 6 Jul 2014 14:04:37 +0200 Subject: [PATCH 129/455] isdn: hisax: l3ni1.c: Fix for possible null pointer dereference There is otherwise a risk of a possible null pointer dereference. Was largely found by using a static code analysis program called cppcheck. Signed-off-by: Rickard Strandqvist --- drivers/isdn/hisax/l3ni1.c | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/drivers/isdn/hisax/l3ni1.c b/drivers/isdn/hisax/l3ni1.c index 0df6691d045c..8dc791bfaa6f 100644 --- a/drivers/isdn/hisax/l3ni1.c +++ b/drivers/isdn/hisax/l3ni1.c @@ -2059,13 +2059,17 @@ static int l3ni1_cmd_global(struct PStack *st, isdn_ctrl *ic) memcpy(p, ic->parm.ni1_io.data, ic->parm.ni1_io.datalen); /* copy data */ l = (p - temp) + ic->parm.ni1_io.datalen; /* total length */ - if (ic->parm.ni1_io.timeout > 0) - if (!(pc = ni1_new_l3_process(st, -1))) - { free_invoke_id(st, id); + if (ic->parm.ni1_io.timeout > 0) { + pc = ni1_new_l3_process(st, -1); + if (!pc) { + free_invoke_id(st, id); return (-2); } - pc->prot.ni1.ll_id = ic->parm.ni1_io.ll_id; /* remember id */ - pc->prot.ni1.proc = ic->parm.ni1_io.proc; /* and procedure */ + /* remember id */ + pc->prot.ni1.ll_id = ic->parm.ni1_io.ll_id; + /* and procedure */ + pc->prot.ni1.proc = ic->parm.ni1_io.proc; + } if (!(skb = l3_alloc_skb(l))) { free_invoke_id(st, id); From 233b43010330ed8cf39cf636880017df3e33f102 Mon Sep 17 00:00:00 2001 From: Hariprasad S Date: Mon, 23 Jun 2014 19:12:35 +0530 Subject: [PATCH 130/455] RDMA/cxgb4: Fix skb_leak in reject_cr() Based on origninal work by Steve Wise Signed-off-by: Steve Wise Signed-off-by: Hariprasad Shenai Signed-off-by: Roland Dreier --- drivers/infiniband/hw/cxgb4/cm.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c index 5e153f6d4b48..cc36e9bf9dea 100644 --- a/drivers/infiniband/hw/cxgb4/cm.c +++ b/drivers/infiniband/hw/cxgb4/cm.c @@ -2180,7 +2180,6 @@ static void reject_cr(struct c4iw_dev *dev, u32 hwtid, struct sk_buff *skb) PDBG("%s c4iw_dev %p tid %u\n", __func__, dev, hwtid); BUG_ON(skb_cloned(skb)); skb_trim(skb, sizeof(struct cpl_tid_release)); - skb_get(skb); release_tid(&dev->rdev, hwtid, skb); return; } From 5dab6d3ab1abed99be6166b844af58237d52a135 Mon Sep 17 00:00:00 2001 From: Hariprasad S Date: Mon, 23 Jun 2014 19:12:36 +0530 Subject: [PATCH 131/455] RDMA/cxgb4: Clean up connection on ARP error Based on origninal work by Steve Wise Signed-off-by: Steve Wise Signed-off-by: Hariprasad Shenai Signed-off-by: Roland Dreier --- drivers/infiniband/hw/cxgb4/cm.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c index cc36e9bf9dea..6a93280250d1 100644 --- a/drivers/infiniband/hw/cxgb4/cm.c +++ b/drivers/infiniband/hw/cxgb4/cm.c @@ -432,8 +432,17 @@ static void arp_failure_discard(void *handle, struct sk_buff *skb) */ static void act_open_req_arp_failure(void *handle, struct sk_buff *skb) { + struct c4iw_ep *ep = handle; + printk(KERN_ERR MOD "ARP failure duing connect\n"); kfree_skb(skb); + connect_reply_upcall(ep, -EHOSTUNREACH); + state_set(&ep->com, DEAD); + remove_handle(ep->com.dev, &ep->com.dev->atid_idr, ep->atid); + cxgb4_free_atid(ep->com.dev->rdev.lldi.tids, ep->atid); + dst_release(ep->dst); + cxgb4_l2t_release(ep->l2t); + c4iw_put_ep(&ep->com); } /* @@ -658,7 +667,7 @@ static int send_connect(struct c4iw_ep *ep) opt2 |= T5_OPT_2_VALID; opt2 |= V_CONG_CNTRL(CONG_ALG_TAHOE); } - t4_set_arp_err_handler(skb, NULL, act_open_req_arp_failure); + t4_set_arp_err_handler(skb, ep, act_open_req_arp_failure); if (is_t4(ep->com.dev->rdev.lldi.adapter_type)) { if (ep->com.remote_addr.ss_family == AF_INET) { From 6b54d54dea82ae214e4a45a503c4ef755a8ecee8 Mon Sep 17 00:00:00 2001 From: Steve Wise Date: Tue, 8 Jul 2014 10:20:35 -0500 Subject: [PATCH 132/455] RDMA/cxgb4: Initialize the device status page The status page is mapped to user processes and allows sharing the device state between the kernel and user processes. This state isn't getting initialized and thus intermittently causes problems. Namely, the user process can mistakenly think the user doorbell writes are disabled which causes SQ work requests to never get fetched by HW. Fixes: 05eb23893c2c ("cxgb4/iw_cxgb4: Doorbell Drop Avoidance Bug Fixes"). Signed-off-by: Steve Wise Cc: # v3.15 Signed-off-by: Roland Dreier --- drivers/infiniband/hw/cxgb4/device.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/infiniband/hw/cxgb4/device.c b/drivers/infiniband/hw/cxgb4/device.c index dd93aadc996e..16b75de4b2de 100644 --- a/drivers/infiniband/hw/cxgb4/device.c +++ b/drivers/infiniband/hw/cxgb4/device.c @@ -696,6 +696,7 @@ static int c4iw_rdev_open(struct c4iw_rdev *rdev) pr_err(MOD "error allocating status page\n"); goto err4; } + rdev->status_page->db_off = 0; return 0; err4: c4iw_rqtpool_destroy(rdev); From e0056593b61253f1a8a9941dacda22e73b963cdc Mon Sep 17 00:00:00 2001 From: Dmitry Popov Date: Sat, 5 Jul 2014 02:26:37 +0400 Subject: [PATCH 133/455] ip_tunnel: fix ip_tunnel_lookup This patch fixes 3 similar bugs where incoming packets might be routed into wrong non-wildcard tunnels: 1) Consider the following setup: ip address add 1.1.1.1/24 dev eth0 ip address add 1.1.1.2/24 dev eth0 ip tunnel add ipip1 remote 2.2.2.2 local 1.1.1.1 mode ipip dev eth0 ip link set ipip1 up Incoming ipip packets from 2.2.2.2 were routed into ipip1 even if it has dst = 1.1.1.2. Moreover even if there was wildcard tunnel like ip tunnel add ipip0 remote 2.2.2.2 local any mode ipip dev eth0 but it was created before explicit one (with local 1.1.1.1), incoming ipip packets with src = 2.2.2.2 and dst = 1.1.1.2 were still routed into ipip1. Same issue existed with all tunnels that use ip_tunnel_lookup (gre, vti) 2) ip address add 1.1.1.1/24 dev eth0 ip tunnel add ipip1 remote 2.2.146.85 local 1.1.1.1 mode ipip dev eth0 ip link set ipip1 up Incoming ipip packets with dst = 1.1.1.1 were routed into ipip1, no matter what src address is. Any remote ip address which has ip_tunnel_hash = 0 raised this issue, 2.2.146.85 is just an example, there are more than 4 million of them. And again, wildcard tunnel like ip tunnel add ipip0 remote any local 1.1.1.1 mode ipip dev eth0 wouldn't be ever matched if it was created before explicit tunnel like above. Gre & vti tunnels had the same issue. 3) ip address add 1.1.1.1/24 dev eth0 ip tunnel add gre1 remote 2.2.146.84 local 1.1.1.1 key 1 mode gre dev eth0 ip link set gre1 up Any incoming gre packet with key = 1 were routed into gre1, no matter what src/dst addresses are. Any remote ip address which has ip_tunnel_hash = 0 raised the issue, 2.2.146.84 is just an example, there are more than 4 million of them. Wildcard tunnel like ip tunnel add gre2 remote any local any key 1 mode gre dev eth0 wouldn't be ever matched if it was created before explicit tunnel like above. All this stuff happened because while looking for a wildcard tunnel we didn't check that matched tunnel is a wildcard one. Fixed. Signed-off-by: Dmitry Popov Signed-off-by: David S. Miller --- net/ipv4/ip_tunnel.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c index 54b6731dab55..6f9de61dce5f 100644 --- a/net/ipv4/ip_tunnel.c +++ b/net/ipv4/ip_tunnel.c @@ -169,6 +169,7 @@ struct ip_tunnel *ip_tunnel_lookup(struct ip_tunnel_net *itn, hlist_for_each_entry_rcu(t, head, hash_node) { if (remote != t->parms.iph.daddr || + t->parms.iph.saddr != 0 || !(t->dev->flags & IFF_UP)) continue; @@ -185,10 +186,11 @@ struct ip_tunnel *ip_tunnel_lookup(struct ip_tunnel_net *itn, head = &itn->tunnels[hash]; hlist_for_each_entry_rcu(t, head, hash_node) { - if ((local != t->parms.iph.saddr && - (local != t->parms.iph.daddr || - !ipv4_is_multicast(local))) || - !(t->dev->flags & IFF_UP)) + if ((local != t->parms.iph.saddr || t->parms.iph.daddr != 0) && + (local != t->parms.iph.daddr || !ipv4_is_multicast(local))) + continue; + + if (!(t->dev->flags & IFF_UP)) continue; if (!ip_tunnel_key_match(&t->parms, flags, key)) @@ -205,6 +207,8 @@ struct ip_tunnel *ip_tunnel_lookup(struct ip_tunnel_net *itn, hlist_for_each_entry_rcu(t, head, hash_node) { if (t->parms.i_key != key || + t->parms.iph.saddr != 0 || + t->parms.iph.daddr != 0 || !(t->dev->flags & IFF_UP)) continue; From 36beddc272c111689f3042bf3d10a64d8a805f93 Mon Sep 17 00:00:00 2001 From: Andrey Utkin Date: Mon, 7 Jul 2014 23:22:50 +0300 Subject: [PATCH 134/455] appletalk: Fix socket referencing in skb Setting just skb->sk without taking its reference and setting a destructor is invalid. However, in the places where this was done, skb is used in a way not requiring skb->sk setting. So dropping the setting of skb->sk. Thanks to Eric Dumazet for correct solution. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=79441 Reported-by: Ed Martin Signed-off-by: Andrey Utkin Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller --- net/appletalk/ddp.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/net/appletalk/ddp.c b/net/appletalk/ddp.c index 01a1082e02b3..bfcf6be1d665 100644 --- a/net/appletalk/ddp.c +++ b/net/appletalk/ddp.c @@ -1489,8 +1489,6 @@ static int atalk_rcv(struct sk_buff *skb, struct net_device *dev, goto drop; /* Queue packet (standard) */ - skb->sk = sock; - if (sock_queue_rcv_skb(sock, skb) < 0) goto drop; @@ -1644,7 +1642,6 @@ static int atalk_sendmsg(struct kiocb *iocb, struct socket *sock, struct msghdr if (!skb) goto out; - skb->sk = sk; skb_reserve(skb, ddp_dl->header_length); skb_reserve(skb, dev->hard_header_len); skb->dev = dev; From 0b0302449110ca5ca4350458ed57b971fcb78ec1 Mon Sep 17 00:00:00 2001 From: hayeswang Date: Tue, 8 Jul 2014 14:49:28 +0800 Subject: [PATCH 135/455] r8152: wake up the device before dumping the hw counter The device should be waked up from runtime suspend before dumping the hw counter. Signed-off-by: Hayes Wang Signed-off-by: David S. Miller --- drivers/net/usb/r8152.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c index 25431965a625..a795ecf8d5b6 100644 --- a/drivers/net/usb/r8152.c +++ b/drivers/net/usb/r8152.c @@ -3204,8 +3204,13 @@ static void rtl8152_get_ethtool_stats(struct net_device *dev, struct r8152 *tp = netdev_priv(dev); struct tally_counter tally; + if (usb_autopm_get_interface(tp->intf) < 0) + return; + generic_ocp_read(tp, PLA_TALLYCNT, sizeof(tally), &tally, MCU_TYPE_PLA); + usb_autopm_put_interface(tp->intf); + data[0] = le64_to_cpu(tally.tx_packets); data[1] = le64_to_cpu(tally.rx_packets); data[2] = le64_to_cpu(tally.tx_errors); From fbc6daf19745b372c0d909e5d74ab02e42b70e51 Mon Sep 17 00:00:00 2001 From: Amir Vadai Date: Tue, 8 Jul 2014 11:28:12 +0300 Subject: [PATCH 136/455] net/mlx4_en: Ignore budget on TX napi polling It is recommended that TX work not count against the quota. The cost of TX packet liberation is a minute percentage of what it costs to process an RX frame. Furthermore, that SKB freeing makes memory available for other paths in the stack. Give the TX a larger budget and be more aggressive about cleaning up the Tx descriptors this budget could be changed using ethtool: $ ethtool -C eth1 tx-frames-irq Signed-off-by: Amir Vadai Signed-off-by: David S. Miller --- .../net/ethernet/mellanox/mlx4/en_ethtool.c | 7 +++++ .../net/ethernet/mellanox/mlx4/en_netdev.c | 1 + drivers/net/ethernet/mellanox/mlx4/en_tx.c | 28 +++++++++---------- drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 3 ++ 4 files changed, 24 insertions(+), 15 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c index fa1a069e14e6..68d763d2d030 100644 --- a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c @@ -417,6 +417,8 @@ static int mlx4_en_get_coalesce(struct net_device *dev, coal->tx_coalesce_usecs = priv->tx_usecs; coal->tx_max_coalesced_frames = priv->tx_frames; + coal->tx_max_coalesced_frames_irq = priv->tx_work_limit; + coal->rx_coalesce_usecs = priv->rx_usecs; coal->rx_max_coalesced_frames = priv->rx_frames; @@ -426,6 +428,7 @@ static int mlx4_en_get_coalesce(struct net_device *dev, coal->rx_coalesce_usecs_high = priv->rx_usecs_high; coal->rate_sample_interval = priv->sample_interval; coal->use_adaptive_rx_coalesce = priv->adaptive_rx_coal; + return 0; } @@ -434,6 +437,9 @@ static int mlx4_en_set_coalesce(struct net_device *dev, { struct mlx4_en_priv *priv = netdev_priv(dev); + if (!coal->tx_max_coalesced_frames_irq) + return -EINVAL; + priv->rx_frames = (coal->rx_max_coalesced_frames == MLX4_EN_AUTO_CONF) ? MLX4_EN_RX_COAL_TARGET : @@ -457,6 +463,7 @@ static int mlx4_en_set_coalesce(struct net_device *dev, priv->rx_usecs_high = coal->rx_coalesce_usecs_high; priv->sample_interval = coal->rate_sample_interval; priv->adaptive_rx_coal = coal->use_adaptive_rx_coalesce; + priv->tx_work_limit = coal->tx_max_coalesced_frames_irq; return mlx4_en_moderation_update(priv); } diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c index e94d96f0ef55..7345c43b019e 100644 --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c @@ -2473,6 +2473,7 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port, MLX4_WQE_CTRL_SOLICITED); priv->num_tx_rings_p_up = mdev->profile.num_tx_rings_p_up; priv->tx_ring_num = prof->tx_ring_num; + priv->tx_work_limit = MLX4_EN_DEFAULT_TX_WORK; priv->tx_ring = kzalloc(sizeof(struct mlx4_en_tx_ring *) * MAX_TX_RINGS, GFP_KERNEL); diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c index ac3dead3792c..5045bab59633 100644 --- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c @@ -351,9 +351,8 @@ int mlx4_en_free_tx_buf(struct net_device *dev, struct mlx4_en_tx_ring *ring) return cnt; } -static int mlx4_en_process_tx_cq(struct net_device *dev, - struct mlx4_en_cq *cq, - int budget) +static bool mlx4_en_process_tx_cq(struct net_device *dev, + struct mlx4_en_cq *cq) { struct mlx4_en_priv *priv = netdev_priv(dev); struct mlx4_cq *mcq = &cq->mcq; @@ -372,9 +371,10 @@ static int mlx4_en_process_tx_cq(struct net_device *dev, int factor = priv->cqe_factor; u64 timestamp = 0; int done = 0; + int budget = priv->tx_work_limit; if (!priv->port_up) - return 0; + return true; index = cons_index & size_mask; cqe = &buf[(index << factor) + factor]; @@ -447,7 +447,7 @@ static int mlx4_en_process_tx_cq(struct net_device *dev, netif_tx_wake_queue(ring->tx_queue); ring->wake_queue++; } - return done; + return done < budget; } void mlx4_en_tx_irq(struct mlx4_cq *mcq) @@ -467,18 +467,16 @@ int mlx4_en_poll_tx_cq(struct napi_struct *napi, int budget) struct mlx4_en_cq *cq = container_of(napi, struct mlx4_en_cq, napi); struct net_device *dev = cq->dev; struct mlx4_en_priv *priv = netdev_priv(dev); - int done; + int clean_complete; - done = mlx4_en_process_tx_cq(dev, cq, budget); + clean_complete = mlx4_en_process_tx_cq(dev, cq); + if (!clean_complete) + return budget; - /* If we used up all the quota - we're probably not done yet... */ - if (done < budget) { - /* Done for now */ - napi_complete(napi); - mlx4_en_arm_cq(priv, cq); - return done; - } - return budget; + napi_complete(napi); + mlx4_en_arm_cq(priv, cq); + + return 0; } static struct mlx4_en_tx_desc *mlx4_en_bounce_to_desc(struct mlx4_en_priv *priv, diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h index 624e1939e9ee..d72a5a894fc6 100644 --- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h +++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h @@ -126,6 +126,8 @@ enum { #define MAX_TX_RINGS (MLX4_EN_MAX_TX_RING_P_UP * \ MLX4_EN_NUM_UP) +#define MLX4_EN_DEFAULT_TX_WORK 256 + /* Target number of packets to coalesce with interrupt moderation */ #define MLX4_EN_RX_COAL_TARGET 44 #define MLX4_EN_RX_COAL_TIME 0x10 @@ -543,6 +545,7 @@ struct mlx4_en_priv { __be32 ctrl_flags; u32 flags; u8 num_tx_rings_p_up; + u32 tx_work_limit; u32 tx_ring_num; u32 rx_ring_num; u32 rx_skb_size; From 4d12bc63ab5e48c1d78fa13883cf6fefcea3afb1 Mon Sep 17 00:00:00 2001 From: Thomas Petazzoni Date: Tue, 8 Jul 2014 10:49:43 +0200 Subject: [PATCH 137/455] net: mvneta: fix operation in 10 Mbit/s mode As reported by Maggie Mae Roxas, the mvneta driver doesn't behave properly in 10 Mbit/s mode. This is due to a misconfiguration of the MVNETA_GMAC_AUTONEG_CONFIG register: bit MVNETA_GMAC_CONFIG_MII_SPEED must be set for a 100 Mbit/s speed, but cleared for a 10 Mbit/s speed, which the driver was not properly doing. This commit adjusts that by setting the MVNETA_GMAC_CONFIG_MII_SPEED bit only in 100 Mbit/s mode, and relying on the fact that all the speed related bits of this register are cleared at the beginning of the mvneta_adjust_link() function. This problem exists since c5aff18204da0 ("net: mvneta: driver for Marvell Armada 370/XP network unit") which is the commit that introduced the mvneta driver in the kernel. Cc: # v3.8+ Fixes: c5aff18204da0 ("net: mvneta: driver for Marvell Armada 370/XP network unit") Reported-by: Maggie Mae Roxas Cc: Maggie Mae Roxas Signed-off-by: Thomas Petazzoni Signed-off-by: David S. Miller --- drivers/net/ethernet/marvell/mvneta.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c index 45beca17fa50..d49f08d22071 100644 --- a/drivers/net/ethernet/marvell/mvneta.c +++ b/drivers/net/ethernet/marvell/mvneta.c @@ -2529,7 +2529,7 @@ static void mvneta_adjust_link(struct net_device *ndev) if (phydev->speed == SPEED_1000) val |= MVNETA_GMAC_CONFIG_GMII_SPEED; - else + else if (phydev->speed == SPEED_100) val |= MVNETA_GMAC_CONFIG_MII_SPEED; mvreg_write(pp, MVNETA_GMAC_AUTONEG_CONFIG, val); From 0a1985879437d14bda8c90d0dae3455c467d7642 Mon Sep 17 00:00:00 2001 From: Thomas Fitzsimmons Date: Tue, 8 Jul 2014 19:44:07 -0400 Subject: [PATCH 138/455] net: mvneta: Fix big endian issue in mvneta_txq_desc_csum() This commit fixes the command value generated for CSUM calculation when running in big endian mode. The Ethernet protocol ID for IP was being unconditionally byte-swapped in the layer 3 protocol check (with swab16), which caused the mvneta driver to not function correctly in big endian mode. This patch byte-swaps the ID conditionally with htons. Cc: # v3.13+ Signed-off-by: Thomas Fitzsimmons Signed-off-by: David S. Miller --- drivers/net/ethernet/marvell/mvneta.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c index d49f08d22071..dadd9a5f6323 100644 --- a/drivers/net/ethernet/marvell/mvneta.c +++ b/drivers/net/ethernet/marvell/mvneta.c @@ -1207,7 +1207,7 @@ static u32 mvneta_txq_desc_csum(int l3_offs, int l3_proto, command = l3_offs << MVNETA_TX_L3_OFF_SHIFT; command |= ip_hdr_len << MVNETA_TX_IP_HLEN_SHIFT; - if (l3_proto == swab16(ETH_P_IP)) + if (l3_proto == htons(ETH_P_IP)) command |= MVNETA_TXD_IP_CSUM; else command |= MVNETA_TX_L3_IP6; From ac30ef832e6af0505b6f0251a6659adcfa74975e Mon Sep 17 00:00:00 2001 From: Ben Pfaff Date: Wed, 9 Jul 2014 10:31:22 -0700 Subject: [PATCH 139/455] netlink: Fix handling of error from netlink_dump(). netlink_dump() returns a negative errno value on error. Until now, netlink_recvmsg() directly recorded that negative value in sk->sk_err, but that's wrong since sk_err takes positive errno values. (This manifests as userspace receiving a positive return value from the recv() system call, falsely indicating success.) This bug was introduced in the commit that started checking the netlink_dump() return value, commit b44d211 (netlink: handle errors from netlink_dump()). Multithreaded Netlink dumps are one way to trigger this behavior in practice, as described in the commit message for the userspace workaround posted here: http://openvswitch.org/pipermail/dev/2014-June/042339.html This commit also fixes the same bug in netlink_poll(), introduced in commit cd1df525d (netlink: add flow control for memory mapped I/O). Signed-off-by: Ben Pfaff Signed-off-by: David S. Miller --- net/netlink/af_netlink.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c index 15c731f03fa6..e6fac7e3db52 100644 --- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -636,7 +636,7 @@ static unsigned int netlink_poll(struct file *file, struct socket *sock, while (nlk->cb_running && netlink_dump_space(nlk)) { err = netlink_dump(sk); if (err < 0) { - sk->sk_err = err; + sk->sk_err = -err; sk->sk_error_report(sk); break; } @@ -2483,7 +2483,7 @@ static int netlink_recvmsg(struct kiocb *kiocb, struct socket *sock, atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf / 2) { ret = netlink_dump(sk); if (ret) { - sk->sk_err = ret; + sk->sk_err = -ret; sk->sk_error_report(sk); } } From b51ecea852b712618796d9eab8428a7d5f1f106f Mon Sep 17 00:00:00 2001 From: hayeswang Date: Wed, 9 Jul 2014 14:52:51 +0800 Subject: [PATCH 140/455] r8169: disable L23 For RTL8411, RTL8111G, RTL8402, RTL8105, and RTL8106, disable the feature of entering the L2/L3 link state of the PCIe. When the nic starts the process of entering the L2/L3 link state and the PCI reset occurs before the work is finished, the work would be queued and continue after the next the PCI reset occurs. This causes the device stays in L2/L3 link state, and the system couldn't find the device. Signed-off-by: Hayes Wang Acked-by: Francois Romieu Signed-off-by: David S. Miller --- drivers/net/ethernet/realtek/r8169.c | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+) diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c index be425ad5e824..06bdc31a828d 100644 --- a/drivers/net/ethernet/realtek/r8169.c +++ b/drivers/net/ethernet/realtek/r8169.c @@ -538,6 +538,7 @@ enum rtl_register_content { MagicPacket = (1 << 5), /* Wake up when receives a Magic Packet */ LinkUp = (1 << 4), /* Wake up when the cable connection is re-established */ Jumbo_En0 = (1 << 2), /* 8168 only. Reserved in the 8168b */ + Rdy_to_L23 = (1 << 1), /* L23 Enable */ Beacon_en = (1 << 0), /* 8168 only. Reserved in the 8168b */ /* Config4 register */ @@ -4897,6 +4898,21 @@ static void rtl_enable_clock_request(struct pci_dev *pdev) PCI_EXP_LNKCTL_CLKREQ_EN); } +static void rtl_pcie_state_l2l3_enable(struct rtl8169_private *tp, bool enable) +{ + void __iomem *ioaddr = tp->mmio_addr; + u8 data; + + data = RTL_R8(Config3); + + if (enable) + data |= Rdy_to_L23; + else + data &= ~Rdy_to_L23; + + RTL_W8(Config3, data); +} + #define R8168_CPCMD_QUIRK_MASK (\ EnableBist | \ Mac_dbgo_oe | \ @@ -5246,6 +5262,7 @@ static void rtl_hw_start_8411(struct rtl8169_private *tp) }; rtl_hw_start_8168f(tp); + rtl_pcie_state_l2l3_enable(tp, false); rtl_ephy_init(tp, e_info_8168f_1, ARRAY_SIZE(e_info_8168f_1)); @@ -5284,6 +5301,8 @@ static void rtl_hw_start_8168g_1(struct rtl8169_private *tp) rtl_w1w0_eri(tp, 0x2fc, ERIAR_MASK_0001, 0x01, 0x06, ERIAR_EXGMAC); rtl_w1w0_eri(tp, 0x1b0, ERIAR_MASK_0011, 0x0000, 0x1000, ERIAR_EXGMAC); + + rtl_pcie_state_l2l3_enable(tp, false); } static void rtl_hw_start_8168g_2(struct rtl8169_private *tp) @@ -5536,6 +5555,8 @@ static void rtl_hw_start_8105e_1(struct rtl8169_private *tp) RTL_W8(DLLPR, RTL_R8(DLLPR) | PFM_EN); rtl_ephy_init(tp, e_info_8105e_1, ARRAY_SIZE(e_info_8105e_1)); + + rtl_pcie_state_l2l3_enable(tp, false); } static void rtl_hw_start_8105e_2(struct rtl8169_private *tp) @@ -5571,6 +5592,8 @@ static void rtl_hw_start_8402(struct rtl8169_private *tp) rtl_eri_write(tp, 0xc0, ERIAR_MASK_0011, 0x0000, ERIAR_EXGMAC); rtl_eri_write(tp, 0xb8, ERIAR_MASK_0011, 0x0000, ERIAR_EXGMAC); rtl_w1w0_eri(tp, 0x0d4, ERIAR_MASK_0011, 0x0e00, 0xff00, ERIAR_EXGMAC); + + rtl_pcie_state_l2l3_enable(tp, false); } static void rtl_hw_start_8106(struct rtl8169_private *tp) @@ -5583,6 +5606,8 @@ static void rtl_hw_start_8106(struct rtl8169_private *tp) RTL_W32(MISC, (RTL_R32(MISC) | DISABLE_LAN_EN) & ~EARLY_TALLY_EN); RTL_W8(MCU, RTL_R8(MCU) | EN_NDP | EN_OOB_RESET); RTL_W8(DLLPR, RTL_R8(DLLPR) & ~PFM_EN); + + rtl_pcie_state_l2l3_enable(tp, false); } static void rtl_hw_start_8101(struct net_device *dev) From 6ef07a9f369742a7b18c77484411cff0bd790291 Mon Sep 17 00:00:00 2001 From: Sagi Grimberg Date: Sun, 8 Jun 2014 10:00:59 +0300 Subject: [PATCH 141/455] mlx5_core: Fix possible race between mr tree insert/delete In mlx5_core_destroy_mkey(), we must first remove the mr from the radix tree and then destroy it. Otherwise we might hit a race if the key was reallocated and we attempted to insert it to the radix tree. Also handle radix tree insert/delete failures. Signed-off-by: Sagi Grimberg Reviewed-by: Eli Cohen Signed-off-by: Roland Dreier --- drivers/net/ethernet/mellanox/mlx5/core/mr.c | 19 +++++++++++++++---- 1 file changed, 15 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mr.c b/drivers/net/ethernet/mellanox/mlx5/core/mr.c index ba0401d4af50..184c3615f479 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/mr.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/mr.c @@ -94,6 +94,11 @@ int mlx5_core_create_mkey(struct mlx5_core_dev *dev, struct mlx5_core_mr *mr, write_lock_irq(&table->lock); err = radix_tree_insert(&table->tree, mlx5_base_mkey(mr->key), mr); write_unlock_irq(&table->lock); + if (err) { + mlx5_core_warn(dev, "failed radix tree insert of mr 0x%x, %d\n", + mlx5_base_mkey(mr->key), err); + mlx5_core_destroy_mkey(dev, mr); + } return err; } @@ -104,12 +109,22 @@ int mlx5_core_destroy_mkey(struct mlx5_core_dev *dev, struct mlx5_core_mr *mr) struct mlx5_mr_table *table = &dev->priv.mr_table; struct mlx5_destroy_mkey_mbox_in in; struct mlx5_destroy_mkey_mbox_out out; + struct mlx5_core_mr *deleted_mr; unsigned long flags; int err; memset(&in, 0, sizeof(in)); memset(&out, 0, sizeof(out)); + write_lock_irqsave(&table->lock, flags); + deleted_mr = radix_tree_delete(&table->tree, mlx5_base_mkey(mr->key)); + write_unlock_irqrestore(&table->lock, flags); + if (!deleted_mr) { + mlx5_core_warn(dev, "failed radix tree delete of mr 0x%x\n", + mlx5_base_mkey(mr->key)); + return -ENOENT; + } + in.hdr.opcode = cpu_to_be16(MLX5_CMD_OP_DESTROY_MKEY); in.mkey = cpu_to_be32(mlx5_mkey_to_idx(mr->key)); err = mlx5_cmd_exec(dev, &in, sizeof(in), &out, sizeof(out)); @@ -119,10 +134,6 @@ int mlx5_core_destroy_mkey(struct mlx5_core_dev *dev, struct mlx5_core_mr *mr) if (out.hdr.status) return mlx5_cmd_status_to_err(&out.hdr); - write_lock_irqsave(&table->lock, flags); - radix_tree_delete(&table->tree, mlx5_base_mkey(mr->key)); - write_unlock_irqrestore(&table->lock, flags); - return err; } EXPORT_SYMBOL(mlx5_core_destroy_mkey); From a12f78c582b5a7a549fbee1a58d5778e651764ad Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Stefan=20S=C3=B8rensen?= Date: Wed, 25 Jun 2014 14:40:16 +0200 Subject: [PATCH 142/455] dp83640: Always decode received status frames MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Currently status frames are only handled when packet timestamping is enabled, but status frames are also needed for pin event timestamping. Fix by moving packet timestamping check to after status frame decode. Signed-off-by: Stefan Sørensen Acked-by: Richard Cochran Signed-off-by: David S. Miller --- drivers/net/phy/dp83640.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/net/phy/dp83640.c b/drivers/net/phy/dp83640.c index 6a999e6814a0..9408157a246c 100644 --- a/drivers/net/phy/dp83640.c +++ b/drivers/net/phy/dp83640.c @@ -1323,15 +1323,15 @@ static bool dp83640_rxtstamp(struct phy_device *phydev, { struct dp83640_private *dp83640 = phydev->priv; - if (!dp83640->hwts_rx_en) - return false; - if (is_status_frame(skb, type)) { decode_status_frame(dp83640, skb); kfree_skb(skb); return true; } + if (!dp83640->hwts_rx_en) + return false; + SKB_PTP_TYPE(skb) = type; skb_queue_tail(&dp83640->rx_queue, skb); schedule_work(&dp83640->ts_work); From b4df480f68ae03b5dd4ab0db56536fbcec741705 Mon Sep 17 00:00:00 2001 From: Joonyoung Shim Date: Thu, 10 Jul 2014 11:49:42 +0900 Subject: [PATCH 143/455] usbnet: smsc95xx: add reset_resume function with reset operation The smsc95xx needs to resume with reset operation. Otherwise it causes system hang by network error like below after resume. This case appears on odroid u3 board. [ 9.727600] smsc95xx 1-2:1.0 eth0: kevent 2 may have been dropped [ 9.727648] smsc95xx 1-2:1.0 eth0: kevent 2 may have been dropped [ 9.727689] smsc95xx 1-2:1.0 eth0: kevent 2 may have been dropped [ 9.727728] smsc95xx 1-2:1.0 eth0: kevent 2 may have been dropped [ 9.729486] PM: resume of devices complete after 2011.219 msecs [ 10.117609] Restarting tasks ... done. [ 11.725099] smsc95xx 1-2:1.0 eth0: kevent 2 may have been dropped [ 13.480846] smsc95xx 1-2:1.0 eth0: kevent 2 may have been dropped [ 13.481361] smsc95xx 1-2:1.0 eth0: kevent 2 may have been dropped ... Signed-off-by: Joonyoung Shim Signed-off-by: David S. Miller --- drivers/net/usb/smsc95xx.c | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c index 424db65e4396..d07bf4cb893f 100644 --- a/drivers/net/usb/smsc95xx.c +++ b/drivers/net/usb/smsc95xx.c @@ -1714,6 +1714,18 @@ static int smsc95xx_resume(struct usb_interface *intf) return ret; } +static int smsc95xx_reset_resume(struct usb_interface *intf) +{ + struct usbnet *dev = usb_get_intfdata(intf); + int ret; + + ret = smsc95xx_reset(dev); + if (ret < 0) + return ret; + + return smsc95xx_resume(intf); +} + static void smsc95xx_rx_csum_offload(struct sk_buff *skb) { skb->csum = *(u16 *)(skb_tail_pointer(skb) - 2); @@ -2004,7 +2016,7 @@ static struct usb_driver smsc95xx_driver = { .probe = usbnet_probe, .suspend = smsc95xx_suspend, .resume = smsc95xx_resume, - .reset_resume = smsc95xx_resume, + .reset_resume = smsc95xx_reset_resume, .disconnect = usbnet_disconnect, .disable_hub_initiated_lpm = 1, .supports_autosuspend = 1, From 948264879b6894dc389a44b99fae4f0b72932619 Mon Sep 17 00:00:00 2001 From: Todd Fujinaka Date: Thu, 10 Jul 2014 01:47:15 -0700 Subject: [PATCH 144/455] igb: Workaround for i210 Errata 25: Slow System Clock On some devices, the internal PLL circuit occasionally provides the wrong clock frequency after power up. The probability of failure is less than one failure per 1000 power cycles. When the failure occurs, the internal clock frequency is around 1/20 of the correct frequency. Cc: stable Signed-off-by: Todd Fujinaka Tested-by: Aaron Brown Signed-off-by: Jeff Kirsher Signed-off-by: David S. Miller --- drivers/net/ethernet/intel/igb/e1000_82575.c | 7 ++ .../net/ethernet/intel/igb/e1000_defines.h | 18 ++--- drivers/net/ethernet/intel/igb/e1000_hw.h | 3 + drivers/net/ethernet/intel/igb/e1000_i210.c | 66 +++++++++++++++++++ drivers/net/ethernet/intel/igb/e1000_i210.h | 12 ++++ drivers/net/ethernet/intel/igb/e1000_regs.h | 1 + drivers/net/ethernet/intel/igb/igb_main.c | 14 ++++ 7 files changed, 113 insertions(+), 8 deletions(-) diff --git a/drivers/net/ethernet/intel/igb/e1000_82575.c b/drivers/net/ethernet/intel/igb/e1000_82575.c index a2db388cc31e..ee74f9536b31 100644 --- a/drivers/net/ethernet/intel/igb/e1000_82575.c +++ b/drivers/net/ethernet/intel/igb/e1000_82575.c @@ -1481,6 +1481,13 @@ static s32 igb_init_hw_82575(struct e1000_hw *hw) s32 ret_val; u16 i, rar_count = mac->rar_entry_count; + if ((hw->mac.type >= e1000_i210) && + !(igb_get_flash_presence_i210(hw))) { + ret_val = igb_pll_workaround_i210(hw); + if (ret_val) + return ret_val; + } + /* Initialize identification LED */ ret_val = igb_id_led_init(hw); if (ret_val) { diff --git a/drivers/net/ethernet/intel/igb/e1000_defines.h b/drivers/net/ethernet/intel/igb/e1000_defines.h index 2a8bb35c2df2..217f8138851b 100644 --- a/drivers/net/ethernet/intel/igb/e1000_defines.h +++ b/drivers/net/ethernet/intel/igb/e1000_defines.h @@ -46,14 +46,15 @@ #define E1000_CTRL_EXT_SDP3_DIR 0x00000800 /* SDP3 Data direction */ /* Physical Func Reset Done Indication */ -#define E1000_CTRL_EXT_PFRSTD 0x00004000 -#define E1000_CTRL_EXT_LINK_MODE_MASK 0x00C00000 -#define E1000_CTRL_EXT_LINK_MODE_PCIE_SERDES 0x00C00000 -#define E1000_CTRL_EXT_LINK_MODE_1000BASE_KX 0x00400000 -#define E1000_CTRL_EXT_LINK_MODE_SGMII 0x00800000 -#define E1000_CTRL_EXT_LINK_MODE_GMII 0x00000000 -#define E1000_CTRL_EXT_EIAME 0x01000000 -#define E1000_CTRL_EXT_IRCA 0x00000001 +#define E1000_CTRL_EXT_PFRSTD 0x00004000 +#define E1000_CTRL_EXT_SDLPE 0X00040000 /* SerDes Low Power Enable */ +#define E1000_CTRL_EXT_LINK_MODE_MASK 0x00C00000 +#define E1000_CTRL_EXT_LINK_MODE_PCIE_SERDES 0x00C00000 +#define E1000_CTRL_EXT_LINK_MODE_1000BASE_KX 0x00400000 +#define E1000_CTRL_EXT_LINK_MODE_SGMII 0x00800000 +#define E1000_CTRL_EXT_LINK_MODE_GMII 0x00000000 +#define E1000_CTRL_EXT_EIAME 0x01000000 +#define E1000_CTRL_EXT_IRCA 0x00000001 /* Interrupt delay cancellation */ /* Driver loaded bit for FW */ #define E1000_CTRL_EXT_DRV_LOAD 0x10000000 @@ -62,6 +63,7 @@ /* packet buffer parity error detection enabled */ /* descriptor FIFO parity error detection enable */ #define E1000_CTRL_EXT_PBA_CLR 0x80000000 /* PBA Clear */ +#define E1000_CTRL_EXT_PHYPDEN 0x00100000 #define E1000_I2CCMD_REG_ADDR_SHIFT 16 #define E1000_I2CCMD_PHY_ADDR_SHIFT 24 #define E1000_I2CCMD_OPCODE_READ 0x08000000 diff --git a/drivers/net/ethernet/intel/igb/e1000_hw.h b/drivers/net/ethernet/intel/igb/e1000_hw.h index 89925e405849..ce55ea5d750c 100644 --- a/drivers/net/ethernet/intel/igb/e1000_hw.h +++ b/drivers/net/ethernet/intel/igb/e1000_hw.h @@ -567,4 +567,7 @@ struct net_device *igb_get_hw_dev(struct e1000_hw *hw); /* These functions must be implemented by drivers */ s32 igb_read_pcie_cap_reg(struct e1000_hw *hw, u32 reg, u16 *value); s32 igb_write_pcie_cap_reg(struct e1000_hw *hw, u32 reg, u16 *value); + +void igb_read_pci_cfg(struct e1000_hw *hw, u32 reg, u16 *value); +void igb_write_pci_cfg(struct e1000_hw *hw, u32 reg, u16 *value); #endif /* _E1000_HW_H_ */ diff --git a/drivers/net/ethernet/intel/igb/e1000_i210.c b/drivers/net/ethernet/intel/igb/e1000_i210.c index 337161f440dd..65d931669f81 100644 --- a/drivers/net/ethernet/intel/igb/e1000_i210.c +++ b/drivers/net/ethernet/intel/igb/e1000_i210.c @@ -834,3 +834,69 @@ s32 igb_init_nvm_params_i210(struct e1000_hw *hw) } return ret_val; } + +/** + * igb_pll_workaround_i210 + * @hw: pointer to the HW structure + * + * Works around an errata in the PLL circuit where it occasionally + * provides the wrong clock frequency after power up. + **/ +s32 igb_pll_workaround_i210(struct e1000_hw *hw) +{ + s32 ret_val; + u32 wuc, mdicnfg, ctrl, ctrl_ext, reg_val; + u16 nvm_word, phy_word, pci_word, tmp_nvm; + int i; + + /* Get and set needed register values */ + wuc = rd32(E1000_WUC); + mdicnfg = rd32(E1000_MDICNFG); + reg_val = mdicnfg & ~E1000_MDICNFG_EXT_MDIO; + wr32(E1000_MDICNFG, reg_val); + + /* Get data from NVM, or set default */ + ret_val = igb_read_invm_word_i210(hw, E1000_INVM_AUTOLOAD, + &nvm_word); + if (ret_val) + nvm_word = E1000_INVM_DEFAULT_AL; + tmp_nvm = nvm_word | E1000_INVM_PLL_WO_VAL; + for (i = 0; i < E1000_MAX_PLL_TRIES; i++) { + /* check current state directly from internal PHY */ + igb_read_phy_reg_gs40g(hw, (E1000_PHY_PLL_FREQ_PAGE | + E1000_PHY_PLL_FREQ_REG), &phy_word); + if ((phy_word & E1000_PHY_PLL_UNCONF) + != E1000_PHY_PLL_UNCONF) { + ret_val = 0; + break; + } else { + ret_val = -E1000_ERR_PHY; + } + /* directly reset the internal PHY */ + ctrl = rd32(E1000_CTRL); + wr32(E1000_CTRL, ctrl|E1000_CTRL_PHY_RST); + + ctrl_ext = rd32(E1000_CTRL_EXT); + ctrl_ext |= (E1000_CTRL_EXT_PHYPDEN | E1000_CTRL_EXT_SDLPE); + wr32(E1000_CTRL_EXT, ctrl_ext); + + wr32(E1000_WUC, 0); + reg_val = (E1000_INVM_AUTOLOAD << 4) | (tmp_nvm << 16); + wr32(E1000_EEARBC_I210, reg_val); + + igb_read_pci_cfg(hw, E1000_PCI_PMCSR, &pci_word); + pci_word |= E1000_PCI_PMCSR_D3; + igb_write_pci_cfg(hw, E1000_PCI_PMCSR, &pci_word); + usleep_range(1000, 2000); + pci_word &= ~E1000_PCI_PMCSR_D3; + igb_write_pci_cfg(hw, E1000_PCI_PMCSR, &pci_word); + reg_val = (E1000_INVM_AUTOLOAD << 4) | (nvm_word << 16); + wr32(E1000_EEARBC_I210, reg_val); + + /* restore WUC register */ + wr32(E1000_WUC, wuc); + } + /* restore MDICNFG setting */ + wr32(E1000_MDICNFG, mdicnfg); + return ret_val; +} diff --git a/drivers/net/ethernet/intel/igb/e1000_i210.h b/drivers/net/ethernet/intel/igb/e1000_i210.h index 9f34976687ba..3442b6357d01 100644 --- a/drivers/net/ethernet/intel/igb/e1000_i210.h +++ b/drivers/net/ethernet/intel/igb/e1000_i210.h @@ -33,6 +33,7 @@ s32 igb_read_xmdio_reg(struct e1000_hw *hw, u16 addr, u8 dev_addr, u16 *data); s32 igb_write_xmdio_reg(struct e1000_hw *hw, u16 addr, u8 dev_addr, u16 data); s32 igb_init_nvm_params_i210(struct e1000_hw *hw); bool igb_get_flash_presence_i210(struct e1000_hw *hw); +s32 igb_pll_workaround_i210(struct e1000_hw *hw); #define E1000_STM_OPCODE 0xDB00 #define E1000_EEPROM_FLASH_SIZE_WORD 0x11 @@ -78,4 +79,15 @@ enum E1000_INVM_STRUCTURE_TYPE { #define NVM_LED_1_CFG_DEFAULT_I211 0x0184 #define NVM_LED_0_2_CFG_DEFAULT_I211 0x200C +/* PLL Defines */ +#define E1000_PCI_PMCSR 0x44 +#define E1000_PCI_PMCSR_D3 0x03 +#define E1000_MAX_PLL_TRIES 5 +#define E1000_PHY_PLL_UNCONF 0xFF +#define E1000_PHY_PLL_FREQ_PAGE 0xFC0000 +#define E1000_PHY_PLL_FREQ_REG 0x000E +#define E1000_INVM_DEFAULT_AL 0x202F +#define E1000_INVM_AUTOLOAD 0x0A +#define E1000_INVM_PLL_WO_VAL 0x0010 + #endif diff --git a/drivers/net/ethernet/intel/igb/e1000_regs.h b/drivers/net/ethernet/intel/igb/e1000_regs.h index 1cc4b1a7e597..f5ba4e4eafb9 100644 --- a/drivers/net/ethernet/intel/igb/e1000_regs.h +++ b/drivers/net/ethernet/intel/igb/e1000_regs.h @@ -66,6 +66,7 @@ #define E1000_PBA 0x01000 /* Packet Buffer Allocation - RW */ #define E1000_PBS 0x01008 /* Packet Buffer Size */ #define E1000_EEMNGCTL 0x01010 /* MNG EEprom Control */ +#define E1000_EEARBC_I210 0x12024 /* EEPROM Auto Read Bus Control */ #define E1000_EEWR 0x0102C /* EEPROM Write Register - RW */ #define E1000_I2CCMD 0x01028 /* SFPI2C Command Register - RW */ #define E1000_FRTIMER 0x01048 /* Free Running Timer - RW */ diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index f145adbb55ac..57a96e00df8c 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -7215,6 +7215,20 @@ static int igb_ioctl(struct net_device *netdev, struct ifreq *ifr, int cmd) } } +void igb_read_pci_cfg(struct e1000_hw *hw, u32 reg, u16 *value) +{ + struct igb_adapter *adapter = hw->back; + + pci_read_config_word(adapter->pdev, reg, value); +} + +void igb_write_pci_cfg(struct e1000_hw *hw, u32 reg, u16 *value) +{ + struct igb_adapter *adapter = hw->back; + + pci_write_config_word(adapter->pdev, reg, *value); +} + s32 igb_read_pcie_cap_reg(struct e1000_hw *hw, u32 reg, u16 *value) { struct igb_adapter *adapter = hw->back; From 4237ba43b65aa989674c89fc4f2fe46eebc501ee Mon Sep 17 00:00:00 2001 From: Miklos Szeredi Date: Thu, 10 Jul 2014 10:50:19 +0200 Subject: [PATCH 145/455] fuse: restructure ->rename2() Make ->rename2() universal, i.e. able to handle zero flags. This is to make future change of the API easier. Signed-off-by: Miklos Szeredi --- fs/fuse/dir.c | 36 +++++++++++++++++++++--------------- 1 file changed, 21 insertions(+), 15 deletions(-) diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c index 202a9721be93..0c6048247a34 100644 --- a/fs/fuse/dir.c +++ b/fs/fuse/dir.c @@ -815,13 +815,6 @@ static int fuse_rename_common(struct inode *olddir, struct dentry *oldent, return err; } -static int fuse_rename(struct inode *olddir, struct dentry *oldent, - struct inode *newdir, struct dentry *newent) -{ - return fuse_rename_common(olddir, oldent, newdir, newent, 0, - FUSE_RENAME, sizeof(struct fuse_rename_in)); -} - static int fuse_rename2(struct inode *olddir, struct dentry *oldent, struct inode *newdir, struct dentry *newent, unsigned int flags) @@ -832,17 +825,30 @@ static int fuse_rename2(struct inode *olddir, struct dentry *oldent, if (flags & ~(RENAME_NOREPLACE | RENAME_EXCHANGE)) return -EINVAL; - if (fc->no_rename2 || fc->minor < 23) - return -EINVAL; + if (flags) { + if (fc->no_rename2 || fc->minor < 23) + return -EINVAL; - err = fuse_rename_common(olddir, oldent, newdir, newent, flags, - FUSE_RENAME2, sizeof(struct fuse_rename2_in)); - if (err == -ENOSYS) { - fc->no_rename2 = 1; - err = -EINVAL; + err = fuse_rename_common(olddir, oldent, newdir, newent, flags, + FUSE_RENAME2, + sizeof(struct fuse_rename2_in)); + if (err == -ENOSYS) { + fc->no_rename2 = 1; + err = -EINVAL; + } + } else { + err = fuse_rename_common(olddir, oldent, newdir, newent, 0, + FUSE_RENAME, + sizeof(struct fuse_rename_in)); } - return err; + return err; +} + +static int fuse_rename(struct inode *olddir, struct dentry *oldent, + struct inode *newdir, struct dentry *newent) +{ + return fuse_rename2(olddir, oldent, newdir, newent, 0); } static int fuse_link(struct dentry *entry, struct inode *newdir, From bbc1c5e8ad6dfebf9d13b8a4ccdf66c92913eac9 Mon Sep 17 00:00:00 2001 From: Lars Ellenberg Date: Wed, 9 Jul 2014 21:18:32 +0200 Subject: [PATCH 146/455] drbd: fix regression 'out of mem, failed to invoke fence-peer helper' Since linux kernel 3.13, kthread_run() internally uses wait_for_completion_killable(). We sometimes may use kthread_run() while we still have a signal pending, which we used to kick our threads out of potentially blocking network functions, causing kthread_run() to mistake that as a new fatal signal and fail. Fix: flush_signals() before kthread_run(). Signed-off-by: Philipp Reisner Signed-off-by: Lars Ellenberg Signed-off-by: Jens Axboe --- drivers/block/drbd/drbd_nl.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c index 1b35c45c92b7..3f2e16738080 100644 --- a/drivers/block/drbd/drbd_nl.c +++ b/drivers/block/drbd/drbd_nl.c @@ -544,6 +544,12 @@ void conn_try_outdate_peer_async(struct drbd_connection *connection) struct task_struct *opa; kref_get(&connection->kref); + /* We may just have force_sig()'ed this thread + * to get it out of some blocking network function. + * Clear signals; otherwise kthread_run(), which internally uses + * wait_on_completion_killable(), will mistake our pending signal + * for a new fatal signal and fail. */ + flush_signals(current); opa = kthread_run(_try_outdate_peer_async, connection, "drbd_async_h"); if (IS_ERR(opa)) { drbd_err(connection, "out of mem, failed to invoke fence-peer helper\n"); From 29e2435fd6d71e0136e2c2ff0433b7dbeeaaccfd Mon Sep 17 00:00:00 2001 From: Catalin Marinas Date: Tue, 8 Jul 2014 16:54:18 +0100 Subject: [PATCH 147/455] efi: fdt: Do not report an error during boot if UEFI is not available Currently, fdt_find_uefi_params() reports an error if no EFI parameters are found in the DT. This is however a valid case for non-UEFI kernel booting. This patch checks changes the error reporting to a pr_info("UEFI not found") when no EFI parameters are found in the DT. Signed-off-by: Catalin Marinas Acked-by: Leif Lindholm Tested-by: Leif Lindholm Signed-off-by: Matt Fleming --- drivers/firmware/efi/efi.c | 22 +++++++++++++++------- 1 file changed, 15 insertions(+), 7 deletions(-) diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c index eff1a2f22f09..dc79346689e6 100644 --- a/drivers/firmware/efi/efi.c +++ b/drivers/firmware/efi/efi.c @@ -346,6 +346,7 @@ static __initdata struct { struct param_info { int verbose; + int found; void *params; }; @@ -362,16 +363,12 @@ static int __init fdt_find_uefi_params(unsigned long node, const char *uname, (strcmp(uname, "chosen") != 0 && strcmp(uname, "chosen@0") != 0)) return 0; - pr_info("Getting parameters from FDT:\n"); - for (i = 0; i < ARRAY_SIZE(dt_params); i++) { prop = of_get_flat_dt_prop(node, dt_params[i].propname, &len); - if (!prop) { - pr_err("Can't find %s in device tree!\n", - dt_params[i].name); + if (!prop) return 0; - } dest = info->params + dt_params[i].offset; + info->found++; val = of_read_number(prop, len / sizeof(u32)); @@ -390,10 +387,21 @@ static int __init fdt_find_uefi_params(unsigned long node, const char *uname, int __init efi_get_fdt_params(struct efi_fdt_params *params, int verbose) { struct param_info info; + int ret; + + pr_info("Getting EFI parameters from FDT:\n"); info.verbose = verbose; + info.found = 0; info.params = params; - return of_scan_flat_dt(fdt_find_uefi_params, &info); + ret = of_scan_flat_dt(fdt_find_uefi_params, &info); + if (!info.found) + pr_info("UEFI not found.\n"); + else if (!ret) + pr_err("Can't find '%s' in device tree!\n", + dt_params[info.found].name); + + return ret; } #endif /* CONFIG_EFI_PARAMS_FROM_FDT */ From c7fb93ec51d462ec3540a729ba446663c26a0505 Mon Sep 17 00:00:00 2001 From: Michael Brown Date: Thu, 10 Jul 2014 12:26:20 +0100 Subject: [PATCH 148/455] x86/efi: Include a .bss section within the PE/COFF headers MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The PE/COFF headers currently describe only the initialised-data portions of the image, and result in no space being allocated for the uninitialised-data portions. Consequently, the EFI boot stub will end up overwriting unexpected areas of memory, with unpredictable results. Fix by including a .bss section in the PE/COFF headers (functionally equivalent to the init_size field in the bzImage header). Signed-off-by: Michael Brown Cc: Thomas Bächler Cc: Josh Boyer Cc: Signed-off-by: Matt Fleming --- arch/x86/boot/header.S | 26 +++++++++++++++++++++---- arch/x86/boot/tools/build.c | 38 +++++++++++++++++++++++++++++-------- 2 files changed, 52 insertions(+), 12 deletions(-) diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S index 84c223479e3c..7a6d43a554d7 100644 --- a/arch/x86/boot/header.S +++ b/arch/x86/boot/header.S @@ -91,10 +91,9 @@ bs_die: .section ".bsdata", "a" bugger_off_msg: - .ascii "Direct floppy boot is not supported. " - .ascii "Use a boot loader program instead.\r\n" + .ascii "Use a boot loader.\r\n" .ascii "\n" - .ascii "Remove disk and press any key to reboot ...\r\n" + .ascii "Remove disk and press any key to reboot...\r\n" .byte 0 #ifdef CONFIG_EFI_STUB @@ -108,7 +107,7 @@ coff_header: #else .word 0x8664 # x86-64 #endif - .word 3 # nr_sections + .word 4 # nr_sections .long 0 # TimeDateStamp .long 0 # PointerToSymbolTable .long 1 # NumberOfSymbols @@ -250,6 +249,25 @@ section_table: .word 0 # NumberOfLineNumbers .long 0x60500020 # Characteristics (section flags) + # + # The offset & size fields are filled in by build.c. + # + .ascii ".bss" + .byte 0 + .byte 0 + .byte 0 + .byte 0 + .long 0 + .long 0x0 + .long 0 # Size of initialized data + # on disk + .long 0x0 + .long 0 # PointerToRelocations + .long 0 # PointerToLineNumbers + .word 0 # NumberOfRelocations + .word 0 # NumberOfLineNumbers + .long 0xc8000080 # Characteristics (section flags) + #endif /* CONFIG_EFI_STUB */ # Kernel attributes; used by setup. This is part 1 of the diff --git a/arch/x86/boot/tools/build.c b/arch/x86/boot/tools/build.c index 1a2f2121cada..a7661c430cd9 100644 --- a/arch/x86/boot/tools/build.c +++ b/arch/x86/boot/tools/build.c @@ -143,7 +143,7 @@ static void usage(void) #ifdef CONFIG_EFI_STUB -static void update_pecoff_section_header(char *section_name, u32 offset, u32 size) +static void update_pecoff_section_header_fields(char *section_name, u32 vma, u32 size, u32 datasz, u32 offset) { unsigned int pe_header; unsigned short num_sections; @@ -164,10 +164,10 @@ static void update_pecoff_section_header(char *section_name, u32 offset, u32 siz put_unaligned_le32(size, section + 0x8); /* section header vma field */ - put_unaligned_le32(offset, section + 0xc); + put_unaligned_le32(vma, section + 0xc); /* section header 'size of initialised data' field */ - put_unaligned_le32(size, section + 0x10); + put_unaligned_le32(datasz, section + 0x10); /* section header 'file offset' field */ put_unaligned_le32(offset, section + 0x14); @@ -179,6 +179,11 @@ static void update_pecoff_section_header(char *section_name, u32 offset, u32 siz } } +static void update_pecoff_section_header(char *section_name, u32 offset, u32 size) +{ + update_pecoff_section_header_fields(section_name, offset, size, size, offset); +} + static void update_pecoff_setup_and_reloc(unsigned int size) { u32 setup_offset = 0x200; @@ -203,9 +208,6 @@ static void update_pecoff_text(unsigned int text_start, unsigned int file_sz) pe_header = get_unaligned_le32(&buf[0x3c]); - /* Size of image */ - put_unaligned_le32(file_sz, &buf[pe_header + 0x50]); - /* * Size of code: Subtract the size of the first sector (512 bytes) * which includes the header. @@ -220,6 +222,22 @@ static void update_pecoff_text(unsigned int text_start, unsigned int file_sz) update_pecoff_section_header(".text", text_start, text_sz); } +static void update_pecoff_bss(unsigned int file_sz, unsigned int init_sz) +{ + unsigned int pe_header; + unsigned int bss_sz = init_sz - file_sz; + + pe_header = get_unaligned_le32(&buf[0x3c]); + + /* Size of uninitialized data */ + put_unaligned_le32(bss_sz, &buf[pe_header + 0x24]); + + /* Size of image */ + put_unaligned_le32(init_sz, &buf[pe_header + 0x50]); + + update_pecoff_section_header_fields(".bss", file_sz, bss_sz, 0, 0); +} + static int reserve_pecoff_reloc_section(int c) { /* Reserve 0x20 bytes for .reloc section */ @@ -259,6 +277,8 @@ static void efi_stub_entry_update(void) static inline void update_pecoff_setup_and_reloc(unsigned int size) {} static inline void update_pecoff_text(unsigned int text_start, unsigned int file_sz) {} +static inline void update_pecoff_bss(unsigned int file_sz, + unsigned int init_sz) {} static inline void efi_stub_defaults(void) {} static inline void efi_stub_entry_update(void) {} @@ -310,7 +330,7 @@ static void parse_zoffset(char *fname) int main(int argc, char ** argv) { - unsigned int i, sz, setup_sectors; + unsigned int i, sz, setup_sectors, init_sz; int c; u32 sys_size; struct stat sb; @@ -376,7 +396,9 @@ int main(int argc, char ** argv) buf[0x1f1] = setup_sectors-1; put_unaligned_le32(sys_size, &buf[0x1f4]); - update_pecoff_text(setup_sectors * 512, sz + i + ((sys_size * 16) - sz)); + update_pecoff_text(setup_sectors * 512, i + (sys_size * 16)); + init_sz = get_unaligned_le32(&buf[0x260]); + update_pecoff_bss(i + (sys_size * 16), init_sz); efi_stub_entry_update(); From 76252723e88681628a3dbb9c09c963e095476f73 Mon Sep 17 00:00:00 2001 From: Stefan Assmann Date: Thu, 10 Jul 2014 03:29:39 -0700 Subject: [PATCH 149/455] igb: do a reset on SR-IOV re-init if device is down To properly re-initialize SR-IOV it is necessary to reset the device even if it is already down. Not doing this may result in Tx unit hangs. Cc: stable Signed-off-by: Stefan Assmann Tested-by: Aaron Brown Signed-off-by: Jeff Kirsher Signed-off-by: David S. Miller --- drivers/net/ethernet/intel/igb/igb_main.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index 57a96e00df8c..a9537ba7a5a0 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -7592,6 +7592,8 @@ static int igb_sriov_reinit(struct pci_dev *dev) if (netif_running(netdev)) igb_close(netdev); + else + igb_reset(adapter); igb_clear_interrupt_scheme(adapter); From 78b3321610bf920d7fceb1a0236faa881be0bcf3 Mon Sep 17 00:00:00 2001 From: Srinivas Pandruvada Date: Thu, 7 Aug 2014 22:03:00 +0100 Subject: [PATCH 150/455] iio:core: Handle error when mask type is not separate When event spec is shared by multiple channels, which has definition for mask_shared_by_type, iio_device_register_eventset fails. For example: static const struct iio_event_spec iio_dummy_events[] = { { .type = IIO_EV_TYPE_THRESH, .dir = IIO_EV_DIR_RISING, .mask_separate = BIT(IIO_EV_INFO_ENABLE), .mask_shared_by_type = BIT(IIO_EV_INFO_VALUE), }, { .type = IIO_EV_TYPE_THRESH, .dir = IIO_EV_DIR_FALLING, .mask_separate = BIT(IIO_EV_INFO_ENABLE),a .mask_shared_by_type = BIT(IIO_EV_INFO_VALUE), } }; If two channels use this event spec, this will result in error. This change handles EBUSY error similar to iio_device_add_info_mask_type(). Signed-off-by: Srinivas Pandruvada Signed-off-by: Jonathan Cameron Cc: Stable@vger.kernel.org --- drivers/iio/industrialio-event.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/iio/industrialio-event.c b/drivers/iio/industrialio-event.c index 258a973a1fb8..bfbf4d419f41 100644 --- a/drivers/iio/industrialio-event.c +++ b/drivers/iio/industrialio-event.c @@ -345,6 +345,9 @@ static int iio_device_add_event(struct iio_dev *indio_dev, &indio_dev->event_interface->dev_attr_list); kfree(postfix); + if ((ret == -EBUSY) && (shared_by != IIO_SEPARATE)) + continue; + if (ret) return ret; From 19eeb2f9e750f47c805f66e3b0e889b12557d80f Mon Sep 17 00:00:00 2001 From: Alexey Khoroshilov Date: Thu, 10 Jul 2014 18:43:01 -0400 Subject: [PATCH 151/455] farsync: fix invalid memory accesses in fst_add_one() and fst_init_card() There are several issues in fst_add_one() and fst_init_card(): - invalid pointer dereference at card->ports[card->nports - 1] if register_hdlc_device() fails for the first port in fst_init_card(); - fst_card_array overflow at fst_card_array[no_of_cards_added] because there is no checks for array overflow; - use after free because pointer to deallocated card is left in fst_card_array if something fails after fst_card_array[no_of_cards_added] = card; - several leaks on failure paths in fst_add_one(). The patch fixes all the issues and makes code more readable. Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov Signed-off-by: David S. Miller --- drivers/net/wan/farsync.c | 112 ++++++++++++++++++++------------------ 1 file changed, 58 insertions(+), 54 deletions(-) diff --git a/drivers/net/wan/farsync.c b/drivers/net/wan/farsync.c index 93ace042d0aa..1f041271f7fe 100644 --- a/drivers/net/wan/farsync.c +++ b/drivers/net/wan/farsync.c @@ -2363,7 +2363,7 @@ static char *type_strings[] = { "FarSync TE1" }; -static void +static int fst_init_card(struct fst_card_info *card) { int i; @@ -2374,24 +2374,21 @@ fst_init_card(struct fst_card_info *card) * we'll have to revise it in some way then. */ for (i = 0; i < card->nports; i++) { - err = register_hdlc_device(card->ports[i].dev); - if (err < 0) { - int j; + err = register_hdlc_device(card->ports[i].dev); + if (err < 0) { pr_err("Cannot register HDLC device for port %d (errno %d)\n", - i, -err); - for (j = i; j < card->nports; j++) { - free_netdev(card->ports[j].dev); - card->ports[j].dev = NULL; - } - card->nports = i; - break; - } + i, -err); + while (i--) + unregister_hdlc_device(card->ports[i].dev); + return err; + } } pr_info("%s-%s: %s IRQ%d, %d ports\n", port_to_dev(&card->ports[0])->name, port_to_dev(&card->ports[card->nports - 1])->name, type_strings[card->type], card->irq, card->nports); + return 0; } static const struct net_device_ops fst_ops = { @@ -2447,15 +2444,12 @@ fst_add_one(struct pci_dev *pdev, const struct pci_device_id *ent) /* Try to enable the device */ if ((err = pci_enable_device(pdev)) != 0) { pr_err("Failed to enable card. Err %d\n", -err); - kfree(card); - return err; + goto enable_fail; } if ((err = pci_request_regions(pdev, "FarSync")) !=0) { pr_err("Failed to allocate regions. Err %d\n", -err); - pci_disable_device(pdev); - kfree(card); - return err; + goto regions_fail; } /* Get virtual addresses of memory regions */ @@ -2464,30 +2458,21 @@ fst_add_one(struct pci_dev *pdev, const struct pci_device_id *ent) card->phys_ctlmem = pci_resource_start(pdev, 3); if ((card->mem = ioremap(card->phys_mem, FST_MEMSIZE)) == NULL) { pr_err("Physical memory remap failed\n"); - pci_release_regions(pdev); - pci_disable_device(pdev); - kfree(card); - return -ENODEV; + err = -ENODEV; + goto ioremap_physmem_fail; } if ((card->ctlmem = ioremap(card->phys_ctlmem, 0x10)) == NULL) { pr_err("Control memory remap failed\n"); - pci_release_regions(pdev); - pci_disable_device(pdev); - iounmap(card->mem); - kfree(card); - return -ENODEV; + err = -ENODEV; + goto ioremap_ctlmem_fail; } dbg(DBG_PCI, "kernel mem %p, ctlmem %p\n", card->mem, card->ctlmem); /* Register the interrupt handler */ if (request_irq(pdev->irq, fst_intr, IRQF_SHARED, FST_DEV_NAME, card)) { pr_err("Unable to register interrupt %d\n", card->irq); - pci_release_regions(pdev); - pci_disable_device(pdev); - iounmap(card->ctlmem); - iounmap(card->mem); - kfree(card); - return -ENODEV; + err = -ENODEV; + goto irq_fail; } /* Record info we need */ @@ -2513,13 +2498,8 @@ fst_add_one(struct pci_dev *pdev, const struct pci_device_id *ent) while (i--) free_netdev(card->ports[i].dev); pr_err("FarSync: out of memory\n"); - free_irq(card->irq, card); - pci_release_regions(pdev); - pci_disable_device(pdev); - iounmap(card->ctlmem); - iounmap(card->mem); - kfree(card); - return -ENODEV; + err = -ENOMEM; + goto hdlcdev_fail; } card->ports[i].dev = dev; card->ports[i].card = card; @@ -2565,9 +2545,16 @@ fst_add_one(struct pci_dev *pdev, const struct pci_device_id *ent) pci_set_drvdata(pdev, card); /* Remainder of card setup */ + if (no_of_cards_added >= FST_MAX_CARDS) { + pr_err("FarSync: too many cards\n"); + err = -ENOMEM; + goto card_array_fail; + } fst_card_array[no_of_cards_added] = card; card->card_no = no_of_cards_added++; /* Record instance and bump it */ - fst_init_card(card); + err = fst_init_card(card); + if (err) + goto init_card_fail; if (card->family == FST_FAMILY_TXU) { /* * Allocate a dma buffer for transmit and receives @@ -2577,29 +2564,46 @@ fst_add_one(struct pci_dev *pdev, const struct pci_device_id *ent) &card->rx_dma_handle_card); if (card->rx_dma_handle_host == NULL) { pr_err("Could not allocate rx dma buffer\n"); - fst_disable_intr(card); - pci_release_regions(pdev); - pci_disable_device(pdev); - iounmap(card->ctlmem); - iounmap(card->mem); - kfree(card); - return -ENOMEM; + err = -ENOMEM; + goto rx_dma_fail; } card->tx_dma_handle_host = pci_alloc_consistent(card->device, FST_MAX_MTU, &card->tx_dma_handle_card); if (card->tx_dma_handle_host == NULL) { pr_err("Could not allocate tx dma buffer\n"); - fst_disable_intr(card); - pci_release_regions(pdev); - pci_disable_device(pdev); - iounmap(card->ctlmem); - iounmap(card->mem); - kfree(card); - return -ENOMEM; + err = -ENOMEM; + goto tx_dma_fail; } } return 0; /* Success */ + +tx_dma_fail: + pci_free_consistent(card->device, FST_MAX_MTU, + card->rx_dma_handle_host, + card->rx_dma_handle_card); +rx_dma_fail: + fst_disable_intr(card); + for (i = 0 ; i < card->nports ; i++) + unregister_hdlc_device(card->ports[i].dev); +init_card_fail: + fst_card_array[card->card_no] = NULL; +card_array_fail: + for (i = 0 ; i < card->nports ; i++) + free_netdev(card->ports[i].dev); +hdlcdev_fail: + free_irq(card->irq, card); +irq_fail: + iounmap(card->ctlmem); +ioremap_ctlmem_fail: + iounmap(card->mem); +ioremap_physmem_fail: + pci_release_regions(pdev); +regions_fail: + pci_disable_device(pdev); +enable_fail: + kfree(card); + return err; } /* From d0a7ebbc119738439ff00f7fadbd343ae20ea5e8 Mon Sep 17 00:00:00 2001 From: Amritha Nambiar Date: Thu, 10 Jul 2014 17:29:21 -0700 Subject: [PATCH 152/455] GRE: enable offloads for GRE To get offloads to work with Generic Routing Encapsulation (GRE), the outer transport header has to be reset after skb_push is done. This patch has the support for this fix and hence GRE offloading. Signed-off-by: Amritha Nambiar Signed-off-by: Joseph Gasparakis Tested-By: Jim Young Signed-off-by: Jeff Kirsher Signed-off-by: David S. Miller --- net/ipv4/gre_demux.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/ipv4/gre_demux.c b/net/ipv4/gre_demux.c index 4e9619bca732..0485bf7f8f03 100644 --- a/net/ipv4/gre_demux.c +++ b/net/ipv4/gre_demux.c @@ -68,6 +68,7 @@ void gre_build_header(struct sk_buff *skb, const struct tnl_ptk_info *tpi, skb_push(skb, hdr_len); + skb_reset_transport_header(skb); greh = (struct gre_base_hdr *)skb->data; greh->flags = tnl_flags_to_gre_flags(tpi->flags); greh->protocol = tpi->proto; From 4cad9f3b61c7268fa89ab8096e23202300399b5d Mon Sep 17 00:00:00 2001 From: Suresh Reddy Date: Fri, 11 Jul 2014 14:03:01 +0530 Subject: [PATCH 153/455] be2net: set EQ DB clear-intr bit in be_open() On BE3, if the clear-interrupt bit of the EQ doorbell is not set the first time it is armed, ocassionally we have observed that the EQ doesn't raise anymore interrupts even if it is in armed state. This patch fixes this by setting the clear-interrupt bit when EQs are armed for the first time in be_open(). Signed-off-by: Suresh Reddy Signed-off-by: Sathya Perla Signed-off-by: David S. Miller --- drivers/net/ethernet/emulex/benet/be_main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c index 34a26e42f19d..1e187fb760f8 100644 --- a/drivers/net/ethernet/emulex/benet/be_main.c +++ b/drivers/net/ethernet/emulex/benet/be_main.c @@ -2902,7 +2902,7 @@ static int be_open(struct net_device *netdev) for_all_evt_queues(adapter, eqo, i) { napi_enable(&eqo->napi); be_enable_busy_poll(eqo); - be_eq_notify(adapter, eqo->q.id, true, false, 0); + be_eq_notify(adapter, eqo->q.id, true, true, 0); } adapter->flags |= BE_FLAGS_NAPI_ENABLED; From a91d45f1a343188793d6f2bdf1a72c64015a8255 Mon Sep 17 00:00:00 2001 From: hayeswang Date: Fri, 11 Jul 2014 16:48:27 +0800 Subject: [PATCH 154/455] r8152: fix r8152_csum_workaround function The transport offset of the IPv4 packet should be fixed and wouldn't be out of the hw limitation, so the r8152_csum_workaround() should be used for IPv6 packets. Signed-off-by: Hayes Wang Signed-off-by: David S. Miller --- drivers/net/usb/r8152.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c index a795ecf8d5b6..7bad2d316637 100644 --- a/drivers/net/usb/r8152.c +++ b/drivers/net/usb/r8152.c @@ -1359,7 +1359,7 @@ static void r8152_csum_workaround(struct r8152 *tp, struct sk_buff *skb, struct sk_buff_head seg_list; struct sk_buff *segs, *nskb; - features &= ~(NETIF_F_IP_CSUM | NETIF_F_SG | NETIF_F_TSO); + features &= ~(NETIF_F_SG | NETIF_F_IPV6_CSUM | NETIF_F_TSO6); segs = skb_gso_segment(skb, features); if (IS_ERR(segs) || !segs) goto drop; From 999417549c16dd0e3a382aa9f6ae61688db03181 Mon Sep 17 00:00:00 2001 From: Jon Paul Maloy Date: Fri, 11 Jul 2014 08:45:27 -0400 Subject: [PATCH 155/455] tipc: clear 'next'-pointer of message fragments before reassembly If the 'next' pointer of the last fragment buffer in a message is not zeroed before reassembly, we risk ending up with a corrupt message, since the reassembly function itself isn't doing this. Currently, when a buffer is retrieved from the deferred queue of the broadcast link, the next pointer is not cleared, with the result as described above. This commit corrects this, and thereby fixes a bug that may occur when long broadcast messages are transmitted across dual interfaces. The bug has been present since 40ba3cdf542a469aaa9083fa041656e59b109b90 ("tipc: message reassembly using fragment chain") This commit should be applied to both net and net-next. Signed-off-by: Jon Maloy Signed-off-by: David S. Miller --- net/tipc/bcast.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c index 26631679a1fa..55c6c9d3e1ce 100644 --- a/net/tipc/bcast.c +++ b/net/tipc/bcast.c @@ -559,6 +559,7 @@ receive: buf = node->bclink.deferred_head; node->bclink.deferred_head = buf->next; + buf->next = NULL; node->bclink.deferred_size--; goto receive; } From 1d9dbf154295aa33819b08340464ff15495e83d8 Mon Sep 17 00:00:00 2001 From: Darren Hart Date: Fri, 11 Jul 2014 11:03:39 +0000 Subject: [PATCH 156/455] ACPI / documentation: Remove reference to acpi_platform_device_ids from enumeration.txt As of: 4845934 ACPI / scan: use platform bus type by default for _HID enumeration ACPI uses the platform bus by default, changing the opt-in to an opt-out policy, eliminating the acpi_platform_device_ids table and replacing it with forbidden_id_list[]. Remove the qualifying paragraph from the acpi/enumeration documentation as it no longer applies. Reported-by: Max Eliaser Signed-off-by: Darren Hart Signed-off-by: Rafael J. Wysocki --- Documentation/acpi/enumeration.txt | 6 ------ 1 file changed, 6 deletions(-) diff --git a/Documentation/acpi/enumeration.txt b/Documentation/acpi/enumeration.txt index fd786ea13a1f..e182be5e3c83 100644 --- a/Documentation/acpi/enumeration.txt +++ b/Documentation/acpi/enumeration.txt @@ -60,12 +60,6 @@ If the driver needs to perform more complex initialization like getting and configuring GPIOs it can get its ACPI handle and extract this information from ACPI tables. -Currently the kernel is not able to automatically determine from which ACPI -device it should make the corresponding platform device so we need to add -the ACPI device explicitly to acpi_platform_device_ids list defined in -drivers/acpi/acpi_platform.c. This limitation is only for the platform -devices, SPI and I2C devices are created automatically as described below. - DMA support ~~~~~~~~~~~ DMA controllers enumerated via ACPI should be registered in the system to From aff008ad813c7cf3cfe7b532e7ba2c526c136f22 Mon Sep 17 00:00:00 2001 From: Guenter Roeck Date: Tue, 17 Jun 2014 15:51:02 -0700 Subject: [PATCH 157/455] platform_get_irq: Revert to platform_get_resource if of_irq_get fails Commits 9ec36ca (of/irq: do irq resolution in platform_get_irq) and ad69674 (of/irq: do irq resolution in platform_get_irq_byname) change the semantics of platform_get_irq and platform_get_irq_byname to always rely on devicetree information if devicetree is enabled and if a devicetree node is attached to the device. The functions now return an error if the devicetree data does not include interrupt information, even if the information is available as platform resource data. This causes mfd client drivers to fail if the interrupt number is passed via platform resources. Therefore, if of_irq_get fails, try platform_get_resource as method of last resort. This restores the original functionality for drivers depending on platform resources to get irq information. Cc: Russell King Cc: Tony Lindgren Cc: Grant Likely Cc: Grygorii Strashko Signed-off-by: Guenter Roeck Acked-by: Rob Herring Cc: stable Signed-off-by: Greg Kroah-Hartman --- drivers/base/platform.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-) diff --git a/drivers/base/platform.c b/drivers/base/platform.c index 9e9227e1762d..eee48c49f5de 100644 --- a/drivers/base/platform.c +++ b/drivers/base/platform.c @@ -89,8 +89,13 @@ int platform_get_irq(struct platform_device *dev, unsigned int num) return dev->archdata.irqs[num]; #else struct resource *r; - if (IS_ENABLED(CONFIG_OF_IRQ) && dev->dev.of_node) - return of_irq_get(dev->dev.of_node, num); + if (IS_ENABLED(CONFIG_OF_IRQ) && dev->dev.of_node) { + int ret; + + ret = of_irq_get(dev->dev.of_node, num); + if (ret >= 0 || ret == -EPROBE_DEFER) + return ret; + } r = platform_get_resource(dev, IORESOURCE_IRQ, num); @@ -133,8 +138,13 @@ int platform_get_irq_byname(struct platform_device *dev, const char *name) { struct resource *r; - if (IS_ENABLED(CONFIG_OF_IRQ) && dev->dev.of_node) - return of_irq_get_byname(dev->dev.of_node, name); + if (IS_ENABLED(CONFIG_OF_IRQ) && dev->dev.of_node) { + int ret; + + ret = of_irq_get_byname(dev->dev.of_node, name); + if (ret >= 0 || ret == -EPROBE_DEFER) + return ret; + } r = platform_get_resource_byname(dev, IORESOURCE_IRQ, name); return r ? r->start : -ENXIO; From 71702e6e52c8312f4c6797a9787d0f8b5656156f Mon Sep 17 00:00:00 2001 From: Martin Fuzzey Date: Fri, 7 Nov 2014 13:54:00 +0000 Subject: [PATCH 158/455] iio: mma8452: Use correct acceleration units. The userspace interface for acceleration sensors is documented as using m/s^2 units [Documentation/ABI/testing/sysfs-bus-iio] The fullscale raw value for the mma8452 (-2048) corresponds to -2G, -4G or -8G depending on the seleted mode. The scale table was converting to G rather than m/s^2. Change the scaling table to match the documented interface. Signed-off-by: Martin Fuzzey Signed-off-by: Jonathan Cameron Cc: stable@vger.kernel.org --- drivers/iio/accel/mma8452.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/iio/accel/mma8452.c b/drivers/iio/accel/mma8452.c index 17aeea170566..2a5fa9a436e5 100644 --- a/drivers/iio/accel/mma8452.c +++ b/drivers/iio/accel/mma8452.c @@ -111,8 +111,14 @@ static const int mma8452_samp_freq[8][2] = { {6, 250000}, {1, 560000} }; +/* + * Hardware has fullscale of -2G, -4G, -8G corresponding to raw value -2048 + * The userspace interface uses m/s^2 and we declare micro units + * So scale factor is given by: + * g * N * 1000000 / 2048 for N = 2, 4, 8 and g=9.80665 + */ static const int mma8452_scales[3][2] = { - {0, 977}, {0, 1953}, {0, 3906} + {0, 9577}, {0, 19154}, {0, 38307} }; static ssize_t mma8452_show_samp_freq_avail(struct device *dev, From 0b462c89e31f7eb6789713437eb551833ee16ff3 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Sat, 5 Jul 2014 18:43:21 -0400 Subject: [PATCH 159/455] blkcg: don't call into policy draining if root_blkg is already gone While a queue is being destroyed, all the blkgs are destroyed and its ->root_blkg pointer is set to NULL. If someone else starts to drain while the queue is in this state, the following oops happens. NULL pointer dereference at 0000000000000028 IP: [] blk_throtl_drain+0x84/0x230 PGD e4a1067 PUD b773067 PMD 0 Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC Modules linked in: cfq_iosched(-) [last unloaded: cfq_iosched] CPU: 1 PID: 537 Comm: bash Not tainted 3.16.0-rc3-work+ #2 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 task: ffff88000e222250 ti: ffff88000efd4000 task.ti: ffff88000efd4000 RIP: 0010:[] [] blk_throtl_drain+0x84/0x230 RSP: 0018:ffff88000efd7bf0 EFLAGS: 00010046 RAX: 0000000000000000 RBX: ffff880015091450 RCX: 0000000000000001 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffff88000efd7c10 R08: 0000000000000000 R09: 0000000000000001 R10: ffff88000e222250 R11: 0000000000000000 R12: ffff880015091450 R13: ffff880015092e00 R14: ffff880015091d70 R15: ffff88001508fc28 FS: 00007f1332650740(0000) GS:ffff88001fa80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000028 CR3: 0000000009446000 CR4: 00000000000006e0 Stack: ffffffff8144e8f6 ffff880015091450 0000000000000000 ffff880015091d80 ffff88000efd7c28 ffffffff8144ae2f ffff880015091450 ffff88000efd7c58 ffffffff81427641 ffff880015091450 ffffffff82401f00 ffff880015091450 Call Trace: [] blkcg_drain_queue+0x1f/0x60 [] __blk_drain_queue+0x71/0x180 [] blk_queue_bypass_start+0x6e/0xb0 [] blkcg_deactivate_policy+0x38/0x120 [] blk_throtl_exit+0x34/0x50 [] blkcg_exit_queue+0x35/0x40 [] blk_release_queue+0x26/0xd0 [] kobject_cleanup+0x38/0x70 [] kobject_put+0x28/0x60 [] blk_put_queue+0x15/0x20 [] scsi_device_dev_release_usercontext+0x16b/0x1c0 [] execute_in_process_context+0x89/0xa0 [] scsi_device_dev_release+0x1c/0x20 [] device_release+0x32/0xa0 [] kobject_cleanup+0x38/0x70 [] kobject_put+0x28/0x60 [] put_device+0x17/0x20 [] __scsi_remove_device+0xa9/0xe0 [] scsi_remove_device+0x2b/0x40 [] sdev_store_delete+0x27/0x30 [] dev_attr_store+0x18/0x30 [] sysfs_kf_write+0x3e/0x50 [] kernfs_fop_write+0xe7/0x170 [] vfs_write+0xaf/0x1d0 [] SyS_write+0x4d/0xc0 [] system_call_fastpath+0x16/0x1b 776687bce42b ("block, blk-mq: draining can't be skipped even if bypass_depth was non-zero") made it easier to trigger this bug by making blk_queue_bypass_start() drain even when it loses the first bypass test to blk_cleanup_queue(); however, the bug has always been there even before the commit as blk_queue_bypass_start() could race against queue destruction, win the initial bypass test but perform the actual draining after blk_cleanup_queue() already destroyed all blkgs. Fix it by skippping calling into policy draining if all the blkgs are already gone. Signed-off-by: Tejun Heo Reported-by: Shirish Pargaonkar Reported-by: Sasha Levin Reported-by: Jet Chen Cc: stable@vger.kernel.org Tested-by: Shirish Pargaonkar Signed-off-by: Jens Axboe --- block/blk-cgroup.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c index b9f4cc494ece..28d227c5ca77 100644 --- a/block/blk-cgroup.c +++ b/block/blk-cgroup.c @@ -872,6 +872,13 @@ void blkcg_drain_queue(struct request_queue *q) { lockdep_assert_held(q->queue_lock); + /* + * @q could be exiting and already have destroyed all blkgs as + * indicated by NULL root_blkg. If so, don't confuse policies. + */ + if (!q->root_blkg) + return; + blk_throtl_drain(q); } From 17089a29a25a3bfe8d14520cd866b7d635ffe5ba Mon Sep 17 00:00:00 2001 From: Weston Andros Adamson Date: Fri, 11 Jul 2014 10:20:45 -0400 Subject: [PATCH 160/455] nfs: mark nfs_page reqs with flag for extra ref Change the use of PG_INODE_REF - set it when taking extra reference on subrequests and take care to only release once for each request. Signed-off-by: Weston Andros Adamson Signed-off-by: Trond Myklebust --- fs/nfs/pagelist.c | 4 +++- fs/nfs/write.c | 8 ++++++-- 2 files changed, 9 insertions(+), 3 deletions(-) diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c index b6ee3a6ee96d..7368b2130a41 100644 --- a/fs/nfs/pagelist.c +++ b/fs/nfs/pagelist.c @@ -251,8 +251,10 @@ nfs_page_group_init(struct nfs_page *req, struct nfs_page *prev) /* grab extra ref if head request has extra ref from * the write/commit path to handle handoff between write * and commit lists */ - if (test_bit(PG_INODE_REF, &prev->wb_head->wb_flags)) + if (test_bit(PG_INODE_REF, &prev->wb_head->wb_flags)) { + set_bit(PG_INODE_REF, &req->wb_flags); kref_get(&req->wb_kref); + } } } diff --git a/fs/nfs/write.c b/fs/nfs/write.c index 98ff061ccaf3..8e5745a4deff 100644 --- a/fs/nfs/write.c +++ b/fs/nfs/write.c @@ -448,7 +448,9 @@ static void nfs_inode_add_request(struct inode *inode, struct nfs_page *req) set_page_private(req->wb_page, (unsigned long)req); } nfsi->npages++; - set_bit(PG_INODE_REF, &req->wb_flags); + /* this a head request for a page group - mark it as having an + * extra reference so sub groups can follow suit */ + WARN_ON(test_and_set_bit(PG_INODE_REF, &req->wb_flags)); kref_get(&req->wb_kref); spin_unlock(&inode->i_lock); } @@ -474,7 +476,9 @@ static void nfs_inode_remove_request(struct nfs_page *req) nfsi->npages--; spin_unlock(&inode->i_lock); } - nfs_release_request(req); + + if (test_and_clear_bit(PG_INODE_REF, &req->wb_flags)) + nfs_release_request(req); } static void From 85710a837c2026aae80b7c64187edf1f10027b0b Mon Sep 17 00:00:00 2001 From: Weston Andros Adamson Date: Fri, 11 Jul 2014 10:20:46 -0400 Subject: [PATCH 161/455] nfs: nfs_page should take a ref on the head req nfs_pages that aren't the the head of a group must take a reference on the head as long as ->wb_head is set to it. This stops the head from hitting a refcount of 0 while there is still an active nfs_page for the page group. This avoids kref warnings in the writeback code when the page group head is found and referenced. Signed-off-by: Weston Andros Adamson Signed-off-by: Trond Myklebust --- fs/nfs/pagelist.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c index 7368b2130a41..05a63593a61f 100644 --- a/fs/nfs/pagelist.c +++ b/fs/nfs/pagelist.c @@ -239,15 +239,21 @@ nfs_page_group_init(struct nfs_page *req, struct nfs_page *prev) WARN_ON_ONCE(prev == req); if (!prev) { + /* a head request */ req->wb_head = req; req->wb_this_page = req; } else { + /* a subrequest */ WARN_ON_ONCE(prev->wb_this_page != prev->wb_head); WARN_ON_ONCE(!test_bit(PG_HEADLOCK, &prev->wb_head->wb_flags)); req->wb_head = prev->wb_head; req->wb_this_page = prev->wb_this_page; prev->wb_this_page = req; + /* All subrequests take a ref on the head request until + * nfs_page_group_destroy is called */ + kref_get(&req->wb_head->wb_kref); + /* grab extra ref if head request has extra ref from * the write/commit path to handle handoff between write * and commit lists */ @@ -271,6 +277,10 @@ nfs_page_group_destroy(struct kref *kref) struct nfs_page *req = container_of(kref, struct nfs_page, wb_kref); struct nfs_page *tmp, *next; + /* subrequests must release the ref on the head request */ + if (req->wb_head != req) + nfs_release_request(req->wb_head); + if (!nfs_page_group_sync_on_bit(req, PG_TEARDOWN)) return; From 84d3a9a913ba6a90c79b7763d063bb42554a8906 Mon Sep 17 00:00:00 2001 From: Weston Andros Adamson Date: Fri, 11 Jul 2014 10:20:47 -0400 Subject: [PATCH 162/455] nfs: change find_request to find_head_request nfs_page_find_request_locked* should find the head request for that page. Rename the functions and add comments to make this clear, and fix a bug that could return a subrequest when page_private isn't set on the page. Signed-off-by: Weston Andros Adamson Signed-off-by: Trond Myklebust --- fs/nfs/write.c | 33 ++++++++++++++++++++++++--------- 1 file changed, 24 insertions(+), 9 deletions(-) diff --git a/fs/nfs/write.c b/fs/nfs/write.c index 8e5745a4deff..53c4a9917dac 100644 --- a/fs/nfs/write.c +++ b/fs/nfs/write.c @@ -91,8 +91,15 @@ static void nfs_context_set_write_error(struct nfs_open_context *ctx, int error) set_bit(NFS_CONTEXT_ERROR_WRITE, &ctx->flags); } +/* + * nfs_page_find_head_request_locked - find head request associated with @page + * + * must be called while holding the inode lock. + * + * returns matching head request with reference held, or NULL if not found. + */ static struct nfs_page * -nfs_page_find_request_locked(struct nfs_inode *nfsi, struct page *page) +nfs_page_find_head_request_locked(struct nfs_inode *nfsi, struct page *page) { struct nfs_page *req = NULL; @@ -104,25 +111,33 @@ nfs_page_find_request_locked(struct nfs_inode *nfsi, struct page *page) /* Linearly search the commit list for the correct req */ list_for_each_entry_safe(freq, t, &nfsi->commit_info.list, wb_list) { if (freq->wb_page == page) { - req = freq; + req = freq->wb_head; break; } } } - if (req) + if (req) { + WARN_ON_ONCE(req->wb_head != req); + kref_get(&req->wb_kref); + } return req; } -static struct nfs_page *nfs_page_find_request(struct page *page) +/* + * nfs_page_find_head_request - find head request associated with @page + * + * returns matching head request with reference held, or NULL if not found. + */ +static struct nfs_page *nfs_page_find_head_request(struct page *page) { struct inode *inode = page_file_mapping(page)->host; struct nfs_page *req = NULL; spin_lock(&inode->i_lock); - req = nfs_page_find_request_locked(NFS_I(inode), page); + req = nfs_page_find_head_request_locked(NFS_I(inode), page); spin_unlock(&inode->i_lock); return req; } @@ -282,7 +297,7 @@ static struct nfs_page *nfs_find_and_lock_request(struct page *page, bool nonblo spin_lock(&inode->i_lock); for (;;) { - req = nfs_page_find_request_locked(NFS_I(inode), page); + req = nfs_page_find_head_request_locked(NFS_I(inode), page); if (req == NULL) break; if (nfs_lock_request(req)) @@ -773,7 +788,7 @@ static struct nfs_page *nfs_try_to_update_request(struct inode *inode, spin_lock(&inode->i_lock); for (;;) { - req = nfs_page_find_request_locked(NFS_I(inode), page); + req = nfs_page_find_head_request_locked(NFS_I(inode), page); if (req == NULL) goto out_unlock; @@ -881,7 +896,7 @@ int nfs_flush_incompatible(struct file *file, struct page *page) * dropped page. */ do { - req = nfs_page_find_request(page); + req = nfs_page_find_head_request(page); if (req == NULL) return 0; l_ctx = req->wb_lock_context; @@ -1575,7 +1590,7 @@ int nfs_wb_page_cancel(struct inode *inode, struct page *page) for (;;) { wait_on_page_writeback(page); - req = nfs_page_find_request(page); + req = nfs_page_find_head_request(page); if (req == NULL) break; if (nfs_lock_request(req)) { From d458138353726ea6dcbc53ae3597e489d0432c25 Mon Sep 17 00:00:00 2001 From: Weston Andros Adamson Date: Fri, 11 Jul 2014 10:20:48 -0400 Subject: [PATCH 163/455] nfs: handle multiple reqs in nfs_page_async_flush Change nfs_find_and_lock_request so nfs_page_async_flush can handle multiple requests in a page. There is only one request for a page the first time nfs_page_async_flush is called, but if a write or commit fails, async_flush is called again and there may be multiple requests associated with the page. The solution is to merge all the requests in a page group into a single request before calling nfs_pageio_add_request. Rename nfs_find_and_lock_request to nfs_lock_and_join_requests and change it to first lock all requests for the page, then cancel and merge all subrequests into the head request. Signed-off-by: Weston Andros Adamson Signed-off-by: Trond Myklebust --- fs/nfs/internal.h | 1 + fs/nfs/pagelist.c | 4 +- fs/nfs/write.c | 265 +++++++++++++++++++++++++++++++++++++++++----- 3 files changed, 240 insertions(+), 30 deletions(-) diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h index 82ddbf46660e..f415cbf9f6c3 100644 --- a/fs/nfs/internal.h +++ b/fs/nfs/internal.h @@ -244,6 +244,7 @@ void nfs_pgio_data_release(struct nfs_pgio_data *); int nfs_generic_pgio(struct nfs_pageio_descriptor *, struct nfs_pgio_header *); int nfs_initiate_pgio(struct rpc_clnt *, struct nfs_pgio_data *, const struct rpc_call_ops *, int, int); +void nfs_free_request(struct nfs_page *req); static inline void nfs_iocounter_init(struct nfs_io_counter *c) { diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c index 05a63593a61f..0aefc8102b6b 100644 --- a/fs/nfs/pagelist.c +++ b/fs/nfs/pagelist.c @@ -29,8 +29,6 @@ static struct kmem_cache *nfs_page_cachep; static const struct rpc_call_ops nfs_pgio_common_ops; -static void nfs_free_request(struct nfs_page *); - static bool nfs_pgarray_set(struct nfs_page_array *p, unsigned int pagecount) { p->npages = pagecount; @@ -406,7 +404,7 @@ static void nfs_clear_request(struct nfs_page *req) * * Note: Should never be called with the spinlock held! */ -static void nfs_free_request(struct nfs_page *req) +void nfs_free_request(struct nfs_page *req) { WARN_ON_ONCE(req->wb_this_page != req); diff --git a/fs/nfs/write.c b/fs/nfs/write.c index 53c4a9917dac..9f4424c464a0 100644 --- a/fs/nfs/write.c +++ b/fs/nfs/write.c @@ -46,6 +46,7 @@ static const struct rpc_call_ops nfs_commit_ops; static const struct nfs_pgio_completion_ops nfs_async_write_completion_ops; static const struct nfs_commit_completion_ops nfs_commit_completion_ops; static const struct nfs_rw_ops nfs_rw_write_ops; +static void nfs_clear_request_commit(struct nfs_page *req); static struct kmem_cache *nfs_wdata_cachep; static mempool_t *nfs_wdata_mempool; @@ -289,36 +290,246 @@ static void nfs_end_page_writeback(struct nfs_page *req) clear_bdi_congested(&nfss->backing_dev_info, BLK_RW_ASYNC); } -static struct nfs_page *nfs_find_and_lock_request(struct page *page, bool nonblock) + +/* nfs_page_group_clear_bits + * @req - an nfs request + * clears all page group related bits from @req + */ +static void +nfs_page_group_clear_bits(struct nfs_page *req) { - struct inode *inode = page_file_mapping(page)->host; - struct nfs_page *req; + clear_bit(PG_TEARDOWN, &req->wb_flags); + clear_bit(PG_UNLOCKPAGE, &req->wb_flags); + clear_bit(PG_UPTODATE, &req->wb_flags); + clear_bit(PG_WB_END, &req->wb_flags); + clear_bit(PG_REMOVE, &req->wb_flags); +} + + +/* + * nfs_unroll_locks_and_wait - unlock all newly locked reqs and wait on @req + * + * this is a helper function for nfs_lock_and_join_requests + * + * @inode - inode associated with request page group, must be holding inode lock + * @head - head request of page group, must be holding head lock + * @req - request that couldn't lock and needs to wait on the req bit lock + * @nonblock - if true, don't actually wait + * + * NOTE: this must be called holding page_group bit lock and inode spin lock + * and BOTH will be released before returning. + * + * returns 0 on success, < 0 on error. + */ +static int +nfs_unroll_locks_and_wait(struct inode *inode, struct nfs_page *head, + struct nfs_page *req, bool nonblock) + __releases(&inode->i_lock) +{ + struct nfs_page *tmp; int ret; - spin_lock(&inode->i_lock); - for (;;) { - req = nfs_page_find_head_request_locked(NFS_I(inode), page); - if (req == NULL) - break; - if (nfs_lock_request(req)) - break; - /* Note: If we hold the page lock, as is the case in nfs_writepage, - * then the call to nfs_lock_request() will always - * succeed provided that someone hasn't already marked the - * request as dirty (in which case we don't care). - */ - spin_unlock(&inode->i_lock); - if (!nonblock) - ret = nfs_wait_on_request(req); - else - ret = -EAGAIN; - nfs_release_request(req); - if (ret != 0) - return ERR_PTR(ret); - spin_lock(&inode->i_lock); - } + /* relinquish all the locks successfully grabbed this run */ + for (tmp = head ; tmp != req; tmp = tmp->wb_this_page) + nfs_unlock_request(tmp); + + WARN_ON_ONCE(test_bit(PG_TEARDOWN, &req->wb_flags)); + + /* grab a ref on the request that will be waited on */ + kref_get(&req->wb_kref); + + nfs_page_group_unlock(head); spin_unlock(&inode->i_lock); - return req; + + /* release ref from nfs_page_find_head_request_locked */ + nfs_release_request(head); + + if (!nonblock) + ret = nfs_wait_on_request(req); + else + ret = -EAGAIN; + nfs_release_request(req); + + return ret; +} + +/* + * nfs_destroy_unlinked_subrequests - destroy recently unlinked subrequests + * + * @destroy_list - request list (using wb_this_page) terminated by @old_head + * @old_head - the old head of the list + * + * All subrequests must be locked and removed from all lists, so at this point + * they are only "active" in this function, and possibly in nfs_wait_on_request + * with a reference held by some other context. + */ +static void +nfs_destroy_unlinked_subrequests(struct nfs_page *destroy_list, + struct nfs_page *old_head) +{ + while (destroy_list) { + struct nfs_page *subreq = destroy_list; + + destroy_list = (subreq->wb_this_page == old_head) ? + NULL : subreq->wb_this_page; + + WARN_ON_ONCE(old_head != subreq->wb_head); + + /* make sure old group is not used */ + subreq->wb_head = subreq; + subreq->wb_this_page = subreq; + + nfs_clear_request_commit(subreq); + + /* subreq is now totally disconnected from page group or any + * write / commit lists. last chance to wake any waiters */ + nfs_unlock_request(subreq); + + if (!test_bit(PG_TEARDOWN, &subreq->wb_flags)) { + /* release ref on old head request */ + nfs_release_request(old_head); + + nfs_page_group_clear_bits(subreq); + + /* release the PG_INODE_REF reference */ + if (test_and_clear_bit(PG_INODE_REF, &subreq->wb_flags)) + nfs_release_request(subreq); + else + WARN_ON_ONCE(1); + } else { + WARN_ON_ONCE(test_bit(PG_CLEAN, &subreq->wb_flags)); + /* zombie requests have already released the last + * reference and were waiting on the rest of the + * group to complete. Since it's no longer part of a + * group, simply free the request */ + nfs_page_group_clear_bits(subreq); + nfs_free_request(subreq); + } + } +} + +/* + * nfs_lock_and_join_requests - join all subreqs to the head req and return + * a locked reference, cancelling any pending + * operations for this page. + * + * @page - the page used to lookup the "page group" of nfs_page structures + * @nonblock - if true, don't block waiting for request locks + * + * This function joins all sub requests to the head request by first + * locking all requests in the group, cancelling any pending operations + * and finally updating the head request to cover the whole range covered by + * the (former) group. All subrequests are removed from any write or commit + * lists, unlinked from the group and destroyed. + * + * Returns a locked, referenced pointer to the head request - which after + * this call is guaranteed to be the only request associated with the page. + * Returns NULL if no requests are found for @page, or a ERR_PTR if an + * error was encountered. + */ +static struct nfs_page * +nfs_lock_and_join_requests(struct page *page, bool nonblock) +{ + struct inode *inode = page_file_mapping(page)->host; + struct nfs_page *head, *subreq; + struct nfs_page *destroy_list = NULL; + unsigned int total_bytes; + int ret; + +try_again: + total_bytes = 0; + + WARN_ON_ONCE(destroy_list); + + spin_lock(&inode->i_lock); + + /* + * A reference is taken only on the head request which acts as a + * reference to the whole page group - the group will not be destroyed + * until the head reference is released. + */ + head = nfs_page_find_head_request_locked(NFS_I(inode), page); + + if (!head) { + spin_unlock(&inode->i_lock); + return NULL; + } + + /* lock each request in the page group */ + nfs_page_group_lock(head); + subreq = head; + do { + /* + * Subrequests are always contiguous, non overlapping + * and in order. If not, it's a programming error. + */ + WARN_ON_ONCE(subreq->wb_offset != + (head->wb_offset + total_bytes)); + + /* keep track of how many bytes this group covers */ + total_bytes += subreq->wb_bytes; + + if (!nfs_lock_request(subreq)) { + /* releases page group bit lock and + * inode spin lock and all references */ + ret = nfs_unroll_locks_and_wait(inode, head, + subreq, nonblock); + + if (ret == 0) + goto try_again; + + return ERR_PTR(ret); + } + + subreq = subreq->wb_this_page; + } while (subreq != head); + + /* Now that all requests are locked, make sure they aren't on any list. + * Commit list removal accounting is done after locks are dropped */ + subreq = head; + do { + nfs_list_remove_request(subreq); + subreq = subreq->wb_this_page; + } while (subreq != head); + + /* unlink subrequests from head, destroy them later */ + if (head->wb_this_page != head) { + /* destroy list will be terminated by head */ + destroy_list = head->wb_this_page; + head->wb_this_page = head; + + /* change head request to cover whole range that + * the former page group covered */ + head->wb_bytes = total_bytes; + } + + /* + * prepare head request to be added to new pgio descriptor + */ + nfs_page_group_clear_bits(head); + + /* + * some part of the group was still on the inode list - otherwise + * the group wouldn't be involved in async write. + * grab a reference for the head request, iff it needs one. + */ + if (!test_and_set_bit(PG_INODE_REF, &head->wb_flags)) + kref_get(&head->wb_kref); + + nfs_page_group_unlock(head); + + /* drop lock to clear_request_commit the head req and clean up + * requests on destroy list */ + spin_unlock(&inode->i_lock); + + nfs_destroy_unlinked_subrequests(destroy_list, head); + + /* clean up commit list state */ + nfs_clear_request_commit(head); + + /* still holds ref on head from nfs_page_find_head_request_locked + * and still has lock on head from lock loop */ + return head; } /* @@ -331,7 +542,7 @@ static int nfs_page_async_flush(struct nfs_pageio_descriptor *pgio, struct nfs_page *req; int ret = 0; - req = nfs_find_and_lock_request(page, nonblock); + req = nfs_lock_and_join_requests(page, nonblock); if (!req) goto out; ret = PTR_ERR(req); From 3e2170451e91327bfa8a82040fea78043847533a Mon Sep 17 00:00:00 2001 From: Weston Andros Adamson Date: Fri, 11 Jul 2014 10:20:49 -0400 Subject: [PATCH 164/455] nfs: handle multiple reqs in nfs_wb_page_cancel Use nfs_lock_and_join_requests to merge all subrequests into the head request - this cancels and dereferences all subrequests. Signed-off-by: Weston Andros Adamson Signed-off-by: Trond Myklebust --- fs/nfs/write.c | 41 +++++++++++++++++++++-------------------- 1 file changed, 21 insertions(+), 20 deletions(-) diff --git a/fs/nfs/write.c b/fs/nfs/write.c index 9f4424c464a0..bdc4db23951e 100644 --- a/fs/nfs/write.c +++ b/fs/nfs/write.c @@ -1799,27 +1799,28 @@ int nfs_wb_page_cancel(struct inode *inode, struct page *page) struct nfs_page *req; int ret = 0; - for (;;) { - wait_on_page_writeback(page); - req = nfs_page_find_head_request(page); - if (req == NULL) - break; - if (nfs_lock_request(req)) { - nfs_clear_request_commit(req); - nfs_inode_remove_request(req); - /* - * In case nfs_inode_remove_request has marked the - * page as being dirty - */ - cancel_dirty_page(page, PAGE_CACHE_SIZE); - nfs_unlock_and_release_request(req); - break; - } - ret = nfs_wait_on_request(req); - nfs_release_request(req); - if (ret < 0) - break; + wait_on_page_writeback(page); + + /* blocking call to cancel all requests and join to a single (head) + * request */ + req = nfs_lock_and_join_requests(page, false); + + if (IS_ERR(req)) { + ret = PTR_ERR(req); + } else if (req) { + /* all requests from this page have been cancelled by + * nfs_lock_and_join_requests, so just remove the head + * request from the inode / page_private pointer and + * release it */ + nfs_inode_remove_request(req); + /* + * In case nfs_inode_remove_request has marked the + * page as being dirty + */ + cancel_dirty_page(page, PAGE_CACHE_SIZE); + nfs_unlock_and_release_request(req); } + return ret; } From aafe37504c70954fc104c88d9d15d553572dae69 Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Sat, 12 Jul 2014 17:23:39 -0400 Subject: [PATCH 165/455] NFS: Remove 2 unused variables Cc: Weston Andros Adamson Signed-off-by: Trond Myklebust --- fs/nfs/direct.c | 2 -- fs/nfs/write.c | 2 -- 2 files changed, 4 deletions(-) diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c index 8f98138cbc43..f11b9eed0de1 100644 --- a/fs/nfs/direct.c +++ b/fs/nfs/direct.c @@ -756,7 +756,6 @@ static void nfs_direct_write_completion(struct nfs_pgio_header *hdr) spin_unlock(&dreq->lock); while (!list_empty(&hdr->pages)) { - bool do_destroy = true; req = nfs_list_entry(hdr->pages.next); nfs_list_remove_request(req); @@ -765,7 +764,6 @@ static void nfs_direct_write_completion(struct nfs_pgio_header *hdr) case NFS_IOHDR_NEED_COMMIT: kref_get(&req->wb_kref); nfs_mark_request_commit(req, hdr->lseg, &cinfo); - do_destroy = false; } nfs_unlock_and_release_request(req); } diff --git a/fs/nfs/write.c b/fs/nfs/write.c index bdc4db23951e..5e2f10304548 100644 --- a/fs/nfs/write.c +++ b/fs/nfs/write.c @@ -868,7 +868,6 @@ static void nfs_write_completion(struct nfs_pgio_header *hdr) { struct nfs_commit_info cinfo; unsigned long bytes = 0; - bool do_destroy; if (test_bit(NFS_IOHDR_REDO, &hdr->flags)) goto out; @@ -898,7 +897,6 @@ remove_req: next: nfs_unlock_request(req); nfs_end_page_writeback(req); - do_destroy = !test_bit(NFS_IOHDR_NEED_COMMIT, &hdr->flags); nfs_release_request(req); } out: From 46c1376db1b85ae412a7917cec148c6e60f79428 Mon Sep 17 00:00:00 2001 From: Steve Wise Date: Fri, 20 Jun 2014 14:26:25 -0500 Subject: [PATCH 166/455] RDMA/cxgb4: Call iwpm_init() only once We need to only register with the iwpm core once. Currently it is being done for every adapter, which causes a failure for each adapter but the first, making multiple adapters unusable. Fixes: 9eccfe109b27 ("RDMA/cxgb4: Add support for iWARP Port Mapper user space service") Signed-off-by: Steve Wise Signed-off-by: Roland Dreier --- drivers/infiniband/hw/cxgb4/cm.c | 2 +- drivers/infiniband/hw/cxgb4/device.c | 17 ++++++++++------- drivers/infiniband/hw/cxgb4/iw_cxgb4.h | 2 +- 3 files changed, 12 insertions(+), 9 deletions(-) diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c index 6a93280250d1..768a0fb67dd6 100644 --- a/drivers/infiniband/hw/cxgb4/cm.c +++ b/drivers/infiniband/hw/cxgb4/cm.c @@ -3925,7 +3925,7 @@ int __init c4iw_cm_init(void) return 0; } -void __exit c4iw_cm_term(void) +void c4iw_cm_term(void) { WARN_ON(!list_empty(&timeout_list)); flush_workqueue(workq); diff --git a/drivers/infiniband/hw/cxgb4/device.c b/drivers/infiniband/hw/cxgb4/device.c index 16b75de4b2de..7db82b24302b 100644 --- a/drivers/infiniband/hw/cxgb4/device.c +++ b/drivers/infiniband/hw/cxgb4/device.c @@ -730,7 +730,6 @@ static void c4iw_dealloc(struct uld_ctx *ctx) if (ctx->dev->rdev.oc_mw_kva) iounmap(ctx->dev->rdev.oc_mw_kva); ib_dealloc_device(&ctx->dev->ibdev); - iwpm_exit(RDMA_NL_C4IW); ctx->dev = NULL; } @@ -827,12 +826,6 @@ static struct c4iw_dev *c4iw_alloc(const struct cxgb4_lld_info *infop) setup_debugfs(devp); } - ret = iwpm_init(RDMA_NL_C4IW); - if (ret) { - pr_err("port mapper initialization failed with %d\n", ret); - ib_dealloc_device(&devp->ibdev); - return ERR_PTR(ret); - } return devp; } @@ -1333,6 +1326,15 @@ static int __init c4iw_init_module(void) pr_err("%s[%u]: Failed to add netlink callback\n" , __func__, __LINE__); + err = iwpm_init(RDMA_NL_C4IW); + if (err) { + pr_err("port mapper initialization failed with %d\n", err); + ibnl_remove_client(RDMA_NL_C4IW); + c4iw_cm_term(); + debugfs_remove_recursive(c4iw_debugfs_root); + return err; + } + cxgb4_register_uld(CXGB4_ULD_RDMA, &c4iw_uld_info); return 0; @@ -1350,6 +1352,7 @@ static void __exit c4iw_exit_module(void) } mutex_unlock(&dev_mutex); cxgb4_unregister_uld(CXGB4_ULD_RDMA); + iwpm_exit(RDMA_NL_C4IW); ibnl_remove_client(RDMA_NL_C4IW); c4iw_cm_term(); debugfs_remove_recursive(c4iw_debugfs_root); diff --git a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h index 125bc5d1e175..361fff7a0742 100644 --- a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h +++ b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h @@ -908,7 +908,7 @@ int c4iw_destroy_ctrl_qp(struct c4iw_rdev *rdev); int c4iw_register_device(struct c4iw_dev *dev); void c4iw_unregister_device(struct c4iw_dev *dev); int __init c4iw_cm_init(void); -void __exit c4iw_cm_term(void); +void c4iw_cm_term(void); void c4iw_release_dev_ucontext(struct c4iw_rdev *rdev, struct c4iw_dev_ucontext *uctx); void c4iw_init_dev_ucontext(struct c4iw_rdev *rdev, From 655fc39bf40331e13503bed85c9ed0278bc35575 Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Wed, 9 Jul 2014 21:04:00 +0200 Subject: [PATCH 167/455] firewire: IEEE 1394 (FireWire) support should depend on HAS_DMA Commit b3d681a4fc108f9653bbb44e4f4e72db2b8a5734 ("firewire: Use COMPILE_TEST for build testing") added COMPILE_TEST as an alternative dependency for the purpose of build testing the firewire core. However, this bypasses all other implicit dependencies assumed by PCI, like HAS_DMA. If NO_DMA=y: drivers/built-in.o: In function `fw_iso_buffer_destroy': (.text+0x36a096): undefined reference to `dma_unmap_page' drivers/built-in.o: In function `fw_iso_buffer_map_dma': (.text+0x36a164): undefined reference to `dma_map_page' drivers/built-in.o: In function `fw_iso_buffer_map_dma': (.text+0x36a172): undefined reference to `dma_mapping_error' drivers/built-in.o: In function `sbp2_send_management_orb': sbp2.c:(.text+0x36c6b4): undefined reference to `dma_map_single' sbp2.c:(.text+0x36c6c8): undefined reference to `dma_mapping_error' sbp2.c:(.text+0x36c772): undefined reference to `dma_map_single' sbp2.c:(.text+0x36c786): undefined reference to `dma_mapping_error' sbp2.c:(.text+0x36c854): undefined reference to `dma_unmap_single' sbp2.c:(.text+0x36c872): undefined reference to `dma_unmap_single' drivers/built-in.o: In function `sbp2_map_scatterlist': sbp2.c:(.text+0x36ccbc): undefined reference to `scsi_dma_map' sbp2.c:(.text+0x36cd36): undefined reference to `dma_map_single' sbp2.c:(.text+0x36cd4e): undefined reference to `dma_mapping_error' sbp2.c:(.text+0x36cd84): undefined reference to `scsi_dma_unmap' drivers/built-in.o: In function `sbp2_unmap_scatterlist': sbp2.c:(.text+0x36cda6): undefined reference to `scsi_dma_unmap' sbp2.c:(.text+0x36cdc6): undefined reference to `dma_unmap_single' drivers/built-in.o: In function `complete_command_orb': sbp2.c:(.text+0x36d6ac): undefined reference to `dma_unmap_single' drivers/built-in.o: In function `sbp2_scsi_queuecommand': sbp2.c:(.text+0x36d8e0): undefined reference to `dma_map_single' sbp2.c:(.text+0x36d8f6): undefined reference to `dma_mapping_error' Add an explicit dependency on HAS_DMA to fix this. Signed-off-by: Geert Uytterhoeven Reviewed-by: Jean Delvare Signed-off-by: Stefan Richter --- drivers/firewire/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/firewire/Kconfig b/drivers/firewire/Kconfig index 4199849e3758..145974f9662b 100644 --- a/drivers/firewire/Kconfig +++ b/drivers/firewire/Kconfig @@ -1,4 +1,5 @@ menu "IEEE 1394 (FireWire) support" + depends on HAS_DMA depends on PCI || COMPILE_TEST # firewire-core does not depend on PCI but is # not useful without PCI controller driver From f563b89b182594f827b4100bd34f916339785a77 Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Sun, 13 Jul 2014 15:13:19 -0400 Subject: [PATCH 168/455] NFS: Don't reset pg_moreio in __nfs_pageio_add_request Once we've started sending unstable NFS writes, we do not want to clear pg_moreio, or we may end up sending the very last request as a stable write if the commit lists are still empty. Do, however, reset pg_moreio in the case where we end up having to recoalesce the write if an attempt to use pNFS failed. Signed-off-by: Trond Myklebust --- fs/nfs/pagelist.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c index 0aefc8102b6b..17fab89f6358 100644 --- a/fs/nfs/pagelist.c +++ b/fs/nfs/pagelist.c @@ -935,7 +935,6 @@ static int __nfs_pageio_add_request(struct nfs_pageio_descriptor *desc, nfs_pageio_doio(desc); if (desc->pg_error < 0) return 0; - desc->pg_moreio = 0; if (desc->pg_recoalesce) return 0; /* retry add_request for this subreq */ @@ -982,6 +981,7 @@ static int nfs_do_recoalesce(struct nfs_pageio_descriptor *desc) desc->pg_count = 0; desc->pg_base = 0; desc->pg_recoalesce = 0; + desc->pg_moreio = 0; while (!list_empty(&head)) { struct nfs_page *req; From 54c39e9ba3a93e3848ad8f9d082c39010cfc5e73 Mon Sep 17 00:00:00 2001 From: Thomas Petazzoni Date: Wed, 2 Jul 2014 15:16:32 +0200 Subject: [PATCH 169/455] mtd: nand: reduce the warning noise when the ECC is too weak In commit 67a9ad9b8a6f ("mtd: nand: Warn the user if the selected ECC strength is too weak"), a check was added to inform the user when the ECC used for a NAND device is weaker than the recommended ECC advertised by the NAND chip. However, the warning uses WARN_ON(), which has two undesirable side-effects: - It just prints to the kernel log the fact that there is a warning in this file, at this line, but it doesn't explain anything about the warning itself. - It dumps a stack trace which is very noisy, for something that the user is most likely not able to fix. If a certain ECC used by the kernel is weaker than the advertised one, it's most likely to make sure the kernel uses an ECC that is compatible with the one used by the bootloader, and changing the bootloader may not necessarily be easy. Therefore, normal users would not be able to do anything to fix this very noisy warning, and will have to suffer from it at every kernel boot. At least every time I see this stack trace in my kernel boot log, I wonder what new thing is broken, just to realize that it's once again this NAND ECC warning. Therefore, this commit turns: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 1 at /home/thomas/projets/linux-2.6/drivers/mtd/nand/nand_base.c:4051 nand_scan_tail+0x538/0x780() Modules linked in: CPU: 0 PID: 1 Comm: swapper Not tainted 3.16.0-rc3-dirty #4 [] (unwind_backtrace) from [] (show_stack+0x10/0x14) [] (show_stack) from [] (warn_slowpath_common+0x6c/0x8c) [] (warn_slowpath_common) from [] (warn_slowpath_null+0x1c/0x24) [] (warn_slowpath_null) from [] (nand_scan_tail+0x538/0x780) [] (nand_scan_tail) from [] (orion_nand_probe+0x224/0x2e4) [] (orion_nand_probe) from [] (platform_drv_probe+0x18/0x4c) [] (platform_drv_probe) from [] (really_probe+0x80/0x218) [] (really_probe) from [] (__driver_attach+0x98/0x9c) [] (__driver_attach) from [] (bus_for_each_dev+0x64/0x94) [] (bus_for_each_dev) from [] (bus_add_driver+0x144/0x1ec) [] (bus_add_driver) from [] (driver_register+0x78/0xf8) [] (driver_register) from [] (platform_driver_probe+0x20/0xb8) [] (platform_driver_probe) from [] (do_one_initcall+0x80/0x1d8) [] (do_one_initcall) from [] (kernel_init_freeable+0xf4/0x1b4) [] (kernel_init_freeable) from [] (kernel_init+0x8/0xec) [] (kernel_init) from [] (ret_from_fork+0x14/0x24) ---[ end trace 62f87d875aceccb4 ]--- Into the much shorter, and much more useful: nand: WARNING: MT29F2G08ABAEAWP: the ECC used on your system is too weak compared to the one required by the NAND chip Signed-off-by: Thomas Petazzoni Signed-off-by: Brian Norris --- drivers/mtd/nand/nand_base.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c index 41167e9e991e..4f3e80c68a26 100644 --- a/drivers/mtd/nand/nand_base.c +++ b/drivers/mtd/nand/nand_base.c @@ -4047,8 +4047,10 @@ int nand_scan_tail(struct mtd_info *mtd) ecc->layout->oobavail += ecc->layout->oobfree[i].length; mtd->oobavail = ecc->layout->oobavail; - /* ECC sanity check: warn noisily if it's too weak */ - WARN_ON(!nand_ecc_strength_good(mtd)); + /* ECC sanity check: warn if it's too weak */ + if (!nand_ecc_strength_good(mtd)) + pr_warn("WARNING: %s: the ECC used on your system is too weak compared to the one required by the NAND chip\n", + mtd->name); /* * Set the number of read / write steps for one page depending on ECC From 812c5fa82bae9f377f49000d7636c19a8b61735c Mon Sep 17 00:00:00 2001 From: Andrea Adami Date: Mon, 2 Jun 2014 23:38:35 +0200 Subject: [PATCH 170/455] mtd: cfi_cmdset_0001.c: add support for Sharp LH28F640BF NOR This family of chips was long ago supported by the pre-cfi driver. CFI code tested on several Zaurus SL-5500 (Collie) 2x16 on 32 bit bus. Function is_LH28F640BF() mimics is_m29ew() from cmdset_0002.c Buffer write fixes as seen in 2007 patch c/o Anti Sullin artecdesign.ee> http://comments.gmane.org/gmane.linux.ports.arm.kernel/36733 [Brian: this patch is semi-urgent, because the following patch switches to using CFI detection for a chip which (until now) is unsupported by the CFI driver 9218310 ARM: 8084/1: sa1100: collie: revert back to cfi_probe ] Signed-off-by: Andrea Adami Signed-off-by: Brian Norris --- drivers/mtd/chips/cfi_cmdset_0001.c | 43 +++++++++++++++++++++++++++++ 1 file changed, 43 insertions(+) diff --git a/drivers/mtd/chips/cfi_cmdset_0001.c b/drivers/mtd/chips/cfi_cmdset_0001.c index e4ec355704a6..a7543ba3e190 100644 --- a/drivers/mtd/chips/cfi_cmdset_0001.c +++ b/drivers/mtd/chips/cfi_cmdset_0001.c @@ -52,6 +52,11 @@ /* Atmel chips */ #define AT49BV640D 0x02de #define AT49BV640DT 0x02db +/* Sharp chips */ +#define LH28F640BFHE_PTTL90 0x00b0 +#define LH28F640BFHE_PBTL90 0x00b1 +#define LH28F640BFHE_PTTL70A 0x00b2 +#define LH28F640BFHE_PBTL70A 0x00b3 static int cfi_intelext_read (struct mtd_info *, loff_t, size_t, size_t *, u_char *); static int cfi_intelext_write_words(struct mtd_info *, loff_t, size_t, size_t *, const u_char *); @@ -258,6 +263,36 @@ static void fixup_st_m28w320cb(struct mtd_info *mtd) (cfi->cfiq->EraseRegionInfo[1] & 0xffff0000) | 0x3e; }; +static int is_LH28F640BF(struct cfi_private *cfi) +{ + /* Sharp LH28F640BF Family */ + if (cfi->mfr == CFI_MFR_SHARP && ( + cfi->id == LH28F640BFHE_PTTL90 || cfi->id == LH28F640BFHE_PBTL90 || + cfi->id == LH28F640BFHE_PTTL70A || cfi->id == LH28F640BFHE_PBTL70A)) + return 1; + return 0; +} + +static void fixup_LH28F640BF(struct mtd_info *mtd) +{ + struct map_info *map = mtd->priv; + struct cfi_private *cfi = map->fldrv_priv; + struct cfi_pri_intelext *extp = cfi->cmdset_priv; + + /* Reset the Partition Configuration Register on LH28F640BF + * to a single partition (PCR = 0x000): PCR is embedded into A0-A15. */ + if (is_LH28F640BF(cfi)) { + printk(KERN_INFO "Reset Partition Config. Register: 1 Partition of 4 planes\n"); + map_write(map, CMD(0x60), 0); + map_write(map, CMD(0x04), 0); + + /* We have set one single partition thus + * Simultaneous Operations are not allowed */ + printk(KERN_INFO "cfi_cmdset_0001: Simultaneous Operations disabled\n"); + extp->FeatureSupport &= ~512; + } +} + static void fixup_use_point(struct mtd_info *mtd) { struct map_info *map = mtd->priv; @@ -309,6 +344,8 @@ static struct cfi_fixup cfi_fixup_table[] = { { CFI_MFR_ST, 0x00ba, /* M28W320CT */ fixup_st_m28w320ct }, { CFI_MFR_ST, 0x00bb, /* M28W320CB */ fixup_st_m28w320cb }, { CFI_MFR_INTEL, CFI_ID_ANY, fixup_unlock_powerup_lock }, + { CFI_MFR_SHARP, CFI_ID_ANY, fixup_unlock_powerup_lock }, + { CFI_MFR_SHARP, CFI_ID_ANY, fixup_LH28F640BF }, { 0, 0, NULL } }; @@ -1649,6 +1686,12 @@ static int __xipram do_write_buffer(struct map_info *map, struct flchip *chip, initial_adr = adr; cmd_adr = adr & ~(wbufsize-1); + /* Sharp LH28F640BF chips need the first address for the + * Page Buffer Program command. See Table 5 of + * LH28F320BF, LH28F640BF, LH28F128BF Series (Appendix FUM00701) */ + if (is_LH28F640BF(cfi)) + cmd_adr = adr; + /* Let's determine this according to the interleave only once */ write_cmd = (cfi->cfiq->P_ID != P_ID_INTEL_PERFORMANCE) ? CMD(0xe8) : CMD(0xe9); From 5a680fad352525c3461ed1f7e3269d31a9128a07 Mon Sep 17 00:00:00 2001 From: Florian Fainelli Date: Fri, 11 Jul 2014 16:55:15 -0700 Subject: [PATCH 171/455] net: bcmgenet: fix RGMII_MODE_EN bit RGMII_MODE_EN bit was defined to 0, while it is actually 6. It was not much of a problem on older designs where this was a no-op, and the RGMII data-path would always be enabled, but newer GENET controllers need to explicitely enable their RGMII data-pad using this bit. Signed-off-by: Florian Fainelli Signed-off-by: David S. Miller --- drivers/net/ethernet/broadcom/genet/bcmgenet.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.h b/drivers/net/ethernet/broadcom/genet/bcmgenet.h index 0f117105fed1..e23c993b1362 100644 --- a/drivers/net/ethernet/broadcom/genet/bcmgenet.h +++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.h @@ -331,9 +331,9 @@ struct bcmgenet_mib_counters { #define EXT_ENERGY_DET_MASK (1 << 12) #define EXT_RGMII_OOB_CTRL 0x0C -#define RGMII_MODE_EN (1 << 0) #define RGMII_LINK (1 << 4) #define OOB_DISABLE (1 << 5) +#define RGMII_MODE_EN (1 << 6) #define ID_MODE_DIS (1 << 16) #define EXT_GPHY_CTRL 0x1C From 845d6fcc2748855183295e2e6583468d0d0a911e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?fran=C3=A7ois=20romieu?= Date: Sat, 12 Jul 2014 02:54:27 +0200 Subject: [PATCH 172/455] MAINTAINERS: update r8169 maintainer Signed-off-by: Francois Romieu Signed-off-by: David S. Miller --- MAINTAINERS | 1 - 1 file changed, 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index 23863ce06996..afc1c5aa476f 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -156,7 +156,6 @@ F: drivers/net/hamradio/6pack.c 8169 10/100/1000 GIGABIT ETHERNET DRIVER M: Realtek linux nic maintainers -M: Francois Romieu L: netdev@vger.kernel.org S: Maintained F: drivers/net/ethernet/realtek/r8169.c From 6b89cddee051945a83cc67436ad1680ba7d9f766 Mon Sep 17 00:00:00 2001 From: Daniel Vetter Date: Wed, 9 Jul 2014 22:35:53 +0200 Subject: [PATCH 173/455] Revert "drm/i915: Don't set the 8to6 dither flag when not scaling" This reverts commit 773875bfb6737982903c42d1ee88cf60af80089c. It is very much needed and the lack of dithering has been reported by a large list of people with various gen2/3 hardware. Also, the original patch was complete non-sense since the WARNING backtraces in the references bugzilla are about gmch_pfit.lvds_border_bits mismatch, not at all about the dither bit. That one seems to work. Cc: Jiri Kosina Cc: Pavel Machek Cc: Hans de Bruin Cc: stable@vger.kernel.org Acked-by: Pavel Machek Signed-off-by: Daniel Vetter --- drivers/gpu/drm/i915/intel_lvds.c | 7 +++++++ drivers/gpu/drm/i915/intel_panel.c | 8 ++++---- 2 files changed, 11 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_lvds.c b/drivers/gpu/drm/i915/intel_lvds.c index 23126023aeba..5e5a72fca5fb 100644 --- a/drivers/gpu/drm/i915/intel_lvds.c +++ b/drivers/gpu/drm/i915/intel_lvds.c @@ -111,6 +111,13 @@ static void intel_lvds_get_config(struct intel_encoder *encoder, pipe_config->adjusted_mode.flags |= flags; + /* gen2/3 store dither state in pfit control, needs to match */ + if (INTEL_INFO(dev)->gen < 4) { + tmp = I915_READ(PFIT_CONTROL); + + pipe_config->gmch_pfit.control |= tmp & PANEL_8TO6_DITHER_ENABLE; + } + dotclock = pipe_config->port_clock; if (HAS_PCH_SPLIT(dev_priv->dev)) diff --git a/drivers/gpu/drm/i915/intel_panel.c b/drivers/gpu/drm/i915/intel_panel.c index 628cd8938274..12b02fe1d0ae 100644 --- a/drivers/gpu/drm/i915/intel_panel.c +++ b/drivers/gpu/drm/i915/intel_panel.c @@ -361,16 +361,16 @@ void intel_gmch_panel_fitting(struct intel_crtc *intel_crtc, pfit_control |= ((intel_crtc->pipe << PFIT_PIPE_SHIFT) | PFIT_FILTER_FUZZY); - /* Make sure pre-965 set dither correctly for 18bpp panels. */ - if (INTEL_INFO(dev)->gen < 4 && pipe_config->pipe_bpp == 18) - pfit_control |= PANEL_8TO6_DITHER_ENABLE; - out: if ((pfit_control & PFIT_ENABLE) == 0) { pfit_control = 0; pfit_pgm_ratios = 0; } + /* Make sure pre-965 set dither correctly for 18bpp panels. */ + if (INTEL_INFO(dev)->gen < 4 && pipe_config->pipe_bpp == 18) + pfit_control |= PANEL_8TO6_DITHER_ENABLE; + pipe_config->gmch_pfit.control = pfit_control; pipe_config->gmch_pfit.pgm_ratios = pfit_pgm_ratios; pipe_config->gmch_pfit.lvds_border_bits = border; From 724cb06fa9b1e1ffd98188275543fdb3b8eaca4f Mon Sep 17 00:00:00 2001 From: Scot Doyle Date: Fri, 11 Jul 2014 22:16:30 +0000 Subject: [PATCH 174/455] drm/i915: Ignore VBT backlight presence check on HP Chromebook 14 commit c675949ec58ca50d5a3ae3c757892f1560f6e896 drm/i915: do not setup backlight if not available according to VBT caused a regression on the HP Chromebook 14 (with Celeron 2955U CPU), which has a misconfigured VBT. Apply quirk to ignore the VBT backlight presence check during backlight setup. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79813 Signed-off-by: Scot Doyle Tested-by: Stefan Nagy Cc: Jani Nikula Cc: Daniel Vetter Signed-off-by: Daniel Vetter --- drivers/gpu/drm/i915/intel_display.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index e27e7804c0b9..82e7d57f0a8a 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -11673,6 +11673,9 @@ static struct intel_quirk intel_quirks[] = { /* Toshiba CB35 Chromebook (Celeron 2955U) */ { 0x0a06, 0x1179, 0x0a88, quirk_backlight_present }, + + /* HP Chromebook 14 (Celeron 2955U) */ + { 0x0a06, 0x103c, 0x21ed, quirk_backlight_present }, }; static void intel_init_quirks(struct drm_device *dev) From cd50065b3be83a705635550c04e368f2a4cc44d0 Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Mon, 14 Jul 2014 10:45:31 +0200 Subject: [PATCH 175/455] ALSA: hda - Revert stream assignment order for Intel controllers We got a regression report for 3.15.x kernels, and this turned out to be triggered by the fix for stream assignment order. On reporter's machine with Intel controller (8086:1e20) + VIA VT1802 codec, the first playback slot can't work with speaker outputs. But the original commit was actually a fix for AMD controllers where no proper GCAP value is returned, we shouldn't revert the whole commit. Instead, in this patch, a new flag is introduced to determine the stream assignment order, and follow the old behavior for Intel controllers. Fixes: dcb32ecd9a53 ('ALSA: hda - Do not assign streams in reverse order') Reported-and-tested-by: Steven Newbury Cc: [v3.15+] Signed-off-by: Takashi Iwai --- sound/pci/hda/hda_controller.c | 3 ++- sound/pci/hda/hda_intel.c | 2 +- sound/pci/hda/hda_priv.h | 1 + 3 files changed, 4 insertions(+), 2 deletions(-) diff --git a/sound/pci/hda/hda_controller.c b/sound/pci/hda/hda_controller.c index 480bbddbd801..6df04d91c93c 100644 --- a/sound/pci/hda/hda_controller.c +++ b/sound/pci/hda/hda_controller.c @@ -193,7 +193,8 @@ azx_assign_device(struct azx *chip, struct snd_pcm_substream *substream) dsp_unlock(azx_dev); return azx_dev; } - if (!res) + if (!res || + (chip->driver_caps & AZX_DCAPS_REVERSE_ASSIGN)) res = azx_dev; } dsp_unlock(azx_dev); diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c index b6b4e71a0b0b..d690c26a197c 100644 --- a/sound/pci/hda/hda_intel.c +++ b/sound/pci/hda/hda_intel.c @@ -227,7 +227,7 @@ enum { /* quirks for Intel PCH */ #define AZX_DCAPS_INTEL_PCH_NOPM \ (AZX_DCAPS_SCH_SNOOP | AZX_DCAPS_BUFSIZE | \ - AZX_DCAPS_COUNT_LPIB_DELAY) + AZX_DCAPS_COUNT_LPIB_DELAY | AZX_DCAPS_REVERSE_ASSIGN) #define AZX_DCAPS_INTEL_PCH \ (AZX_DCAPS_INTEL_PCH_NOPM | AZX_DCAPS_PM_RUNTIME) diff --git a/sound/pci/hda/hda_priv.h b/sound/pci/hda/hda_priv.h index 4a7cb01fa912..e9d1a5762a55 100644 --- a/sound/pci/hda/hda_priv.h +++ b/sound/pci/hda/hda_priv.h @@ -186,6 +186,7 @@ enum { SDI0, SDI1, SDI2, SDI3, SDO0, SDO1, SDO2, SDO3 }; #define AZX_DCAPS_BUFSIZE (1 << 21) /* no buffer size alignment */ #define AZX_DCAPS_ALIGN_BUFSIZE (1 << 22) /* buffer size alignment */ #define AZX_DCAPS_4K_BDLE_BOUNDARY (1 << 23) /* BDLE in 4k boundary */ +#define AZX_DCAPS_REVERSE_ASSIGN (1 << 24) /* Assign devices in reverse order */ #define AZX_DCAPS_COUNT_LPIB_DELAY (1 << 25) /* Take LPIB as delay */ #define AZX_DCAPS_PM_RUNTIME (1 << 26) /* runtime PM support */ #define AZX_DCAPS_I915_POWERWELL (1 << 27) /* HSW i915 powerwell support */ From e688a7f8c6cb7a18aae7e55ccdd175f0ad9e69c0 Mon Sep 17 00:00:00 2001 From: Pablo Neira Ayuso Date: Tue, 1 Jul 2014 11:49:18 +0200 Subject: [PATCH 176/455] netfilter: nf_tables: safe RCU iteration on list when dumping The dump operation through netlink is not protected by the nfnl_lock. Thus, a reader process can be dumping any of the existing object lists while another process can be updating the list content. This patch resolves this situation by protecting all the object lists with RCU in the netlink dump path which is the reader side. The updater path is already protected via nfnl_lock, so use list manipulation RCU-safe operations. Signed-off-by: Pablo Neira Ayuso --- net/netfilter/nf_tables_api.c | 94 ++++++++++++++++++++--------------- 1 file changed, 53 insertions(+), 41 deletions(-) diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index da5dc37a7402..a27a7c56e7c3 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -35,7 +35,7 @@ int nft_register_afinfo(struct net *net, struct nft_af_info *afi) { INIT_LIST_HEAD(&afi->tables); nfnl_lock(NFNL_SUBSYS_NFTABLES); - list_add_tail(&afi->list, &net->nft.af_info); + list_add_tail_rcu(&afi->list, &net->nft.af_info); nfnl_unlock(NFNL_SUBSYS_NFTABLES); return 0; } @@ -51,7 +51,7 @@ EXPORT_SYMBOL_GPL(nft_register_afinfo); void nft_unregister_afinfo(struct nft_af_info *afi) { nfnl_lock(NFNL_SUBSYS_NFTABLES); - list_del(&afi->list); + list_del_rcu(&afi->list); nfnl_unlock(NFNL_SUBSYS_NFTABLES); } EXPORT_SYMBOL_GPL(nft_unregister_afinfo); @@ -277,11 +277,12 @@ static int nf_tables_dump_tables(struct sk_buff *skb, struct net *net = sock_net(skb->sk); int family = nfmsg->nfgen_family; - list_for_each_entry(afi, &net->nft.af_info, list) { + rcu_read_lock(); + list_for_each_entry_rcu(afi, &net->nft.af_info, list) { if (family != NFPROTO_UNSPEC && family != afi->family) continue; - list_for_each_entry(table, &afi->tables, list) { + list_for_each_entry_rcu(table, &afi->tables, list) { if (idx < s_idx) goto cont; if (idx > s_idx) @@ -299,6 +300,7 @@ cont: } } done: + rcu_read_unlock(); cb->args[0] = idx; return skb->len; } @@ -517,7 +519,7 @@ static int nf_tables_newtable(struct sock *nlsk, struct sk_buff *skb, module_put(afi->owner); return err; } - list_add_tail(&table->list, &afi->tables); + list_add_tail_rcu(&table->list, &afi->tables); return 0; } @@ -549,7 +551,7 @@ static int nf_tables_deltable(struct sock *nlsk, struct sk_buff *skb, if (err < 0) return err; - list_del(&table->list); + list_del_rcu(&table->list); return 0; } @@ -764,12 +766,13 @@ static int nf_tables_dump_chains(struct sk_buff *skb, struct net *net = sock_net(skb->sk); int family = nfmsg->nfgen_family; - list_for_each_entry(afi, &net->nft.af_info, list) { + rcu_read_lock(); + list_for_each_entry_rcu(afi, &net->nft.af_info, list) { if (family != NFPROTO_UNSPEC && family != afi->family) continue; - list_for_each_entry(table, &afi->tables, list) { - list_for_each_entry(chain, &table->chains, list) { + list_for_each_entry_rcu(table, &afi->tables, list) { + list_for_each_entry_rcu(chain, &table->chains, list) { if (idx < s_idx) goto cont; if (idx > s_idx) @@ -787,11 +790,11 @@ cont: } } done: + rcu_read_unlock(); cb->args[0] = idx; return skb->len; } - static int nf_tables_getchain(struct sock *nlsk, struct sk_buff *skb, const struct nlmsghdr *nlh, const struct nlattr * const nla[]) @@ -1133,7 +1136,7 @@ static int nf_tables_newchain(struct sock *nlsk, struct sk_buff *skb, goto err2; table->use++; - list_add_tail(&chain->list, &table->chains); + list_add_tail_rcu(&chain->list, &table->chains); return 0; err2: if (!(table->flags & NFT_TABLE_F_DORMANT) && @@ -1183,7 +1186,7 @@ static int nf_tables_delchain(struct sock *nlsk, struct sk_buff *skb, return err; table->use--; - list_del(&chain->list); + list_del_rcu(&chain->list); return 0; } @@ -1202,9 +1205,9 @@ int nft_register_expr(struct nft_expr_type *type) { nfnl_lock(NFNL_SUBSYS_NFTABLES); if (type->family == NFPROTO_UNSPEC) - list_add_tail(&type->list, &nf_tables_expressions); + list_add_tail_rcu(&type->list, &nf_tables_expressions); else - list_add(&type->list, &nf_tables_expressions); + list_add_rcu(&type->list, &nf_tables_expressions); nfnl_unlock(NFNL_SUBSYS_NFTABLES); return 0; } @@ -1219,7 +1222,7 @@ EXPORT_SYMBOL_GPL(nft_register_expr); void nft_unregister_expr(struct nft_expr_type *type) { nfnl_lock(NFNL_SUBSYS_NFTABLES); - list_del(&type->list); + list_del_rcu(&type->list); nfnl_unlock(NFNL_SUBSYS_NFTABLES); } EXPORT_SYMBOL_GPL(nft_unregister_expr); @@ -1555,13 +1558,14 @@ static int nf_tables_dump_rules(struct sk_buff *skb, u8 genctr = ACCESS_ONCE(net->nft.genctr); u8 gencursor = ACCESS_ONCE(net->nft.gencursor); - list_for_each_entry(afi, &net->nft.af_info, list) { + rcu_read_lock(); + list_for_each_entry_rcu(afi, &net->nft.af_info, list) { if (family != NFPROTO_UNSPEC && family != afi->family) continue; - list_for_each_entry(table, &afi->tables, list) { - list_for_each_entry(chain, &table->chains, list) { - list_for_each_entry(rule, &chain->rules, list) { + list_for_each_entry_rcu(table, &afi->tables, list) { + list_for_each_entry_rcu(chain, &table->chains, list) { + list_for_each_entry_rcu(rule, &chain->rules, list) { if (!nft_rule_is_active(net, rule)) goto cont; if (idx < s_idx) @@ -1582,6 +1586,8 @@ cont: } } done: + rcu_read_unlock(); + /* Invalidate this dump, a transition to the new generation happened */ if (gencursor != net->nft.gencursor || genctr != net->nft.genctr) return -EBUSY; @@ -1935,7 +1941,7 @@ static LIST_HEAD(nf_tables_set_ops); int nft_register_set(struct nft_set_ops *ops) { nfnl_lock(NFNL_SUBSYS_NFTABLES); - list_add_tail(&ops->list, &nf_tables_set_ops); + list_add_tail_rcu(&ops->list, &nf_tables_set_ops); nfnl_unlock(NFNL_SUBSYS_NFTABLES); return 0; } @@ -1944,7 +1950,7 @@ EXPORT_SYMBOL_GPL(nft_register_set); void nft_unregister_set(struct nft_set_ops *ops) { nfnl_lock(NFNL_SUBSYS_NFTABLES); - list_del(&ops->list); + list_del_rcu(&ops->list); nfnl_unlock(NFNL_SUBSYS_NFTABLES); } EXPORT_SYMBOL_GPL(nft_unregister_set); @@ -2237,7 +2243,8 @@ static int nf_tables_dump_sets_table(struct nft_ctx *ctx, struct sk_buff *skb, if (cb->args[1]) return skb->len; - list_for_each_entry(set, &ctx->table->sets, list) { + rcu_read_lock(); + list_for_each_entry_rcu(set, &ctx->table->sets, list) { if (idx < s_idx) goto cont; if (nf_tables_fill_set(skb, ctx, set, NFT_MSG_NEWSET, @@ -2250,6 +2257,7 @@ cont: } cb->args[1] = 1; done: + rcu_read_unlock(); return skb->len; } @@ -2263,7 +2271,8 @@ static int nf_tables_dump_sets_family(struct nft_ctx *ctx, struct sk_buff *skb, if (cb->args[1]) return skb->len; - list_for_each_entry(table, &ctx->afi->tables, list) { + rcu_read_lock(); + list_for_each_entry_rcu(table, &ctx->afi->tables, list) { if (cur_table) { if (cur_table != table) continue; @@ -2272,7 +2281,7 @@ static int nf_tables_dump_sets_family(struct nft_ctx *ctx, struct sk_buff *skb, } ctx->table = table; idx = 0; - list_for_each_entry(set, &ctx->table->sets, list) { + list_for_each_entry_rcu(set, &ctx->table->sets, list) { if (idx < s_idx) goto cont; if (nf_tables_fill_set(skb, ctx, set, NFT_MSG_NEWSET, @@ -2287,6 +2296,7 @@ cont: } cb->args[1] = 1; done: + rcu_read_unlock(); return skb->len; } @@ -2303,7 +2313,8 @@ static int nf_tables_dump_sets_all(struct nft_ctx *ctx, struct sk_buff *skb, if (cb->args[1]) return skb->len; - list_for_each_entry(afi, &net->nft.af_info, list) { + rcu_read_lock(); + list_for_each_entry_rcu(afi, &net->nft.af_info, list) { if (cur_family) { if (afi->family != cur_family) continue; @@ -2311,7 +2322,7 @@ static int nf_tables_dump_sets_all(struct nft_ctx *ctx, struct sk_buff *skb, cur_family = 0; } - list_for_each_entry(table, &afi->tables, list) { + list_for_each_entry_rcu(table, &afi->tables, list) { if (cur_table) { if (cur_table != table) continue; @@ -2322,7 +2333,7 @@ static int nf_tables_dump_sets_all(struct nft_ctx *ctx, struct sk_buff *skb, ctx->table = table; ctx->afi = afi; idx = 0; - list_for_each_entry(set, &ctx->table->sets, list) { + list_for_each_entry_rcu(set, &ctx->table->sets, list) { if (idx < s_idx) goto cont; if (nf_tables_fill_set(skb, ctx, set, @@ -2342,6 +2353,7 @@ cont: } cb->args[1] = 1; done: + rcu_read_unlock(); return skb->len; } @@ -2600,7 +2612,7 @@ static int nf_tables_newset(struct sock *nlsk, struct sk_buff *skb, if (err < 0) goto err2; - list_add_tail(&set->list, &table->sets); + list_add_tail_rcu(&set->list, &table->sets); table->use++; return 0; @@ -2620,7 +2632,7 @@ static void nft_set_destroy(struct nft_set *set) static void nf_tables_set_destroy(const struct nft_ctx *ctx, struct nft_set *set) { - list_del(&set->list); + list_del_rcu(&set->list); nf_tables_set_notify(ctx, set, NFT_MSG_DELSET, GFP_ATOMIC); nft_set_destroy(set); } @@ -2655,7 +2667,7 @@ static int nf_tables_delset(struct sock *nlsk, struct sk_buff *skb, if (err < 0) return err; - list_del(&set->list); + list_del_rcu(&set->list); ctx.table->use--; return 0; } @@ -2707,14 +2719,14 @@ int nf_tables_bind_set(const struct nft_ctx *ctx, struct nft_set *set, } bind: binding->chain = ctx->chain; - list_add_tail(&binding->list, &set->bindings); + list_add_tail_rcu(&binding->list, &set->bindings); return 0; } void nf_tables_unbind_set(const struct nft_ctx *ctx, struct nft_set *set, struct nft_set_binding *binding) { - list_del(&binding->list); + list_del_rcu(&binding->list); if (list_empty(&set->bindings) && set->flags & NFT_SET_ANONYMOUS && !(set->flags & NFT_SET_INACTIVE)) @@ -3494,12 +3506,12 @@ static int nf_tables_abort(struct sk_buff *skb) } nft_trans_destroy(trans); } else { - list_del(&trans->ctx.table->list); + list_del_rcu(&trans->ctx.table->list); } break; case NFT_MSG_DELTABLE: - list_add_tail(&trans->ctx.table->list, - &trans->ctx.afi->tables); + list_add_tail_rcu(&trans->ctx.table->list, + &trans->ctx.afi->tables); nft_trans_destroy(trans); break; case NFT_MSG_NEWCHAIN: @@ -3510,7 +3522,7 @@ static int nf_tables_abort(struct sk_buff *skb) nft_trans_destroy(trans); } else { trans->ctx.table->use--; - list_del(&trans->ctx.chain->list); + list_del_rcu(&trans->ctx.chain->list); if (!(trans->ctx.table->flags & NFT_TABLE_F_DORMANT) && trans->ctx.chain->flags & NFT_BASE_CHAIN) { nf_unregister_hooks(nft_base_chain(trans->ctx.chain)->ops, @@ -3520,8 +3532,8 @@ static int nf_tables_abort(struct sk_buff *skb) break; case NFT_MSG_DELCHAIN: trans->ctx.table->use++; - list_add_tail(&trans->ctx.chain->list, - &trans->ctx.table->chains); + list_add_tail_rcu(&trans->ctx.chain->list, + &trans->ctx.table->chains); nft_trans_destroy(trans); break; case NFT_MSG_NEWRULE: @@ -3535,12 +3547,12 @@ static int nf_tables_abort(struct sk_buff *skb) break; case NFT_MSG_NEWSET: trans->ctx.table->use--; - list_del(&nft_trans_set(trans)->list); + list_del_rcu(&nft_trans_set(trans)->list); break; case NFT_MSG_DELSET: trans->ctx.table->use++; - list_add_tail(&nft_trans_set(trans)->list, - &trans->ctx.table->sets); + list_add_tail_rcu(&nft_trans_set(trans)->list, + &trans->ctx.table->sets); nft_trans_destroy(trans); break; case NFT_MSG_NEWSETELEM: From 38e029f14a9702f71d5953246df9f722bca49017 Mon Sep 17 00:00:00 2001 From: Pablo Neira Ayuso Date: Tue, 1 Jul 2014 12:23:12 +0200 Subject: [PATCH 177/455] netfilter: nf_tables: set NLM_F_DUMP_INTR if netlink dumping is stale An updater may interfer with the dumping of any of the object lists. Fix this by using a per-net generation counter and use the nl_dump_check_consistent() interface so the NLM_F_DUMP_INTR flag is set to notify userspace that it has to restart the dump since an updater has interfered. This patch also replaces the existing consistency checking code in the rule dumping path since it is broken. Basically, the value that the dump callback returns is not propagated to userspace via netlink_dump_start(). Signed-off-by: Pablo Neira Ayuso --- include/net/netns/nftables.h | 2 +- net/netfilter/nf_tables_api.c | 30 +++++++++++++++++++++++------- 2 files changed, 24 insertions(+), 8 deletions(-) diff --git a/include/net/netns/nftables.h b/include/net/netns/nftables.h index 26a394cb91a8..eee608b12cc9 100644 --- a/include/net/netns/nftables.h +++ b/include/net/netns/nftables.h @@ -13,8 +13,8 @@ struct netns_nftables { struct nft_af_info *inet; struct nft_af_info *arp; struct nft_af_info *bridge; + unsigned int base_seq; u8 gencursor; - u8 genctr; }; #endif diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index a27a7c56e7c3..ac03d748360e 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -278,6 +278,8 @@ static int nf_tables_dump_tables(struct sk_buff *skb, int family = nfmsg->nfgen_family; rcu_read_lock(); + cb->seq = net->nft.base_seq; + list_for_each_entry_rcu(afi, &net->nft.af_info, list) { if (family != NFPROTO_UNSPEC && family != afi->family) continue; @@ -295,6 +297,8 @@ static int nf_tables_dump_tables(struct sk_buff *skb, NLM_F_MULTI, afi->family, table) < 0) goto done; + + nl_dump_check_consistent(cb, nlmsg_hdr(skb)); cont: idx++; } @@ -767,6 +771,8 @@ static int nf_tables_dump_chains(struct sk_buff *skb, int family = nfmsg->nfgen_family; rcu_read_lock(); + cb->seq = net->nft.base_seq; + list_for_each_entry_rcu(afi, &net->nft.af_info, list) { if (family != NFPROTO_UNSPEC && family != afi->family) continue; @@ -784,6 +790,8 @@ static int nf_tables_dump_chains(struct sk_buff *skb, NLM_F_MULTI, afi->family, table, chain) < 0) goto done; + + nl_dump_check_consistent(cb, nlmsg_hdr(skb)); cont: idx++; } @@ -1555,10 +1563,10 @@ static int nf_tables_dump_rules(struct sk_buff *skb, unsigned int idx = 0, s_idx = cb->args[0]; struct net *net = sock_net(skb->sk); int family = nfmsg->nfgen_family; - u8 genctr = ACCESS_ONCE(net->nft.genctr); - u8 gencursor = ACCESS_ONCE(net->nft.gencursor); rcu_read_lock(); + cb->seq = net->nft.base_seq; + list_for_each_entry_rcu(afi, &net->nft.af_info, list) { if (family != NFPROTO_UNSPEC && family != afi->family) continue; @@ -1579,6 +1587,8 @@ static int nf_tables_dump_rules(struct sk_buff *skb, NLM_F_MULTI | NLM_F_APPEND, afi->family, table, chain, rule) < 0) goto done; + + nl_dump_check_consistent(cb, nlmsg_hdr(skb)); cont: idx++; } @@ -1588,10 +1598,6 @@ cont: done: rcu_read_unlock(); - /* Invalidate this dump, a transition to the new generation happened */ - if (gencursor != net->nft.gencursor || genctr != net->nft.genctr) - return -EBUSY; - cb->args[0] = idx; return skb->len; } @@ -2244,6 +2250,8 @@ static int nf_tables_dump_sets_table(struct nft_ctx *ctx, struct sk_buff *skb, return skb->len; rcu_read_lock(); + cb->seq = ctx->net->nft.base_seq; + list_for_each_entry_rcu(set, &ctx->table->sets, list) { if (idx < s_idx) goto cont; @@ -2252,6 +2260,7 @@ static int nf_tables_dump_sets_table(struct nft_ctx *ctx, struct sk_buff *skb, cb->args[0] = idx; goto done; } + nl_dump_check_consistent(cb, nlmsg_hdr(skb)); cont: idx++; } @@ -2272,6 +2281,8 @@ static int nf_tables_dump_sets_family(struct nft_ctx *ctx, struct sk_buff *skb, return skb->len; rcu_read_lock(); + cb->seq = ctx->net->nft.base_seq; + list_for_each_entry_rcu(table, &ctx->afi->tables, list) { if (cur_table) { if (cur_table != table) @@ -2290,6 +2301,7 @@ static int nf_tables_dump_sets_family(struct nft_ctx *ctx, struct sk_buff *skb, cb->args[2] = (unsigned long) table; goto done; } + nl_dump_check_consistent(cb, nlmsg_hdr(skb)); cont: idx++; } @@ -2314,6 +2326,8 @@ static int nf_tables_dump_sets_all(struct nft_ctx *ctx, struct sk_buff *skb, return skb->len; rcu_read_lock(); + cb->seq = net->nft.base_seq; + list_for_each_entry_rcu(afi, &net->nft.af_info, list) { if (cur_family) { if (afi->family != cur_family) @@ -2344,6 +2358,7 @@ static int nf_tables_dump_sets_all(struct nft_ctx *ctx, struct sk_buff *skb, cb->args[3] = afi->family; goto done; } + nl_dump_check_consistent(cb, nlmsg_hdr(skb)); cont: idx++; } @@ -3361,7 +3376,7 @@ static int nf_tables_commit(struct sk_buff *skb) struct nft_set *set; /* Bump generation counter, invalidate any dump in progress */ - net->nft.genctr++; + while (++net->nft.base_seq == 0); /* A new generation has just started */ net->nft.gencursor = gencursor_next(net); @@ -3966,6 +3981,7 @@ static int nf_tables_init_net(struct net *net) { INIT_LIST_HEAD(&net->nft.af_info); INIT_LIST_HEAD(&net->nft.commit_list); + net->nft.base_seq = 1; return 0; } From ce355e209feb030945dae4c358c02f29a84f3f8b Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 9 Jul 2014 15:14:06 +0200 Subject: [PATCH 178/455] netfilter: nf_tables: 64bit stats need some extra synchronization Use generic u64_stats_sync infrastructure to get proper 64bit stats, even on 32bit arches, at no extra cost for 64bit arches. Without this fix, 32bit arches can have some wrong counters at the time the carry is propagated into upper word. Signed-off-by: Eric Dumazet Signed-off-by: Pablo Neira Ayuso --- include/net/netfilter/nf_tables.h | 6 ++++-- net/netfilter/nf_tables_api.c | 15 +++++++++++---- net/netfilter/nf_tables_core.c | 10 ++++++---- 3 files changed, 21 insertions(+), 10 deletions(-) diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h index 713b0b88bd5a..c4d86198d3d6 100644 --- a/include/net/netfilter/nf_tables.h +++ b/include/net/netfilter/nf_tables.h @@ -6,6 +6,7 @@ #include #include #include +#include #include #define NFT_JUMP_STACK_SIZE 16 @@ -528,8 +529,9 @@ enum nft_chain_type { }; struct nft_stats { - u64 bytes; - u64 pkts; + u64 bytes; + u64 pkts; + struct u64_stats_sync syncp; }; #define NFT_HOOK_OPS_MAX 2 diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index ac03d748360e..8746ff9a8357 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -644,13 +644,20 @@ static int nft_dump_stats(struct sk_buff *skb, struct nft_stats __percpu *stats) { struct nft_stats *cpu_stats, total; struct nlattr *nest; + unsigned int seq; + u64 pkts, bytes; int cpu; memset(&total, 0, sizeof(total)); for_each_possible_cpu(cpu) { cpu_stats = per_cpu_ptr(stats, cpu); - total.pkts += cpu_stats->pkts; - total.bytes += cpu_stats->bytes; + do { + seq = u64_stats_fetch_begin_irq(&cpu_stats->syncp); + pkts = cpu_stats->pkts; + bytes = cpu_stats->bytes; + } while (u64_stats_fetch_retry_irq(&cpu_stats->syncp, seq)); + total.pkts += pkts; + total.bytes += bytes; } nest = nla_nest_start(skb, NFTA_CHAIN_COUNTERS); if (nest == NULL) @@ -875,7 +882,7 @@ static struct nft_stats __percpu *nft_stats_alloc(const struct nlattr *attr) if (!tb[NFTA_COUNTER_BYTES] || !tb[NFTA_COUNTER_PACKETS]) return ERR_PTR(-EINVAL); - newstats = alloc_percpu(struct nft_stats); + newstats = netdev_alloc_pcpu_stats(struct nft_stats); if (newstats == NULL) return ERR_PTR(-ENOMEM); @@ -1091,7 +1098,7 @@ static int nf_tables_newchain(struct sock *nlsk, struct sk_buff *skb, } basechain->stats = stats; } else { - stats = alloc_percpu(struct nft_stats); + stats = netdev_alloc_pcpu_stats(struct nft_stats); if (IS_ERR(stats)) { module_put(type->owner); kfree(basechain); diff --git a/net/netfilter/nf_tables_core.c b/net/netfilter/nf_tables_core.c index 345acfb1720b..3b90eb2b2c55 100644 --- a/net/netfilter/nf_tables_core.c +++ b/net/netfilter/nf_tables_core.c @@ -109,7 +109,7 @@ nft_do_chain(struct nft_pktinfo *pkt, const struct nf_hook_ops *ops) struct nft_data data[NFT_REG_MAX + 1]; unsigned int stackptr = 0; struct nft_jumpstack jumpstack[NFT_JUMP_STACK_SIZE]; - struct nft_stats __percpu *stats; + struct nft_stats *stats; int rulenum; /* * Cache cursor to avoid problems in case that the cursor is updated @@ -205,9 +205,11 @@ next_rule: nft_trace_packet(pkt, basechain, -1, NFT_TRACE_POLICY); rcu_read_lock_bh(); - stats = rcu_dereference(nft_base_chain(basechain)->stats); - __this_cpu_inc(stats->pkts); - __this_cpu_add(stats->bytes, pkt->skb->len); + stats = this_cpu_ptr(rcu_dereference(nft_base_chain(basechain)->stats)); + u64_stats_update_begin(&stats->syncp); + stats->pkts++; + stats->bytes += pkt->skb->len; + u64_stats_update_end(&stats->syncp); rcu_read_unlock_bh(); return nft_base_chain(basechain)->policy; From 3b3a1814d1703027f9867d0f5cbbfaf6c7482474 Mon Sep 17 00:00:00 2001 From: Mikulas Patocka Date: Wed, 2 Jul 2014 12:46:23 -0400 Subject: [PATCH 179/455] block: provide compat ioctl for BLKZEROOUT This patch provides the compat BLKZEROOUT ioctl. The argument is a pointer to two uint64_t values, so there is no need to translate it. Signed-off-by: Mikulas Patocka Cc: stable@vger.kernel.org # 3.7+ Acked-by: Martin K. Petersen Signed-off-by: Jens Axboe --- block/compat_ioctl.c | 1 + 1 file changed, 1 insertion(+) diff --git a/block/compat_ioctl.c b/block/compat_ioctl.c index fbd5a67cb773..a0926a6094b2 100644 --- a/block/compat_ioctl.c +++ b/block/compat_ioctl.c @@ -690,6 +690,7 @@ long compat_blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg) case BLKROSET: case BLKDISCARD: case BLKSECDISCARD: + case BLKZEROOUT: /* * the ones below are implemented in blkdev_locked_ioctl, * but we call blkdev_ioctl, which gets the lock for us From d3cc7996473a7bdd33256029988ea690754e4e2a Mon Sep 17 00:00:00 2001 From: Amit Shah Date: Thu, 10 Jul 2014 15:42:34 +0530 Subject: [PATCH 180/455] hwrng: fetch randomness only after device init Commit d9e7972619334 "hwrng: add randomness to system from rng sources" added a call to rng_get_data() from the hwrng_register() function. However, some rng devices need initialization before data can be read from them. This commit makes the call to rng_get_data() depend on no init fn pointer being registered by the device. If an init function is registered, this call is made after device init. CC: Kees Cook CC: Jason Cooper CC: Herbert Xu CC: # For v3.15+ Signed-off-by: Amit Shah Reviewed-by: Jason Cooper Signed-off-by: Herbert Xu --- drivers/char/hw_random/core.c | 41 ++++++++++++++++++++++++++++------- 1 file changed, 33 insertions(+), 8 deletions(-) diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c index 334601cc81cf..2a451b14b3cc 100644 --- a/drivers/char/hw_random/core.c +++ b/drivers/char/hw_random/core.c @@ -55,16 +55,35 @@ static DEFINE_MUTEX(rng_mutex); static int data_avail; static u8 *rng_buffer; +static inline int rng_get_data(struct hwrng *rng, u8 *buffer, size_t size, + int wait); + static size_t rng_buffer_size(void) { return SMP_CACHE_BYTES < 32 ? 32 : SMP_CACHE_BYTES; } +static void add_early_randomness(struct hwrng *rng) +{ + unsigned char bytes[16]; + int bytes_read; + + bytes_read = rng_get_data(rng, bytes, sizeof(bytes), 1); + if (bytes_read > 0) + add_device_randomness(bytes, bytes_read); +} + static inline int hwrng_init(struct hwrng *rng) { - if (!rng->init) - return 0; - return rng->init(rng); + if (rng->init) { + int ret; + + ret = rng->init(rng); + if (ret) + return ret; + } + add_early_randomness(rng); + return 0; } static inline void hwrng_cleanup(struct hwrng *rng) @@ -304,8 +323,6 @@ int hwrng_register(struct hwrng *rng) { int err = -EINVAL; struct hwrng *old_rng, *tmp; - unsigned char bytes[16]; - int bytes_read; if (rng->name == NULL || (rng->data_read == NULL && rng->read == NULL)) @@ -347,9 +364,17 @@ int hwrng_register(struct hwrng *rng) INIT_LIST_HEAD(&rng->list); list_add_tail(&rng->list, &rng_list); - bytes_read = rng_get_data(rng, bytes, sizeof(bytes), 1); - if (bytes_read > 0) - add_device_randomness(bytes, bytes_read); + if (old_rng && !rng->init) { + /* + * Use a new device's input to add some randomness to + * the system. If this rng device isn't going to be + * used right away, its init function hasn't been + * called yet; so only use the randomness from devices + * that don't need an init callback. + */ + add_early_randomness(rng); + } + out_unlock: mutex_unlock(&rng_mutex); out: From e052dbf554610e2104c5a7518c4d8374bed701bb Mon Sep 17 00:00:00 2001 From: Amit Shah Date: Thu, 10 Jul 2014 15:42:35 +0530 Subject: [PATCH 181/455] hwrng: virtio - ensure reads happen after successful probe The hwrng core asks for random data in the hwrng_register() call itself from commit d9e7972619. This doesn't play well with virtio -- the DRIVER_OK bit is only set by virtio core on a successful probe, and we're not yet out of our probe routine when this call is made. This causes the host to not acknowledge any requests we put in the virtqueue, and the insmod or kernel boot process just waits for data to arrive from the host, which never happens. CC: Kees Cook CC: Jason Cooper CC: Herbert Xu CC: # For v3.15+ Reviewed-by: Jason Cooper Signed-off-by: Amit Shah Signed-off-by: Herbert Xu --- drivers/char/hw_random/core.c | 6 ++++++ drivers/char/hw_random/virtio-rng.c | 10 ++++++++++ 2 files changed, 16 insertions(+) diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c index 2a451b14b3cc..c4419ea1ab07 100644 --- a/drivers/char/hw_random/core.c +++ b/drivers/char/hw_random/core.c @@ -68,6 +68,12 @@ static void add_early_randomness(struct hwrng *rng) unsigned char bytes[16]; int bytes_read; + /* + * Currently only virtio-rng cannot return data during device + * probe, and that's handled in virtio-rng.c itself. If there + * are more such devices, this call to rng_get_data can be + * made conditional here instead of doing it per-device. + */ bytes_read = rng_get_data(rng, bytes, sizeof(bytes), 1); if (bytes_read > 0) add_device_randomness(bytes, bytes_read); diff --git a/drivers/char/hw_random/virtio-rng.c b/drivers/char/hw_random/virtio-rng.c index f3e71501de54..e9b15bc18b4d 100644 --- a/drivers/char/hw_random/virtio-rng.c +++ b/drivers/char/hw_random/virtio-rng.c @@ -38,6 +38,8 @@ struct virtrng_info { int index; }; +static bool probe_done; + static void random_recv_done(struct virtqueue *vq) { struct virtrng_info *vi = vq->vdev->priv; @@ -67,6 +69,13 @@ static int virtio_read(struct hwrng *rng, void *buf, size_t size, bool wait) int ret; struct virtrng_info *vi = (struct virtrng_info *)rng->priv; + /* + * Don't ask host for data till we're setup. This call can + * happen during hwrng_register(), after commit d9e7972619. + */ + if (unlikely(!probe_done)) + return 0; + if (!vi->busy) { vi->busy = true; init_completion(&vi->have_data); @@ -137,6 +146,7 @@ static int probe_common(struct virtio_device *vdev) return err; } + probe_done = true; return 0; } From 7188b067576db95445bf4e9498f1bdb2e612dd2f Mon Sep 17 00:00:00 2001 From: Hans de Goede Date: Mon, 14 Jul 2014 15:41:25 +0200 Subject: [PATCH 182/455] MAINTAINERS: Add Hans de Goede as ahci-platform maintainer Signed-off-by: Hans de Goede Signed-off-by: Tejun Heo --- MAINTAINERS | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index 134483f206e4..c92bb80fcdbe 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -7993,6 +7993,16 @@ F: drivers/ata/ F: include/linux/ata.h F: include/linux/libata.h +SERIAL ATA AHCI PLATFORM devices support +M: Hans de Goede +M: Tejun Heo +L: linux-ide@vger.kernel.org +T: git git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata.git +S: Supported +F: drivers/ata/ahci_platform.c +F: drivers/ata/libahci_platform.c +F: include/linux/ahci_platform.h + SERVER ENGINES 10Gbps iSCSI - BladeEngine 2 DRIVER M: Jayamohan Kallickal L: linux-scsi@vger.kernel.org From 27f1b36326bc8b6911e7052bc4b80a10234f0ec5 Mon Sep 17 00:00:00 2001 From: Maxim Patlasov Date: Thu, 10 Jul 2014 15:32:43 +0400 Subject: [PATCH 183/455] fuse: release temporary page if fuse_writepage_locked() failed tmp_page to be freed if fuse_write_file_get() returns NULL. Signed-off-by: Maxim Patlasov Signed-off-by: Miklos Szeredi --- fs/fuse/file.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/fs/fuse/file.c b/fs/fuse/file.c index 96d513e01a5d..35b6f31ecc38 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -1722,7 +1722,7 @@ static int fuse_writepage_locked(struct page *page) error = -EIO; req->ff = fuse_write_file_get(fc, fi); if (!req->ff) - goto err_free; + goto err_nofile; fuse_write_fill(req, req->ff, page_offset(page), 0); @@ -1750,6 +1750,8 @@ static int fuse_writepage_locked(struct page *page) return 0; +err_nofile: + __free_page(tmp_page); err_free: fuse_request_free(req); err: From f2b3455e47c72bef80fe584adb9bb9bc5afdd99c Mon Sep 17 00:00:00 2001 From: Fabian Frederick Date: Mon, 23 Jun 2014 18:35:15 +0200 Subject: [PATCH 184/455] fuse: replace count*size kzalloc by kcalloc kcalloc manages count*sizeof overflow. Signed-off-by: Fabian Frederick Signed-off-by: Miklos Szeredi --- fs/fuse/file.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/fuse/file.c b/fs/fuse/file.c index 35b6f31ecc38..1570dad1915d 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -1992,8 +1992,8 @@ static int fuse_writepages(struct address_space *mapping, data.ff = NULL; err = -ENOMEM; - data.orig_pages = kzalloc(sizeof(struct page *) * - FUSE_MAX_PAGES_PER_REQ, + data.orig_pages = kcalloc(FUSE_MAX_PAGES_PER_REQ, + sizeof(struct page *), GFP_NOFS); if (!data.orig_pages) goto out; From ba394f0b6aa7a4a6afe67176da5d29f0ac59c48d Mon Sep 17 00:00:00 2001 From: Sekhar Nori Date: Mon, 14 Jul 2014 18:43:46 +0530 Subject: [PATCH 185/455] ARM: OMAP2+: l2c: squelch warning dump on power control setting On OMAP SOCs using PL310 controllers, power_ctrl register is not accessible from non-secure software even on PL310 versions which support it. The secure code takes care of setting it up correctly and power transitions are proven on these devices. For example, AM437x has L2C-310 version r3p3 and ROM code on that device does not support writing to L2C-310 power control register. The L2C driver, however, tries writing to this register for all revisions >= r3p0. This leads to a warning dump on boot which leads most users to believe that L2 cache is non-functional. Since the problem is understood, and cannot be addressed through software, replace the warning with a pr_info() while maintaining the WARN_ON() for other truly unexpected scenarios. Reported-by: Nishanth Menon Tested-by: Felipe Balbi Signed-off-by: Sekhar Nori Acked-by: Santosh Shilimkar Signed-off-by: Tony Lindgren --- arch/arm/mach-omap2/omap4-common.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/arch/arm/mach-omap2/omap4-common.c b/arch/arm/mach-omap2/omap4-common.c index 539e8106eb96..a0fe747634c1 100644 --- a/arch/arm/mach-omap2/omap4-common.c +++ b/arch/arm/mach-omap2/omap4-common.c @@ -168,6 +168,10 @@ static void omap4_l2c310_write_sec(unsigned long val, unsigned reg) smc_op = OMAP4_MON_L2X0_PREFETCH_INDEX; break; + case L310_POWER_CTRL: + pr_info_once("OMAP L2C310: ROM does not support power control setting\n"); + return; + default: WARN_ONCE(1, "OMAP L2C310: ignoring write to reg 0x%x\n", reg); return; From 1871ee134b73fb4cadab75752a7152ed2813c751 Mon Sep 17 00:00:00 2001 From: Kevin Hao Date: Sat, 12 Jul 2014 12:08:24 +0800 Subject: [PATCH 186/455] libata: support the ata host which implements a queue depth less than 32 The sata on fsl mpc8315e is broken after the commit 8a4aeec8d2d6 ("libata/ahci: accommodate tag ordered controllers"). The reason is that the ata controller on this SoC only implement a queue depth of 16. When issuing the commands in tag order, all the commands in tag 16 ~ 31 are mapped to tag 0 unconditionally and then causes the sata malfunction. It makes no senses to use a 32 queue in software while the hardware has less queue depth. So consider the queue depth implemented by the hardware when requesting a command tag. Fixes: 8a4aeec8d2d6 ("libata/ahci: accommodate tag ordered controllers") Cc: stable@vger.kernel.org Signed-off-by: Kevin Hao Acked-by: Dan Williams Signed-off-by: Tejun Heo --- drivers/ata/libata-core.c | 22 +++++++++++++++++++--- 1 file changed, 19 insertions(+), 3 deletions(-) diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c index 18d97d5c7d90..d19c37a7abc9 100644 --- a/drivers/ata/libata-core.c +++ b/drivers/ata/libata-core.c @@ -4787,6 +4787,10 @@ void swap_buf_le16(u16 *buf, unsigned int buf_words) * ata_qc_new - Request an available ATA command, for queueing * @ap: target port * + * Some ATA host controllers may implement a queue depth which is less + * than ATA_MAX_QUEUE. So we shouldn't allocate a tag which is beyond + * the hardware limitation. + * * LOCKING: * None. */ @@ -4794,14 +4798,16 @@ void swap_buf_le16(u16 *buf, unsigned int buf_words) static struct ata_queued_cmd *ata_qc_new(struct ata_port *ap) { struct ata_queued_cmd *qc = NULL; - unsigned int i, tag; + unsigned int i, tag, max_queue; + + max_queue = ap->scsi_host->can_queue; /* no command while frozen */ if (unlikely(ap->pflags & ATA_PFLAG_FROZEN)) return NULL; - for (i = 0; i < ATA_MAX_QUEUE; i++) { - tag = (i + ap->last_tag + 1) % ATA_MAX_QUEUE; + for (i = 0, tag = ap->last_tag + 1; i < max_queue; i++, tag++) { + tag = tag < max_queue ? tag : 0; /* the last tag is reserved for internal command. */ if (tag == ATA_TAG_INTERNAL) @@ -6169,6 +6175,16 @@ int ata_host_register(struct ata_host *host, struct scsi_host_template *sht) { int i, rc; + /* + * The max queue supported by hardware must not be greater than + * ATA_MAX_QUEUE. + */ + if (sht->can_queue > ATA_MAX_QUEUE) { + dev_err(host->dev, "BUG: the hardware max queue is too large\n"); + WARN_ON(1); + return -EINVAL; + } + /* host must have been started */ if (!(host->flags & ATA_HOST_STARTED)) { dev_err(host->dev, "BUG: trying to register unstarted host\n"); From 263782c1c95bbddbb022dc092fd89a36bb8d5577 Mon Sep 17 00:00:00 2001 From: Benjamin LaHaise Date: Mon, 14 Jul 2014 12:49:26 -0400 Subject: [PATCH 187/455] aio: protect reqs_available updates from changes in interrupt handlers As of commit f8567a3845ac05bb28f3c1b478ef752762bd39ef it is now possible to have put_reqs_available() called from irq context. While put_reqs_available() is per cpu, it did not protect itself from interrupts on the same CPU. This lead to aio_complete() corrupting the available io requests count when run under a heavy O_DIRECT workloads as reported by Robert Elliott. Fix this by disabling irq updates around the per cpu batch updates of reqs_available. Many thanks to Robert and folks for testing and tracking this down. Reported-by: Robert Elliot Tested-by: Robert Elliot Signed-off-by: Benjamin LaHaise Cc: Jens Axboe , Christoph Hellwig Cc: stable@vger.kenel.org --- fs/aio.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/fs/aio.c b/fs/aio.c index 955947ef3e02..1c9c5f0a9e2b 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -830,16 +830,20 @@ void exit_aio(struct mm_struct *mm) static void put_reqs_available(struct kioctx *ctx, unsigned nr) { struct kioctx_cpu *kcpu; + unsigned long flags; preempt_disable(); kcpu = this_cpu_ptr(ctx->cpu); + local_irq_save(flags); kcpu->reqs_available += nr; + while (kcpu->reqs_available >= ctx->req_batch * 2) { kcpu->reqs_available -= ctx->req_batch; atomic_add(ctx->req_batch, &ctx->reqs_available); } + local_irq_restore(flags); preempt_enable(); } @@ -847,10 +851,12 @@ static bool get_reqs_available(struct kioctx *ctx) { struct kioctx_cpu *kcpu; bool ret = false; + unsigned long flags; preempt_disable(); kcpu = this_cpu_ptr(ctx->cpu); + local_irq_save(flags); if (!kcpu->reqs_available) { int old, avail = atomic_read(&ctx->reqs_available); @@ -869,6 +875,7 @@ static bool get_reqs_available(struct kioctx *ctx) ret = true; kcpu->reqs_available--; out: + local_irq_restore(flags); preempt_enable(); return ret; } From 9c8958bc24e241de2c0d4740e6dea861fe47daa5 Mon Sep 17 00:00:00 2001 From: Daniel Vetter Date: Mon, 14 Jul 2014 19:35:31 +0200 Subject: [PATCH 188/455] drm/i915: Track the primary plane correctly when reassigning planes MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 98ec77397a5c68ce753dc283aaa6f4742328bcdd Author: Ville Syrjälä Date: Wed Apr 30 17:43:01 2014 +0300 drm/i915: Make primary_enabled match the actual hardware state introduced more accurate tracking of the primary plane and some checks. It missed the plane->pipe reassignement code for gen2/3 though, which the checks caught and resulted in WARNING backtraces. Since we only use this path if the plane is on and on the wrong pipe we can just always set the tracking bit to "enabled". Reported-and-tested-by: Paul Bolle Cc: Paul Bolle Signed-off-by: Daniel Vetter --- drivers/gpu/drm/i915/intel_display.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index 82e7d57f0a8a..f0be855ddf45 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -11914,6 +11914,7 @@ static void intel_sanitize_crtc(struct intel_crtc *crtc) * ... */ plane = crtc->plane; crtc->plane = !plane; + crtc->primary_enabled = true; dev_priv->display.crtc_disable(&crtc->base); crtc->plane = plane; From ee14b644daaa58afe1e91bb9ebd9cf1b18d1f5fa Mon Sep 17 00:00:00 2001 From: Axel Lin Date: Wed, 9 Jul 2014 09:18:59 +0800 Subject: [PATCH 189/455] hwmon: (da9052) Don't use dash in the name attribute Dashes are not allowed in hwmon name attributes. Use "da9052" instead of "da9052-hwmon". Signed-off-by: Axel Lin Cc: stable@vger.kernel.org Signed-off-by: Guenter Roeck --- drivers/hwmon/da9052-hwmon.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/hwmon/da9052-hwmon.c b/drivers/hwmon/da9052-hwmon.c index afd31042b452..d14ab3c45daa 100644 --- a/drivers/hwmon/da9052-hwmon.c +++ b/drivers/hwmon/da9052-hwmon.c @@ -194,7 +194,7 @@ static ssize_t da9052_hwmon_show_name(struct device *dev, struct device_attribute *devattr, char *buf) { - return sprintf(buf, "da9052-hwmon\n"); + return sprintf(buf, "da9052\n"); } static ssize_t show_label(struct device *dev, From 6b00f440dd678d786389a7100a2e03fe44478431 Mon Sep 17 00:00:00 2001 From: Axel Lin Date: Wed, 9 Jul 2014 09:22:54 +0800 Subject: [PATCH 190/455] hwmon: (da9055) Don't use dash in the name attribute Dashes are not allowed in hwmon name attributes. Use "da9055" instead of "da9055-hwmon". Signed-off-by: Axel Lin Cc: stable@vger.kernel.org Signed-off-by: Guenter Roeck --- drivers/hwmon/da9055-hwmon.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/hwmon/da9055-hwmon.c b/drivers/hwmon/da9055-hwmon.c index 73b3865f1207..35eb7738d711 100644 --- a/drivers/hwmon/da9055-hwmon.c +++ b/drivers/hwmon/da9055-hwmon.c @@ -204,7 +204,7 @@ static ssize_t da9055_hwmon_show_name(struct device *dev, struct device_attribute *devattr, char *buf) { - return sprintf(buf, "da9055-hwmon\n"); + return sprintf(buf, "da9055\n"); } static ssize_t show_label(struct device *dev, From 2843768b701971ab10e62c77d5c75ad7c306f1bd Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Mon, 14 Jul 2014 19:35:45 +0200 Subject: [PATCH 191/455] Revert "ACPI / video: change acpi-video brightness_switch_enabled default to 0" MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This reverts commit 886129a8eebeb (ACPI / video: change acpi-video brightness_switch_enabled default to 0) as it is reported to cause problems to happen. Fixes: 886129a8eebeb (ACPI / video: change acpi-video brightness_switch_enabled default to 0) Link: http://marc.info/?l=linux-acpi&m=140534286826819&w=2 Reported by: Bjørn Mork Signed-off-by: Rafael J. Wysocki --- Documentation/kernel-parameters.txt | 2 +- drivers/acpi/video.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index c1b9aa8c5a52..e332e718cad6 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -3526,7 +3526,7 @@ bytes respectively. Such letter suffixes can also be entirely omitted. the allocated input device; If set to 0, video driver will only send out the event without touching backlight brightness level. - default: 0 + default: 1 virtio_mmio.device= [VMMIO] Memory mapped virtio (platform) device. diff --git a/drivers/acpi/video.c b/drivers/acpi/video.c index 071c1dfb93f3..29649c194886 100644 --- a/drivers/acpi/video.c +++ b/drivers/acpi/video.c @@ -68,7 +68,7 @@ MODULE_AUTHOR("Bruno Ducrot"); MODULE_DESCRIPTION("ACPI Video Driver"); MODULE_LICENSE("GPL"); -static bool brightness_switch_enabled; +static bool brightness_switch_enabled = 1; module_param(brightness_switch_enabled, bool, 0644); /* From 2448e3493cb3874baa90725c87869455ebf11cd2 Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Fri, 11 Jul 2014 21:06:38 +0200 Subject: [PATCH 192/455] tracing: instance_rmdir() leaks ftrace_event_file->filter instance_rmdir() path destroys the event files but forgets to free file->filter. Change remove_event_file_dir() to free_event_filter(). Link: http://lkml.kernel.org/p/20140711190638.GA19517@redhat.com Cc: Masami Hiramatsu Cc: Namhyung Kim Cc: Srikar Dronamraju Cc: Tom Zanussi Cc: "zhangwei(Jovi)" Cc: stable@vger.kernel.org # 3.11+ Fixes: f6a84bdc75b5 "tracing: Introduce remove_event_file_dir()" Signed-off-by: Oleg Nesterov Signed-off-by: Steven Rostedt --- kernel/trace/trace_events.c | 1 + 1 file changed, 1 insertion(+) diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c index f99e0b3bca8c..2de53628689f 100644 --- a/kernel/trace/trace_events.c +++ b/kernel/trace/trace_events.c @@ -470,6 +470,7 @@ static void remove_event_file_dir(struct ftrace_event_file *file) list_del(&file->list); remove_subsystem(file->system); + free_event_filter(file->filter); kmem_cache_free(file_cachep, file); } From 8762e5092828c4dc0f49da5a47a644c670df77f3 Mon Sep 17 00:00:00 2001 From: Boris Ostrovsky Date: Wed, 9 Jul 2014 13:18:18 -0400 Subject: [PATCH 193/455] x86/espfix/xen: Fix allocation of pages for paravirt page tables init_espfix_ap() is currently off by one level when informing hypervisor that allocated pages will be used for ministacks' page tables. The most immediate effect of this on a PV guest is that if 'stack_page = __get_free_page()' returns a non-zeroed-out page the hypervisor will refuse to use it for a page table (which it shouldn't be anyway). This will result in warnings by both Xen and Linux. More importantly, a subsequent write to that page (again, by a PV guest) is likely to result in fatal page fault. Signed-off-by: Boris Ostrovsky Link: http://lkml.kernel.org/r/1404926298-5565-1-git-send-email-boris.ostrovsky@oracle.com Reviewed-by: Konrad Rzeszutek Wilk Signed-off-by: H. Peter Anvin Cc: --- arch/x86/kernel/espfix_64.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/arch/x86/kernel/espfix_64.c b/arch/x86/kernel/espfix_64.c index 6afbb16e9b79..94d857fb1033 100644 --- a/arch/x86/kernel/espfix_64.c +++ b/arch/x86/kernel/espfix_64.c @@ -175,7 +175,7 @@ void init_espfix_ap(void) if (!pud_present(pud)) { pmd_p = (pmd_t *)__get_free_page(PGALLOC_GFP); pud = __pud(__pa(pmd_p) | (PGTABLE_PROT & ptemask)); - paravirt_alloc_pud(&init_mm, __pa(pmd_p) >> PAGE_SHIFT); + paravirt_alloc_pmd(&init_mm, __pa(pmd_p) >> PAGE_SHIFT); for (n = 0; n < ESPFIX_PUD_CLONES; n++) set_pud(&pud_p[n], pud); } @@ -185,7 +185,7 @@ void init_espfix_ap(void) if (!pmd_present(pmd)) { pte_p = (pte_t *)__get_free_page(PGALLOC_GFP); pmd = __pmd(__pa(pte_p) | (PGTABLE_PROT & ptemask)); - paravirt_alloc_pmd(&init_mm, __pa(pte_p) >> PAGE_SHIFT); + paravirt_alloc_pte(&init_mm, __pa(pte_p) >> PAGE_SHIFT); for (n = 0; n < ESPFIX_PMD_CLONES; n++) set_pmd(&pmd_p[n], pmd); } @@ -193,7 +193,6 @@ void init_espfix_ap(void) pte_p = pte_offset_kernel(&pmd, addr); stack_page = (void *)__get_free_page(GFP_KERNEL); pte = __pte(__pa(stack_page) | (__PAGE_KERNEL_RO & ptemask)); - paravirt_alloc_pte(&init_mm, __pa(stack_page) >> PAGE_SHIFT); for (n = 0; n < ESPFIX_PTE_CLONES; n++) set_pte(&pte_p[n*PTE_STRIDE], pte); From aa182e64f16fc29a4984c2d79191b161888bbd9b Mon Sep 17 00:00:00 2001 From: Dave Chinner Date: Tue, 15 Jul 2014 07:08:10 +1000 Subject: [PATCH 194/455] Revert "xfs: block allocation work needs to be kswapd aware" This reverts commit 1f6d64829db78a7e1d63e15c9f48f0a5d2b5a679. This commit resulted in regressions in performance in low memory situations where kswapd was doing writeback of delayed allocation blocks. It resulted in significant parallelism of the kswapd work and with the special kswapd flags meant that hundreds of active allocation could dip into kswapd specific memory reserves and avoid being throttled. This cause a large amount of performance variation, as well as random OOM-killer invocations that didn't previously exist. Signed-off-by: Dave Chinner Reviewed-by: Brian Foster Signed-off-by: Dave Chinner --- fs/xfs/xfs_bmap_util.c | 16 +++------------- fs/xfs/xfs_bmap_util.h | 13 ++++++------- 2 files changed, 9 insertions(+), 20 deletions(-) diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index 703b3ec1796c..057f671811d6 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -258,23 +258,14 @@ xfs_bmapi_allocate_worker( struct xfs_bmalloca *args = container_of(work, struct xfs_bmalloca, work); unsigned long pflags; - unsigned long new_pflags = PF_FSTRANS; - /* - * we are in a transaction context here, but may also be doing work - * in kswapd context, and hence we may need to inherit that state - * temporarily to ensure that we don't block waiting for memory reclaim - * in any way. - */ - if (args->kswapd) - new_pflags |= PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD; - - current_set_flags_nested(&pflags, new_pflags); + /* we are in a transaction context here */ + current_set_flags_nested(&pflags, PF_FSTRANS); args->result = __xfs_bmapi_allocate(args); complete(args->done); - current_restore_flags_nested(&pflags, new_pflags); + current_restore_flags_nested(&pflags, PF_FSTRANS); } /* @@ -293,7 +284,6 @@ xfs_bmapi_allocate( args->done = &done; - args->kswapd = current_is_kswapd(); INIT_WORK_ONSTACK(&args->work, xfs_bmapi_allocate_worker); queue_work(xfs_alloc_wq, &args->work); wait_for_completion(&done); diff --git a/fs/xfs/xfs_bmap_util.h b/fs/xfs/xfs_bmap_util.h index 075f72232a64..935ed2b24edf 100644 --- a/fs/xfs/xfs_bmap_util.h +++ b/fs/xfs/xfs_bmap_util.h @@ -50,13 +50,12 @@ struct xfs_bmalloca { xfs_extlen_t total; /* total blocks needed for xaction */ xfs_extlen_t minlen; /* minimum allocation size (blocks) */ xfs_extlen_t minleft; /* amount must be left after alloc */ - bool eof; /* set if allocating past last extent */ - bool wasdel; /* replacing a delayed allocation */ - bool userdata;/* set if is user data */ - bool aeof; /* allocated space at eof */ - bool conv; /* overwriting unwritten extents */ - bool stack_switch; - bool kswapd; /* allocation in kswapd context */ + char eof; /* set if allocating past last extent */ + char wasdel; /* replacing a delayed allocation */ + char userdata;/* set if is user data */ + char aeof; /* allocated space at eof */ + char conv; /* overwriting unwritten extents */ + char stack_switch; int flags; struct completion *done; struct work_struct work; From cf11da9c5d374962913ca5ba0ce0886b58286224 Mon Sep 17 00:00:00 2001 From: Dave Chinner Date: Tue, 15 Jul 2014 07:08:24 +1000 Subject: [PATCH 195/455] xfs: refine the allocation stack switch The allocation stack switch at xfs_bmapi_allocate() has served it's purpose, but is no longer a sufficient solution to the stack usage problem we have in the XFS allocation path. Whilst the kernel stack size is now 16k, that is not a valid reason for undoing all our "keep stack usage down" modifications. What it does allow us to do is have the freedom to refine and perfect the modifications knowing that if we get it wrong it won't blow up in our faces - we have a safety net now. This is important because we still have the issue of older kernels having smaller stacks and that they are still supported and are demonstrating a wide range of different stack overflows. Red Hat has several open bugs for allocation based stack overflows from directory modifications and direct IO block allocation and these problems still need to be solved. If we can solve them upstream, then distro's won't need to bake their own unique solutions. To that end, I've observed that every allocation based stack overflow report has had a specific characteristic - it has happened during or directly after a bmap btree block split. That event requires a new block to be allocated to the tree, and so we effectively stack one allocation stack on top of another, and that's when we get into trouble. A further observation is that bmap btree block splits are much rarer than writeback allocation - over a range of different workloads I've observed the ratio of bmap btree inserts to splits ranges from 100:1 (xfstests run) to 10000:1 (local VM image server with sparse files that range in the hundreds of thousands to millions of extents). Either way, bmap btree split events are much, much rarer than allocation events. Finally, we have to move the kswapd state to the allocation workqueue work when allocation is done on behalf of kswapd. This is proving to cause significant perturbation in performance under memory pressure and appears to be generating allocation deadlock warnings under some workloads, so avoiding the use of a workqueue for the majority of kswapd writeback allocation will minimise the impact of such behaviour. Hence it makes sense to move the stack switch to xfs_btree_split() and only do it for bmap btree splits. Stack switches during allocation will be much rarer, so there won't be significant performacne overhead caused by switching stacks. The worse case stack from all allocation paths will be split, not just writeback. And the majority of memory allocations will be done in the correct context (e.g. kswapd) without causing additional latency, and so we simplify the memory reclaim interactions between processes, workqueues and kswapd. The worst stack I've been able to generate with this patch in place is 5600 bytes deep. It's very revealing because we exit XFS at: 37) 1768 64 kmem_cache_alloc+0x13b/0x170 about 1800 bytes of stack consumed, and the remaining 3800 bytes (and 36 functions) is memory reclaim, swap and the IO stack. And this occurs in the inode allocation from an open(O_CREAT) syscall, not writeback. The amount of stack being used is much less than I've previously be able to generate - fs_mark testing has been able to generate stack usage of around 7k without too much trouble; with this patch it's only just getting to 5.5k. This is primarily because the metadata allocation paths (e.g. directory blocks) are no longer causing double splits on the same stack, and hence now stack tracing is showing swapping being the worst stack consumer rather than XFS. Performance of fs_mark inode create workloads is unchanged. Performance of fs_mark async fsync workloads is consistently good with context switches reduced by around 150,000/s (30%). Performance of dbench, streaming IO and postmark is unchanged. Allocation deadlock warnings have not been seen on the workloads that generated them since adding this patch. Signed-off-by: Dave Chinner Reviewed-by: Brian Foster Signed-off-by: Dave Chinner --- fs/xfs/xfs_bmap.c | 7 ++-- fs/xfs/xfs_bmap.h | 4 +-- fs/xfs/xfs_bmap_util.c | 43 ---------------------- fs/xfs/xfs_bmap_util.h | 13 +++---- fs/xfs/xfs_btree.c | 82 +++++++++++++++++++++++++++++++++++++++++- fs/xfs/xfs_iomap.c | 3 +- 6 files changed, 90 insertions(+), 62 deletions(-) diff --git a/fs/xfs/xfs_bmap.c b/fs/xfs/xfs_bmap.c index 96175df211b1..75c3fe5f3d9d 100644 --- a/fs/xfs/xfs_bmap.c +++ b/fs/xfs/xfs_bmap.c @@ -4298,8 +4298,8 @@ xfs_bmapi_delay( } -int -__xfs_bmapi_allocate( +static int +xfs_bmapi_allocate( struct xfs_bmalloca *bma) { struct xfs_mount *mp = bma->ip->i_mount; @@ -4578,9 +4578,6 @@ xfs_bmapi_write( bma.flist = flist; bma.firstblock = firstblock; - if (flags & XFS_BMAPI_STACK_SWITCH) - bma.stack_switch = 1; - while (bno < end && n < *nmap) { inhole = eof || bma.got.br_startoff > bno; wasdelay = !inhole && isnullstartblock(bma.got.br_startblock); diff --git a/fs/xfs/xfs_bmap.h b/fs/xfs/xfs_bmap.h index 38ba36e9b2f0..b879ca56a64c 100644 --- a/fs/xfs/xfs_bmap.h +++ b/fs/xfs/xfs_bmap.h @@ -77,7 +77,6 @@ typedef struct xfs_bmap_free * from written to unwritten, otherwise convert from unwritten to written. */ #define XFS_BMAPI_CONVERT 0x040 -#define XFS_BMAPI_STACK_SWITCH 0x080 #define XFS_BMAPI_FLAGS \ { XFS_BMAPI_ENTIRE, "ENTIRE" }, \ @@ -86,8 +85,7 @@ typedef struct xfs_bmap_free { XFS_BMAPI_PREALLOC, "PREALLOC" }, \ { XFS_BMAPI_IGSTATE, "IGSTATE" }, \ { XFS_BMAPI_CONTIG, "CONTIG" }, \ - { XFS_BMAPI_CONVERT, "CONVERT" }, \ - { XFS_BMAPI_STACK_SWITCH, "STACK_SWITCH" } + { XFS_BMAPI_CONVERT, "CONVERT" } static inline int xfs_bmapi_aflag(int w) diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index 057f671811d6..64731ef3324d 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -248,49 +248,6 @@ xfs_bmap_rtalloc( return 0; } -/* - * Stack switching interfaces for allocation - */ -static void -xfs_bmapi_allocate_worker( - struct work_struct *work) -{ - struct xfs_bmalloca *args = container_of(work, - struct xfs_bmalloca, work); - unsigned long pflags; - - /* we are in a transaction context here */ - current_set_flags_nested(&pflags, PF_FSTRANS); - - args->result = __xfs_bmapi_allocate(args); - complete(args->done); - - current_restore_flags_nested(&pflags, PF_FSTRANS); -} - -/* - * Some allocation requests often come in with little stack to work on. Push - * them off to a worker thread so there is lots of stack to use. Otherwise just - * call directly to avoid the context switch overhead here. - */ -int -xfs_bmapi_allocate( - struct xfs_bmalloca *args) -{ - DECLARE_COMPLETION_ONSTACK(done); - - if (!args->stack_switch) - return __xfs_bmapi_allocate(args); - - - args->done = &done; - INIT_WORK_ONSTACK(&args->work, xfs_bmapi_allocate_worker); - queue_work(xfs_alloc_wq, &args->work); - wait_for_completion(&done); - destroy_work_on_stack(&args->work); - return args->result; -} - /* * Check if the endoff is outside the last extent. If so the caller will grow * the allocation to a stripe unit boundary. All offsets are considered outside diff --git a/fs/xfs/xfs_bmap_util.h b/fs/xfs/xfs_bmap_util.h index 935ed2b24edf..2fdb72d2c908 100644 --- a/fs/xfs/xfs_bmap_util.h +++ b/fs/xfs/xfs_bmap_util.h @@ -50,12 +50,11 @@ struct xfs_bmalloca { xfs_extlen_t total; /* total blocks needed for xaction */ xfs_extlen_t minlen; /* minimum allocation size (blocks) */ xfs_extlen_t minleft; /* amount must be left after alloc */ - char eof; /* set if allocating past last extent */ - char wasdel; /* replacing a delayed allocation */ - char userdata;/* set if is user data */ - char aeof; /* allocated space at eof */ - char conv; /* overwriting unwritten extents */ - char stack_switch; + bool eof; /* set if allocating past last extent */ + bool wasdel; /* replacing a delayed allocation */ + bool userdata;/* set if is user data */ + bool aeof; /* allocated space at eof */ + bool conv; /* overwriting unwritten extents */ int flags; struct completion *done; struct work_struct work; @@ -65,8 +64,6 @@ struct xfs_bmalloca { int xfs_bmap_finish(struct xfs_trans **tp, struct xfs_bmap_free *flist, int *committed); int xfs_bmap_rtalloc(struct xfs_bmalloca *ap); -int xfs_bmapi_allocate(struct xfs_bmalloca *args); -int __xfs_bmapi_allocate(struct xfs_bmalloca *args); int xfs_bmap_eof(struct xfs_inode *ip, xfs_fileoff_t endoff, int whichfork, int *eof); int xfs_bmap_count_blocks(struct xfs_trans *tp, struct xfs_inode *ip, diff --git a/fs/xfs/xfs_btree.c b/fs/xfs/xfs_btree.c index bf810c6baf2b..cf893bc1e373 100644 --- a/fs/xfs/xfs_btree.c +++ b/fs/xfs/xfs_btree.c @@ -33,6 +33,7 @@ #include "xfs_error.h" #include "xfs_trace.h" #include "xfs_cksum.h" +#include "xfs_alloc.h" /* * Cursor allocation zone. @@ -2323,7 +2324,7 @@ error1: * record (to be inserted into parent). */ STATIC int /* error */ -xfs_btree_split( +__xfs_btree_split( struct xfs_btree_cur *cur, int level, union xfs_btree_ptr *ptrp, @@ -2503,6 +2504,85 @@ error0: return error; } +struct xfs_btree_split_args { + struct xfs_btree_cur *cur; + int level; + union xfs_btree_ptr *ptrp; + union xfs_btree_key *key; + struct xfs_btree_cur **curp; + int *stat; /* success/failure */ + int result; + bool kswapd; /* allocation in kswapd context */ + struct completion *done; + struct work_struct work; +}; + +/* + * Stack switching interfaces for allocation + */ +static void +xfs_btree_split_worker( + struct work_struct *work) +{ + struct xfs_btree_split_args *args = container_of(work, + struct xfs_btree_split_args, work); + unsigned long pflags; + unsigned long new_pflags = PF_FSTRANS; + + /* + * we are in a transaction context here, but may also be doing work + * in kswapd context, and hence we may need to inherit that state + * temporarily to ensure that we don't block waiting for memory reclaim + * in any way. + */ + if (args->kswapd) + new_pflags |= PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD; + + current_set_flags_nested(&pflags, new_pflags); + + args->result = __xfs_btree_split(args->cur, args->level, args->ptrp, + args->key, args->curp, args->stat); + complete(args->done); + + current_restore_flags_nested(&pflags, new_pflags); +} + +/* + * BMBT split requests often come in with little stack to work on. Push + * them off to a worker thread so there is lots of stack to use. For the other + * btree types, just call directly to avoid the context switch overhead here. + */ +STATIC int /* error */ +xfs_btree_split( + struct xfs_btree_cur *cur, + int level, + union xfs_btree_ptr *ptrp, + union xfs_btree_key *key, + struct xfs_btree_cur **curp, + int *stat) /* success/failure */ +{ + struct xfs_btree_split_args args; + DECLARE_COMPLETION_ONSTACK(done); + + if (cur->bc_btnum != XFS_BTNUM_BMAP) + return __xfs_btree_split(cur, level, ptrp, key, curp, stat); + + args.cur = cur; + args.level = level; + args.ptrp = ptrp; + args.key = key; + args.curp = curp; + args.stat = stat; + args.done = &done; + args.kswapd = current_is_kswapd(); + INIT_WORK_ONSTACK(&args.work, xfs_btree_split_worker); + queue_work(xfs_alloc_wq, &args.work); + wait_for_completion(&done); + destroy_work_on_stack(&args.work); + return args.result; +} + + /* * Copy the old inode root contents into a real block and make the * broot point to it. diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c index 6c5eb4c551e3..6d3ec2b6ee29 100644 --- a/fs/xfs/xfs_iomap.c +++ b/fs/xfs/xfs_iomap.c @@ -749,8 +749,7 @@ xfs_iomap_write_allocate( * pointer that the caller gave to us. */ error = xfs_bmapi_write(tp, ip, map_start_fsb, - count_fsb, - XFS_BMAPI_STACK_SWITCH, + count_fsb, 0, &first_block, 1, imap, &nimaps, &free_list); if (error) From c6930992948adf0f8fc1f6ff1da51c5002a2cf95 Mon Sep 17 00:00:00 2001 From: Dave Airlie Date: Mon, 14 Jul 2014 11:04:39 +1000 Subject: [PATCH 196/455] Revert "drm/i915: reverse dp link param selection, prefer fast over wide again" This reverts commit 38aecea0ccbb909d635619cba22f1891e589b434. This breaks Haswell Thinkpad + Lenovo dock in SST mode with a HDMI monitor attached. Before this we can 1920x1200 mode, after this we only ever get 1024x768, and a lot of deferring. This didn't revert clean, but this should be fine. bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1117008 Cc: stable@vger.kernel.org # v3.15 Signed-off-by: Dave Airlie Signed-off-by: Daniel Vetter --- drivers/gpu/drm/i915/intel_dp.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c index 075170d1844f..8a1a4fbc06ac 100644 --- a/drivers/gpu/drm/i915/intel_dp.c +++ b/drivers/gpu/drm/i915/intel_dp.c @@ -906,8 +906,8 @@ intel_dp_compute_config(struct intel_encoder *encoder, mode_rate = intel_dp_link_required(adjusted_mode->crtc_clock, bpp); - for (lane_count = min_lane_count; lane_count <= max_lane_count; lane_count <<= 1) { - for (clock = min_clock; clock <= max_clock; clock++) { + for (clock = min_clock; clock <= max_clock; clock++) { + for (lane_count = min_lane_count; lane_count <= max_lane_count; lane_count <<= 1) { link_clock = drm_dp_bw_code_to_link_rate(bws[clock]); link_avail = intel_dp_max_data_rate(link_clock, lane_count); From 8f2e5ae40ec193bc0a0ed99e95315c3eebca84ea Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Sat, 12 Jul 2014 20:30:35 +0200 Subject: [PATCH 197/455] net: sctp: fix information leaks in ulpevent layer While working on some other SCTP code, I noticed that some structures shared with user space are leaking uninitialized stack or heap buffer. In particular, struct sctp_sndrcvinfo has a 2 bytes hole between .sinfo_flags and .sinfo_ppid that remains unfilled by us in sctp_ulpevent_read_sndrcvinfo() when putting this into cmsg. But also struct sctp_remote_error contains a 2 bytes hole that we don't fill but place into a skb through skb_copy_expand() via sctp_ulpevent_make_remote_error(). Both structures are defined by the IETF in RFC6458: * Section 5.3.2. SCTP Header Information Structure: The sctp_sndrcvinfo structure is defined below: struct sctp_sndrcvinfo { uint16_t sinfo_stream; uint16_t sinfo_ssn; uint16_t sinfo_flags; <-- 2 bytes hole --> uint32_t sinfo_ppid; uint32_t sinfo_context; uint32_t sinfo_timetolive; uint32_t sinfo_tsn; uint32_t sinfo_cumtsn; sctp_assoc_t sinfo_assoc_id; }; * 6.1.3. SCTP_REMOTE_ERROR: A remote peer may send an Operation Error message to its peer. This message indicates a variety of error conditions on an association. The entire ERROR chunk as it appears on the wire is included in an SCTP_REMOTE_ERROR event. Please refer to the SCTP specification [RFC4960] and any extensions for a list of possible error formats. An SCTP error notification has the following format: struct sctp_remote_error { uint16_t sre_type; uint16_t sre_flags; uint32_t sre_length; uint16_t sre_error; <-- 2 bytes hole --> sctp_assoc_t sre_assoc_id; uint8_t sre_data[]; }; Fix this by setting both to 0 before filling them out. We also have other structures shared between user and kernel space in SCTP that contains holes (e.g. struct sctp_paddrthlds), but we copy that buffer over from user space first and thus don't need to care about it in that cases. While at it, we can also remove lengthy comments copied from the draft, instead, we update the comment with the correct RFC number where one can look it up. Signed-off-by: Daniel Borkmann Signed-off-by: David S. Miller --- net/sctp/ulpevent.c | 122 ++++++-------------------------------------- 1 file changed, 15 insertions(+), 107 deletions(-) diff --git a/net/sctp/ulpevent.c b/net/sctp/ulpevent.c index 85c64658bd0b..b6842fdb53d4 100644 --- a/net/sctp/ulpevent.c +++ b/net/sctp/ulpevent.c @@ -366,9 +366,10 @@ fail: * specification [SCTP] and any extensions for a list of possible * error formats. */ -struct sctp_ulpevent *sctp_ulpevent_make_remote_error( - const struct sctp_association *asoc, struct sctp_chunk *chunk, - __u16 flags, gfp_t gfp) +struct sctp_ulpevent * +sctp_ulpevent_make_remote_error(const struct sctp_association *asoc, + struct sctp_chunk *chunk, __u16 flags, + gfp_t gfp) { struct sctp_ulpevent *event; struct sctp_remote_error *sre; @@ -387,8 +388,7 @@ struct sctp_ulpevent *sctp_ulpevent_make_remote_error( /* Copy the skb to a new skb with room for us to prepend * notification with. */ - skb = skb_copy_expand(chunk->skb, sizeof(struct sctp_remote_error), - 0, gfp); + skb = skb_copy_expand(chunk->skb, sizeof(*sre), 0, gfp); /* Pull off the rest of the cause TLV from the chunk. */ skb_pull(chunk->skb, elen); @@ -399,62 +399,21 @@ struct sctp_ulpevent *sctp_ulpevent_make_remote_error( event = sctp_skb2event(skb); sctp_ulpevent_init(event, MSG_NOTIFICATION, skb->truesize); - sre = (struct sctp_remote_error *) - skb_push(skb, sizeof(struct sctp_remote_error)); + sre = (struct sctp_remote_error *) skb_push(skb, sizeof(*sre)); /* Trim the buffer to the right length. */ - skb_trim(skb, sizeof(struct sctp_remote_error) + elen); + skb_trim(skb, sizeof(*sre) + elen); - /* Socket Extensions for SCTP - * 5.3.1.3 SCTP_REMOTE_ERROR - * - * sre_type: - * It should be SCTP_REMOTE_ERROR. - */ + /* RFC6458, Section 6.1.3. SCTP_REMOTE_ERROR */ + memset(sre, 0, sizeof(*sre)); sre->sre_type = SCTP_REMOTE_ERROR; - - /* - * Socket Extensions for SCTP - * 5.3.1.3 SCTP_REMOTE_ERROR - * - * sre_flags: 16 bits (unsigned integer) - * Currently unused. - */ sre->sre_flags = 0; - - /* Socket Extensions for SCTP - * 5.3.1.3 SCTP_REMOTE_ERROR - * - * sre_length: sizeof (__u32) - * - * This field is the total length of the notification data, - * including the notification header. - */ sre->sre_length = skb->len; - - /* Socket Extensions for SCTP - * 5.3.1.3 SCTP_REMOTE_ERROR - * - * sre_error: 16 bits (unsigned integer) - * This value represents one of the Operational Error causes defined in - * the SCTP specification, in network byte order. - */ sre->sre_error = cause; - - /* Socket Extensions for SCTP - * 5.3.1.3 SCTP_REMOTE_ERROR - * - * sre_assoc_id: sizeof (sctp_assoc_t) - * - * The association id field, holds the identifier for the association. - * All notifications for a given association have the same association - * identifier. For TCP style socket, this field is ignored. - */ sctp_ulpevent_set_owner(event, asoc); sre->sre_assoc_id = sctp_assoc2id(asoc); return event; - fail: return NULL; } @@ -899,7 +858,9 @@ __u16 sctp_ulpevent_get_notification_type(const struct sctp_ulpevent *event) return notification->sn_header.sn_type; } -/* Copy out the sndrcvinfo into a msghdr. */ +/* RFC6458, Section 5.3.2. SCTP Header Information Structure + * (SCTP_SNDRCV, DEPRECATED) + */ void sctp_ulpevent_read_sndrcvinfo(const struct sctp_ulpevent *event, struct msghdr *msghdr) { @@ -908,74 +869,21 @@ void sctp_ulpevent_read_sndrcvinfo(const struct sctp_ulpevent *event, if (sctp_ulpevent_is_notification(event)) return; - /* Sockets API Extensions for SCTP - * Section 5.2.2 SCTP Header Information Structure (SCTP_SNDRCV) - * - * sinfo_stream: 16 bits (unsigned integer) - * - * For recvmsg() the SCTP stack places the message's stream number in - * this value. - */ + memset(&sinfo, 0, sizeof(sinfo)); sinfo.sinfo_stream = event->stream; - /* sinfo_ssn: 16 bits (unsigned integer) - * - * For recvmsg() this value contains the stream sequence number that - * the remote endpoint placed in the DATA chunk. For fragmented - * messages this is the same number for all deliveries of the message - * (if more than one recvmsg() is needed to read the message). - */ sinfo.sinfo_ssn = event->ssn; - /* sinfo_ppid: 32 bits (unsigned integer) - * - * In recvmsg() this value is - * the same information that was passed by the upper layer in the peer - * application. Please note that byte order issues are NOT accounted - * for and this information is passed opaquely by the SCTP stack from - * one end to the other. - */ sinfo.sinfo_ppid = event->ppid; - /* sinfo_flags: 16 bits (unsigned integer) - * - * This field may contain any of the following flags and is composed of - * a bitwise OR of these values. - * - * recvmsg() flags: - * - * SCTP_UNORDERED - This flag is present when the message was sent - * non-ordered. - */ sinfo.sinfo_flags = event->flags; - /* sinfo_tsn: 32 bit (unsigned integer) - * - * For the receiving side, this field holds a TSN that was - * assigned to one of the SCTP Data Chunks. - */ sinfo.sinfo_tsn = event->tsn; - /* sinfo_cumtsn: 32 bit (unsigned integer) - * - * This field will hold the current cumulative TSN as - * known by the underlying SCTP layer. Note this field is - * ignored when sending and only valid for a receive - * operation when sinfo_flags are set to SCTP_UNORDERED. - */ sinfo.sinfo_cumtsn = event->cumtsn; - /* sinfo_assoc_id: sizeof (sctp_assoc_t) - * - * The association handle field, sinfo_assoc_id, holds the identifier - * for the association announced in the COMMUNICATION_UP notification. - * All notifications for a given association have the same identifier. - * Ignored for one-to-one style sockets. - */ sinfo.sinfo_assoc_id = sctp_assoc2id(event->asoc); - - /* context value that is set via SCTP_CONTEXT socket option. */ + /* Context value that is set via SCTP_CONTEXT socket option. */ sinfo.sinfo_context = event->asoc->default_rcv_context; - /* These fields are not used while receiving. */ sinfo.sinfo_timetolive = 0; put_cmsg(msghdr, IPPROTO_SCTP, SCTP_SNDRCV, - sizeof(struct sctp_sndrcvinfo), (void *)&sinfo); + sizeof(sinfo), &sinfo); } /* Do accounting for bytes received and hold a reference to the association From 03e01349c654fbdea80d3d9b4ab599244eb55bb7 Mon Sep 17 00:00:00 2001 From: Dave Chinner Date: Tue, 15 Jul 2014 07:28:41 +1000 Subject: [PATCH 198/455] xfs: null unused quota inodes when quota is on When quota is on, it is expected that unused quota inodes have a value of NULLFSINO. The changes to support a separate project quota in 3.12 broken this rule for non-project quota inode enabled filesystem, as the code now refuses to write the group quota inode if neither group or project quotas are enabled. This regression was introduced by commit d892d58 ("xfs: Start using pquotaino from the superblock"). In this case, we should be writing NULLFSINO rather than nothing to ensure that we leave the group quota inode in a valid state while quotas are enabled. Failure to do so doesn't cause a current kernel to break - the separate project quota inodes introduced translation code to always treat a zero inode as NULLFSINO. This was introduced by commit 0102629 ("xfs: Initialize all quota inodes to be NULLFSINO") with is also in 3.12 but older kernels do not do this and hence taking a filesystem back to an older kernel can result in quotas failing initialisation at mount time. When that happens, we see this in dmesg: [ 1649.215390] XFS (sdb): Mounting Filesystem [ 1649.316894] XFS (sdb): Failed to initialize disk quotas. [ 1649.316902] XFS (sdb): Ending clean mount By ensuring that we write NULLFSINO to quota inodes that aren't active, we avoid this problem. We have to be really careful when determining if the quota inodes are active or not, because we don't want to write a NULLFSINO if the quota inodes are active and we simply aren't updating them. Signed-off-by: Dave Chinner Reviewed-by: Brian Foster Signed-off-by: Dave Chinner --- fs/xfs/xfs_sb.c | 25 +++++++++++++++++++++---- 1 file changed, 21 insertions(+), 4 deletions(-) diff --git a/fs/xfs/xfs_sb.c b/fs/xfs/xfs_sb.c index c3453b11f563..7703fa6770ff 100644 --- a/fs/xfs/xfs_sb.c +++ b/fs/xfs/xfs_sb.c @@ -483,10 +483,16 @@ xfs_sb_quota_to_disk( } /* - * GQUOTINO and PQUOTINO cannot be used together in versions - * of superblock that do not have pquotino. from->sb_flags - * tells us which quota is active and should be copied to - * disk. + * GQUOTINO and PQUOTINO cannot be used together in versions of + * superblock that do not have pquotino. from->sb_flags tells us which + * quota is active and should be copied to disk. If neither are active, + * make sure we write NULLFSINO to the sb_gquotino field as a quota + * inode value of "0" is invalid when the XFS_SB_VERSION_QUOTA feature + * bit is set. + * + * Note that we don't need to handle the sb_uquotino or sb_pquotino here + * as they do not require any translation. Hence the main sb field loop + * will write them appropriately from the in-core superblock. */ if ((*fields & XFS_SB_GQUOTINO) && (from->sb_qflags & XFS_GQUOTA_ACCT)) @@ -494,6 +500,17 @@ xfs_sb_quota_to_disk( else if ((*fields & XFS_SB_PQUOTINO) && (from->sb_qflags & XFS_PQUOTA_ACCT)) to->sb_gquotino = cpu_to_be64(from->sb_pquotino); + else { + /* + * We can't rely on just the fields being logged to tell us + * that it is safe to write NULLFSINO - we should only do that + * if quotas are not actually enabled. Hence only write + * NULLFSINO if both in-core quota inodes are NULL. + */ + if (from->sb_gquotino == NULLFSINO && + from->sb_pquotino == NULLFSINO) + to->sb_gquotino = cpu_to_be64(NULLFSINO); + } *fields &= ~(XFS_SB_PQUOTINO | XFS_SB_GQUOTINO); } From 9ecf07a1d8f70f72ec99a0f102c8aa24609d84f4 Mon Sep 17 00:00:00 2001 From: Mathias Krause Date: Sat, 12 Jul 2014 22:36:44 +0200 Subject: [PATCH 199/455] neigh: sysctl - simplify address calculation of gc_* variables The code in neigh_sysctl_register() relies on a specific layout of struct neigh_table, namely that the 'gc_*' variables are directly following the 'parms' member in a specific order. The code, though, expresses this in the most ugly way. Get rid of the ugly casts and use the 'tbl' pointer to get a handle to the table. This way we can refer to the 'gc_*' variables directly. Similarly seen in the grsecurity patch, written by Brad Spengler. Signed-off-by: Mathias Krause Cc: Brad Spengler Signed-off-by: David S. Miller --- include/net/neighbour.h | 1 - net/core/neighbour.c | 9 +++++---- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/include/net/neighbour.h b/include/net/neighbour.h index 7277caf3743d..47f425464f84 100644 --- a/include/net/neighbour.h +++ b/include/net/neighbour.h @@ -203,7 +203,6 @@ struct neigh_table { void (*proxy_redo)(struct sk_buff *skb); char *id; struct neigh_parms parms; - /* HACK. gc_* should follow parms without a gap! */ int gc_interval; int gc_thresh1; int gc_thresh2; diff --git a/net/core/neighbour.c b/net/core/neighbour.c index 32d872eec7f5..559890b0f0a2 100644 --- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -3059,11 +3059,12 @@ int neigh_sysctl_register(struct net_device *dev, struct neigh_parms *p, memset(&t->neigh_vars[NEIGH_VAR_GC_INTERVAL], 0, sizeof(t->neigh_vars[NEIGH_VAR_GC_INTERVAL])); } else { + struct neigh_table *tbl = p->tbl; dev_name_source = "default"; - t->neigh_vars[NEIGH_VAR_GC_INTERVAL].data = (int *)(p + 1); - t->neigh_vars[NEIGH_VAR_GC_THRESH1].data = (int *)(p + 1) + 1; - t->neigh_vars[NEIGH_VAR_GC_THRESH2].data = (int *)(p + 1) + 2; - t->neigh_vars[NEIGH_VAR_GC_THRESH3].data = (int *)(p + 1) + 3; + t->neigh_vars[NEIGH_VAR_GC_INTERVAL].data = &tbl->gc_interval; + t->neigh_vars[NEIGH_VAR_GC_THRESH1].data = &tbl->gc_thresh1; + t->neigh_vars[NEIGH_VAR_GC_THRESH2].data = &tbl->gc_thresh2; + t->neigh_vars[NEIGH_VAR_GC_THRESH3].data = &tbl->gc_thresh3; } if (handler) { From a8a3e41c67d24eb12f9ab9680cbb85e24fcd9711 Mon Sep 17 00:00:00 2001 From: Christoph Schulz Date: Sun, 13 Jul 2014 00:53:15 +0200 Subject: [PATCH 200/455] net: pppoe: use correct channel MTU when using Multilink PPP The PPP channel MTU is used with Multilink PPP when ppp_mp_explode() (see ppp_generic module) tries to determine how big a fragment might be. According to RFC 1661, the MTU excludes the 2-byte PPP protocol field, see the corresponding comment and code in ppp_mp_explode(): /* * hdrlen includes the 2-byte PPP protocol field, but the * MTU counts only the payload excluding the protocol field. * (RFC1661 Section 2) */ mtu = pch->chan->mtu - (hdrlen - 2); However, the pppoe module *does* include the PPP protocol field in the channel MTU, which is wrong as it causes the PPP payload to be 1-2 bytes too big under certain circumstances (one byte if PPP protocol compression is used, two otherwise), causing the generated Ethernet packets to be dropped. So the pppoe module has to subtract two bytes from the channel MTU. This error only manifests itself when using Multilink PPP, as otherwise the channel MTU is not used anywhere. In the following, I will describe how to reproduce this bug. We configure two pppd instances for multilink PPP over two PPPoE links, say eth2 and eth3, with a MTU of 1492 bytes for each link and a MRRU of 2976 bytes. (This MRRU is computed by adding the two link MTUs and subtracting the MP header twice, which is 4 bytes long.) The necessary pppd statements on both sides are "multilink mtu 1492 mru 1492 mrru 2976". On the client side, we additionally need "plugin rp-pppoe.so eth2" and "plugin rp-pppoe.so eth3", respectively; on the server side, we additionally need to start two pppoe-server instances to be able to establish two PPPoE sessions, one over eth2 and one over eth3. We set the MTU of the PPP network interface to the MRRU (2976) on both sides of the connection in order to make use of the higher bandwidth. (If we didn't do that, IP fragmentation would kick in, which we want to avoid.) Now we send a ICMPv4 echo request with a payload of 2948 bytes from client to server over the PPP link. This results in the following network packet: 2948 (echo payload) + 8 (ICMPv4 header) + 20 (IPv4 header) --------------------- 2976 (PPP payload) These 2976 bytes do not exceed the MTU of the PPP network interface, so the IP packet is not fragmented. Now the multilink PPP code in ppp_mp_explode() prepends one protocol byte (0x21 for IPv4), making the packet one byte bigger than the negotiated MRRU. So this packet would have to be divided in three fragments. But this does not happen as each link MTU is assumed to be two bytes larger. So this packet is diveded into two fragments only, one of size 1489 and one of size 1488. Now we have for that bigger fragment: 1489 (PPP payload) + 4 (MP header) + 2 (PPP protocol field for the MP payload (0x3d)) + 6 (PPPoE header) -------------------------- 1501 (Ethernet payload) This packet exceeds the link MTU and is discarded. If one configures the link MTU on the client side to 1501, one can see the discarded Ethernet frames with tcpdump running on the client. A ping -s 2948 -c 1 192.168.15.254 leads to the smaller fragment that is correctly received on the server side: (tcpdump -vvvne -i eth3 pppoes and ppp proto 0x3d) 52:54:00:ad:87:fd > 52:54:00:79:5c:d0, ethertype PPPoE S (0x8864), length 1514: PPPoE [ses 0x3] MLPPP (0x003d), length 1494: seq 0x000, Flags [end], length 1492 and to the bigger fragment that is not received on the server side: (tcpdump -vvvne -i eth2 pppoes and ppp proto 0x3d) 52:54:00:70:9e:89 > 52:54:00:5d:6f:b0, ethertype PPPoE S (0x8864), length 1515: PPPoE [ses 0x5] MLPPP (0x003d), length 1495: seq 0x000, Flags [begin], length 1493 With the patch below, we correctly obtain three fragments: 52:54:00:ad:87:fd > 52:54:00:79:5c:d0, ethertype PPPoE S (0x8864), length 1514: PPPoE [ses 0x1] MLPPP (0x003d), length 1494: seq 0x000, Flags [begin], length 1492 52:54:00:70:9e:89 > 52:54:00:5d:6f:b0, ethertype PPPoE S (0x8864), length 1514: PPPoE [ses 0x1] MLPPP (0x003d), length 1494: seq 0x000, Flags [none], length 1492 52:54:00:ad:87:fd > 52:54:00:79:5c:d0, ethertype PPPoE S (0x8864), length 27: PPPoE [ses 0x1] MLPPP (0x003d), length 7: seq 0x000, Flags [end], length 5 And the ICMPv4 echo request is successfully received at the server side: IP (tos 0x0, ttl 64, id 21925, offset 0, flags [DF], proto ICMP (1), length 2976) 192.168.222.2 > 192.168.15.254: ICMP echo request, id 30530, seq 0, length 2956 The bug was introduced in commit c9aa6895371b2a257401f59d3393c9f7ac5a8698 ("[PPPOE]: Advertise PPPoE MTU") from the very beginning. This patch applies to 3.10 upwards but the fix can be applied (with minor modifications) to kernels as old as 2.6.32. Signed-off-by: Christoph Schulz Signed-off-by: David S. Miller --- drivers/net/ppp/pppoe.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c index 2ea7efd11857..6c9c16d76935 100644 --- a/drivers/net/ppp/pppoe.c +++ b/drivers/net/ppp/pppoe.c @@ -675,7 +675,7 @@ static int pppoe_connect(struct socket *sock, struct sockaddr *uservaddr, po->chan.hdrlen = (sizeof(struct pppoe_hdr) + dev->hard_header_len); - po->chan.mtu = dev->mtu - sizeof(struct pppoe_hdr); + po->chan.mtu = dev->mtu - sizeof(struct pppoe_hdr) - 2; po->chan.private = sk; po->chan.ops = &pppoe_chan_ops; From 548d28bd0eac840d122b691279ce9f4ce6ecbfb6 Mon Sep 17 00:00:00 2001 From: Nikolay Aleksandrov Date: Sun, 13 Jul 2014 09:47:47 +0200 Subject: [PATCH 201/455] bonding: fix ad_select module param check Obvious copy/paste error when I converted the ad_select to the new option API. "lacp_rate" there should be "ad_select" so we can get the proper value. CC: Jay Vosburgh CC: Veaceslav Falico CC: Andy Gospodarek CC: David S. Miller Fixes: 9e5f5eebe765 ("bonding: convert ad_select to use the new option API") Reported-by: Karim Scheik Signed-off-by: Nikolay Aleksandrov Signed-off-by: David S. Miller --- drivers/net/bonding/bond_main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 3a451b6cd3d5..701f86cd5993 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -4068,7 +4068,7 @@ static int bond_check_params(struct bond_params *params) } if (ad_select) { - bond_opt_initstr(&newval, lacp_rate); + bond_opt_initstr(&newval, ad_select); valptr = bond_opt_parse(bond_opt_get(BOND_OPT_AD_SELECT), &newval); if (!valptr) { From 32b333fe99069d090a12cc106fd5ae1002fe642f Mon Sep 17 00:00:00 2001 From: Jason Wang Date: Mon, 14 Jul 2014 11:42:44 +0800 Subject: [PATCH 202/455] mlx4: mark napi id for gro_skb Napi id was not marked for gro_skb, this will lead rx busy loop won't work correctly since they stack never try to call low latency receive method because of a zero socket napi id. Fix this by marking napi id for gro_skb. The transaction rate of 1 byte netperf tcp_rr gets about 50% increased (from 20531.68 to 30610.88). Cc: Amir Vadai Signed-off-by: Jason Wang Signed-off-by: David S. Miller --- drivers/net/ethernet/mellanox/mlx4/en_rx.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c index 96724170308a..5535862f27cc 100644 --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c @@ -783,6 +783,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud PKT_HASH_TYPE_L3); skb_record_rx_queue(gro_skb, cq->ring); + skb_mark_napi_id(gro_skb, &cq->napi); if (ring->hwtstamp_rx_filter == HWTSTAMP_FILTER_ALL) { timestamp = mlx4_en_get_cqe_ts(cqe); From 3916a3192793fd3c11f69d623ef0cdbdbf9ea10a Mon Sep 17 00:00:00 2001 From: Christoph Schulz Date: Mon, 14 Jul 2014 08:01:10 +0200 Subject: [PATCH 203/455] net: ppp: don't call sk_chk_filter twice Commit 568f194e8bd16c353ad50f9ab95d98b20578a39d ("net: ppp: use sk_unattached_filter api") causes sk_chk_filter() to be called twice when setting a PPP pass or active filter. This applies to both the generic PPP subsystem implemented by drivers/net/ppp/ppp_generic.c and the ISDN PPP subsystem implemented by drivers/isdn/i4l/isdn_ppp.c. The first call is from within get_filter(). The second one is through the call chain ppp_ioctl() or isdn_ppp_ioctl() --> sk_unattached_filter_create() --> __sk_prepare_filter() --> sk_chk_filter() The first call from within get_filter() should be deleted as get_filter() is called just before calling sk_unattached_filter_create() later on, which eventually calls sk_chk_filter() anyway. For 3.15.x, this proposed change is a bugfix rather than a pure optimization as in that branch, sk_chk_filter() may replace filter codes by other codes which are not recognized when executing sk_chk_filter() a second time. So with 3.15.x, if sk_chk_filter() is called twice, the second invocation may yield EINVAL (this depends on the filter codes found in the filter to be set, but because the replacement is done for frequently used codes, this is almost always the case). The net effect is that setting pass and/or active PPP filters does not work anymore, since sk_unattached_filter_create() always returns EINVAL due to the second call to sk_chk_filter(), regardless whether the filter was originally sane or not. Signed-off-by: Christoph Schulz Acked-by: Daniel Borkmann Signed-off-by: David S. Miller --- drivers/isdn/i4l/isdn_ppp.c | 8 +------- drivers/net/ppp/ppp_generic.c | 8 +------- 2 files changed, 2 insertions(+), 14 deletions(-) diff --git a/drivers/isdn/i4l/isdn_ppp.c b/drivers/isdn/i4l/isdn_ppp.c index 61ac63237446..a333b7f798d1 100644 --- a/drivers/isdn/i4l/isdn_ppp.c +++ b/drivers/isdn/i4l/isdn_ppp.c @@ -442,7 +442,7 @@ static int get_filter(void __user *arg, struct sock_filter **p) { struct sock_fprog uprog; struct sock_filter *code = NULL; - int len, err; + int len; if (copy_from_user(&uprog, arg, sizeof(uprog))) return -EFAULT; @@ -458,12 +458,6 @@ static int get_filter(void __user *arg, struct sock_filter **p) if (IS_ERR(code)) return PTR_ERR(code); - err = sk_chk_filter(code, uprog.len); - if (err) { - kfree(code); - return err; - } - *p = code; return uprog.len; } diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c index 91d6c1272fcf..e2f20f807de8 100644 --- a/drivers/net/ppp/ppp_generic.c +++ b/drivers/net/ppp/ppp_generic.c @@ -539,7 +539,7 @@ static int get_filter(void __user *arg, struct sock_filter **p) { struct sock_fprog uprog; struct sock_filter *code = NULL; - int len, err; + int len; if (copy_from_user(&uprog, arg, sizeof(uprog))) return -EFAULT; @@ -554,12 +554,6 @@ static int get_filter(void __user *arg, struct sock_filter **p) if (IS_ERR(code)) return PTR_ERR(code); - err = sk_chk_filter(code, uprog.len); - if (err) { - kfree(code); - return err; - } - *p = code; return uprog.len; } From 3cf521f7dc87c031617fd47e4b7aa2593c2f3daf Mon Sep 17 00:00:00 2001 From: Sasha Levin Date: Mon, 14 Jul 2014 17:02:31 -0700 Subject: [PATCH 204/455] net/l2tp: don't fall back on UDP [get|set]sockopt The l2tp [get|set]sockopt() code has fallen back to the UDP functions for socket option levels != SOL_PPPOL2TP since day one, but that has never actually worked, since the l2tp socket isn't an inet socket. As David Miller points out: "If we wanted this to work, it'd have to look up the tunnel and then use tunnel->sk, but I wonder how useful that would be" Since this can never have worked so nobody could possibly have depended on that functionality, just remove the broken code and return -EINVAL. Reported-by: Sasha Levin Acked-by: James Chapman Acked-by: David Miller Cc: Phil Turnbull Cc: Vegard Nossum Cc: Willy Tarreau Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds --- net/l2tp/l2tp_ppp.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/l2tp/l2tp_ppp.c b/net/l2tp/l2tp_ppp.c index 950909f04ee6..13752d96275e 100644 --- a/net/l2tp/l2tp_ppp.c +++ b/net/l2tp/l2tp_ppp.c @@ -1365,7 +1365,7 @@ static int pppol2tp_setsockopt(struct socket *sock, int level, int optname, int err; if (level != SOL_PPPOL2TP) - return udp_prot.setsockopt(sk, level, optname, optval, optlen); + return -EINVAL; if (optlen < sizeof(int)) return -EINVAL; @@ -1491,7 +1491,7 @@ static int pppol2tp_getsockopt(struct socket *sock, int level, int optname, struct pppol2tp_session *ps; if (level != SOL_PPPOL2TP) - return udp_prot.getsockopt(sk, level, optname, optval, optlen); + return -EINVAL; if (get_user(len, optlen)) return -EFAULT; From db4175ae2095634dbecd4c847da439f9c83e1b3b Mon Sep 17 00:00:00 2001 From: Antti Palosaari Date: Fri, 4 Jul 2014 05:44:39 -0300 Subject: [PATCH 205/455] [media] tda10071: force modulation to QPSK on DVB-S Only supported modulation for DVB-S is QPSK. Modulation parameter contains invalid value for DVB-S on some cases, which leads driver refusing tuning attempt. Due to that, hard code modulation to QPSK in case of DVB-S. Cc: stable@vger.kernel.org Signed-off-by: Antti Palosaari Signed-off-by: Mauro Carvalho Chehab --- drivers/media/dvb-frontends/tda10071.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/media/dvb-frontends/tda10071.c b/drivers/media/dvb-frontends/tda10071.c index 522fe00f5eee..49874e76548b 100644 --- a/drivers/media/dvb-frontends/tda10071.c +++ b/drivers/media/dvb-frontends/tda10071.c @@ -668,6 +668,7 @@ static int tda10071_set_frontend(struct dvb_frontend *fe) struct dtv_frontend_properties *c = &fe->dtv_property_cache; int ret, i; u8 mode, rolloff, pilot, inversion, div; + fe_modulation_t modulation; dev_dbg(&priv->i2c->dev, "%s: delivery_system=%d modulation=%d frequency=%d symbol_rate=%d inversion=%d pilot=%d rolloff=%d\n", @@ -702,10 +703,13 @@ static int tda10071_set_frontend(struct dvb_frontend *fe) switch (c->delivery_system) { case SYS_DVBS: + modulation = QPSK; rolloff = 0; pilot = 2; break; case SYS_DVBS2: + modulation = c->modulation; + switch (c->rolloff) { case ROLLOFF_20: rolloff = 2; @@ -750,7 +754,7 @@ static int tda10071_set_frontend(struct dvb_frontend *fe) for (i = 0, mode = 0xff; i < ARRAY_SIZE(TDA10071_MODCOD); i++) { if (c->delivery_system == TDA10071_MODCOD[i].delivery_system && - c->modulation == TDA10071_MODCOD[i].modulation && + modulation == TDA10071_MODCOD[i].modulation && c->fec_inner == TDA10071_MODCOD[i].fec) { mode = TDA10071_MODCOD[i].val; dev_dbg(&priv->i2c->dev, "%s: mode found=%02x\n", From bc760cdae9b67e689ed29c66c9c2d78d6f5f8c4b Mon Sep 17 00:00:00 2001 From: Antti Palosaari Date: Mon, 7 Jul 2014 09:05:15 -0300 Subject: [PATCH 206/455] [media] tda10071: add missing DVB-S2/PSK-8 FEC AUTO FEC AUTO is valid for PSK-8 modulation too. Add it. Signed-off-by: Antti Palosaari Signed-off-by: Mauro Carvalho Chehab --- drivers/media/dvb-frontends/tda10071_priv.h | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/media/dvb-frontends/tda10071_priv.h b/drivers/media/dvb-frontends/tda10071_priv.h index 4baf14bfb65a..420486192736 100644 --- a/drivers/media/dvb-frontends/tda10071_priv.h +++ b/drivers/media/dvb-frontends/tda10071_priv.h @@ -55,6 +55,7 @@ static struct tda10071_modcod { { SYS_DVBS2, QPSK, FEC_8_9, 0x0a }, { SYS_DVBS2, QPSK, FEC_9_10, 0x0b }, /* 8PSK */ + { SYS_DVBS2, PSK_8, FEC_AUTO, 0x00 }, { SYS_DVBS2, PSK_8, FEC_3_5, 0x0c }, { SYS_DVBS2, PSK_8, FEC_2_3, 0x0d }, { SYS_DVBS2, PSK_8, FEC_3_4, 0x0e }, From b32725e84c02b4d01472770b96d1b33737b23b6d Mon Sep 17 00:00:00 2001 From: Antti Palosaari Date: Mon, 7 Jul 2014 09:52:28 -0300 Subject: [PATCH 207/455] [media] tda10071: fix spec inversion reporting Inversion ON was reported as inversion OFF and vice versa. Signed-off-by: Antti Palosaari Signed-off-by: Mauro Carvalho Chehab --- drivers/media/dvb-frontends/tda10071.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/media/dvb-frontends/tda10071.c b/drivers/media/dvb-frontends/tda10071.c index 49874e76548b..d590798a6367 100644 --- a/drivers/media/dvb-frontends/tda10071.c +++ b/drivers/media/dvb-frontends/tda10071.c @@ -838,10 +838,10 @@ static int tda10071_get_frontend(struct dvb_frontend *fe) switch ((buf[1] >> 0) & 0x01) { case 0: - c->inversion = INVERSION_OFF; + c->inversion = INVERSION_ON; break; case 1: - c->inversion = INVERSION_ON; + c->inversion = INVERSION_OFF; break; } From c2c1a6e5851fe354d39e5b7907c6c9d0a997ec16 Mon Sep 17 00:00:00 2001 From: Antti Palosaari Date: Tue, 8 Jul 2014 02:48:28 -0300 Subject: [PATCH 208/455] [media] tda10071: fix returned symbol rate calculation Detected symbol rate value was returned too small. Signed-off-by: Antti Palosaari Signed-off-by: Mauro Carvalho Chehab --- drivers/media/dvb-frontends/tda10071.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/media/dvb-frontends/tda10071.c b/drivers/media/dvb-frontends/tda10071.c index d590798a6367..9619be5d4827 100644 --- a/drivers/media/dvb-frontends/tda10071.c +++ b/drivers/media/dvb-frontends/tda10071.c @@ -860,7 +860,7 @@ static int tda10071_get_frontend(struct dvb_frontend *fe) if (ret) goto error; - c->symbol_rate = (buf[0] << 16) | (buf[1] << 8) | (buf[2] << 0); + c->symbol_rate = ((buf[0] << 16) | (buf[1] << 8) | (buf[2] << 0)) * 1000; return ret; error: From 242841d3d71191348f98310e2d2001e1001d8630 Mon Sep 17 00:00:00 2001 From: Hans de Goede Date: Wed, 9 Jul 2014 06:20:44 -0300 Subject: [PATCH 209/455] [media] gspca_pac7302: Add new usb-id for Genius i-Look 317 Tested-and-reported-by: yullaw Cc: stable@vger.kernel.org Signed-off-by: Hans de Goede Signed-off-by: Mauro Carvalho Chehab --- drivers/media/usb/gspca/pac7302.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/media/usb/gspca/pac7302.c b/drivers/media/usb/gspca/pac7302.c index 2fd1c5e31a0f..339adce7c7a5 100644 --- a/drivers/media/usb/gspca/pac7302.c +++ b/drivers/media/usb/gspca/pac7302.c @@ -928,6 +928,7 @@ static const struct usb_device_id device_table[] = { {USB_DEVICE(0x093a, 0x2620)}, {USB_DEVICE(0x093a, 0x2621)}, {USB_DEVICE(0x093a, 0x2622), .driver_info = FL_VFLIP}, + {USB_DEVICE(0x093a, 0x2623), .driver_info = FL_VFLIP}, {USB_DEVICE(0x093a, 0x2624), .driver_info = FL_VFLIP}, {USB_DEVICE(0x093a, 0x2625)}, {USB_DEVICE(0x093a, 0x2626)}, From 8c947e20cb1f442c704852b2ca24b81981b09493 Mon Sep 17 00:00:00 2001 From: Jiri Kosina Date: Wed, 9 Jul 2014 09:48:06 -0700 Subject: [PATCH 210/455] Input: i8042 - add Acer Aspire 5710 to nomux blacklist Acer Aspire needs to be added to nomux blacklist, otherwise the touchpad misbehaves rather randomly. Signed-off-by: Jiri Kosina Signed-off-by: Dmitry Torokhov --- drivers/input/serio/i8042-x86ia64io.h | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/drivers/input/serio/i8042-x86ia64io.h b/drivers/input/serio/i8042-x86ia64io.h index 381b20d4c561..136b7b204f56 100644 --- a/drivers/input/serio/i8042-x86ia64io.h +++ b/drivers/input/serio/i8042-x86ia64io.h @@ -401,6 +401,13 @@ static const struct dmi_system_id __initconst i8042_dmi_nomux_table[] = { DMI_MATCH(DMI_PRODUCT_NAME, "Aspire 1360"), }, }, + { + /* Acer Aspire 5710 */ + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "Acer"), + DMI_MATCH(DMI_PRODUCT_NAME, "Aspire 5710"), + }, + }, { /* Gericom Bellagio */ .matches = { From e76aed9da7189eeb41b9856552ce5721181e8e8d Mon Sep 17 00:00:00 2001 From: Hans de Goede Date: Mon, 14 Jul 2014 17:12:21 -0700 Subject: [PATCH 211/455] Input: synaptics - add min/max quirk for pnp-id LEN2002 (Edge E531) https://bugzilla.redhat.com/show_bug.cgi?id=1114768 Cc: stable@vger.kernel.org Signed-off-by: Hans de Goede Signed-off-by: Dmitry Torokhov --- drivers/input/mouse/synaptics.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/input/mouse/synaptics.c b/drivers/input/mouse/synaptics.c index ec772d962f06..ef9e0b8a9aa7 100644 --- a/drivers/input/mouse/synaptics.c +++ b/drivers/input/mouse/synaptics.c @@ -132,7 +132,8 @@ static const struct min_max_quirk min_max_pnpid_table[] = { 1232, 5710, 1156, 4696 }, { - (const char * const []){"LEN0034", "LEN0036", "LEN2004", NULL}, + (const char * const []){"LEN0034", "LEN0036", "LEN2002", + "LEN2004", NULL}, 1024, 5112, 2024, 4832 }, { @@ -168,7 +169,7 @@ static const char * const topbuttonpad_pnp_ids[] = { "LEN0049", "LEN2000", "LEN2001", /* Edge E431 */ - "LEN2002", + "LEN2002", /* Edge E531 */ "LEN2003", "LEN2004", /* L440 */ "LEN2005", From 5c763edfe4879ffc3a87fef64f743d4b5497aabb Mon Sep 17 00:00:00 2001 From: Olivier Sobrie Date: Mon, 14 Jul 2014 12:08:49 +0200 Subject: [PATCH 212/455] hso: remove unused workqueue The workqueue "retry_unthrottle_workqueue" is not scheduled anywhere in the code. So, remove it. Signed-off-by: Olivier Sobrie Signed-off-by: David S. Miller --- drivers/net/usb/hso.c | 12 ------------ 1 file changed, 12 deletions(-) diff --git a/drivers/net/usb/hso.c b/drivers/net/usb/hso.c index a3a05869309d..9ca2b418a3ee 100644 --- a/drivers/net/usb/hso.c +++ b/drivers/net/usb/hso.c @@ -261,7 +261,6 @@ struct hso_serial { u16 curr_rx_urb_offset; u8 rx_urb_filled[MAX_RX_URBS]; struct tasklet_struct unthrottle_tasklet; - struct work_struct retry_unthrottle_workqueue; }; struct hso_device { @@ -1252,14 +1251,6 @@ static void hso_unthrottle(struct tty_struct *tty) tasklet_hi_schedule(&serial->unthrottle_tasklet); } -static void hso_unthrottle_workfunc(struct work_struct *work) -{ - struct hso_serial *serial = - container_of(work, struct hso_serial, - retry_unthrottle_workqueue); - hso_unthrottle_tasklet(serial); -} - /* open the requested serial port */ static int hso_serial_open(struct tty_struct *tty, struct file *filp) { @@ -1295,8 +1286,6 @@ static int hso_serial_open(struct tty_struct *tty, struct file *filp) tasklet_init(&serial->unthrottle_tasklet, (void (*)(unsigned long))hso_unthrottle_tasklet, (unsigned long)serial); - INIT_WORK(&serial->retry_unthrottle_workqueue, - hso_unthrottle_workfunc); result = hso_start_serial_device(serial->parent, GFP_KERNEL); if (result) { hso_stop_serial_device(serial->parent); @@ -1345,7 +1334,6 @@ static void hso_serial_close(struct tty_struct *tty, struct file *filp) if (!usb_gone) hso_stop_serial_device(serial->parent); tasklet_kill(&serial->unthrottle_tasklet); - cancel_work_sync(&serial->retry_unthrottle_workqueue); } if (!usb_gone) From 8f9818af4eaef1150282e18355aaea425474a411 Mon Sep 17 00:00:00 2001 From: Olivier Sobrie Date: Mon, 14 Jul 2014 12:08:50 +0200 Subject: [PATCH 213/455] hso: fix deadlock when receiving bursts of data When the module sends bursts of data, sometimes a deadlock happens in the hso driver when the tty buffer doesn't get the chance to be flushed quickly enough. Remove the endless while loop in function put_rxbuf_data() which is called by the urb completion handler. If there isn't enough room in the tty buffer, discards all the data received in the URB. Cc: David Miller Cc: David Laight Cc: One Thousand Gnomes Cc: Dan Williams Cc: Jan Dumon Signed-off-by: Olivier Sobrie Acked-by: Alan Cox Signed-off-by: David S. Miller --- drivers/net/usb/hso.c | 44 ++++++++++++++++++++----------------------- 1 file changed, 20 insertions(+), 24 deletions(-) diff --git a/drivers/net/usb/hso.c b/drivers/net/usb/hso.c index 9ca2b418a3ee..a4272ed62da8 100644 --- a/drivers/net/usb/hso.c +++ b/drivers/net/usb/hso.c @@ -258,7 +258,6 @@ struct hso_serial { * so as not to drop characters on the floor. */ int curr_rx_urb_idx; - u16 curr_rx_urb_offset; u8 rx_urb_filled[MAX_RX_URBS]; struct tasklet_struct unthrottle_tasklet; }; @@ -2001,8 +2000,7 @@ static void ctrl_callback(struct urb *urb) static int put_rxbuf_data(struct urb *urb, struct hso_serial *serial) { struct tty_struct *tty; - int write_length_remaining = 0; - int curr_write_len; + int count; /* Sanity check */ if (urb == NULL || serial == NULL) { @@ -2012,29 +2010,28 @@ static int put_rxbuf_data(struct urb *urb, struct hso_serial *serial) tty = tty_port_tty_get(&serial->port); - /* Push data to tty */ - write_length_remaining = urb->actual_length - - serial->curr_rx_urb_offset; - D1("data to push to tty"); - while (write_length_remaining) { - if (tty && test_bit(TTY_THROTTLED, &tty->flags)) { - tty_kref_put(tty); - return -1; - } - curr_write_len = tty_insert_flip_string(&serial->port, - urb->transfer_buffer + serial->curr_rx_urb_offset, - write_length_remaining); - serial->curr_rx_urb_offset += curr_write_len; - write_length_remaining -= curr_write_len; - tty_flip_buffer_push(&serial->port); + if (tty && test_bit(TTY_THROTTLED, &tty->flags)) { + tty_kref_put(tty); + return -1; } + + /* Push data to tty */ + D1("data to push to tty"); + count = tty_buffer_request_room(&serial->port, urb->actual_length); + if (count >= urb->actual_length) { + tty_insert_flip_string(&serial->port, urb->transfer_buffer, + urb->actual_length); + tty_flip_buffer_push(&serial->port); + } else { + dev_warn(&serial->parent->usb->dev, + "dropping data, %d bytes lost\n", urb->actual_length); + } + tty_kref_put(tty); - if (write_length_remaining == 0) { - serial->curr_rx_urb_offset = 0; - serial->rx_urb_filled[hso_urb_to_index(serial, urb)] = 0; - } - return write_length_remaining; + serial->rx_urb_filled[hso_urb_to_index(serial, urb)] = 0; + + return 0; } @@ -2205,7 +2202,6 @@ static int hso_stop_serial_device(struct hso_device *hso_dev) } } serial->curr_rx_urb_idx = 0; - serial->curr_rx_urb_offset = 0; if (serial->tx_urb) usb_kill_urb(serial->tx_urb); From bb78e7a12aa4898901d44bcb9d418f4188594427 Mon Sep 17 00:00:00 2001 From: Martin Peres Date: Mon, 14 Jul 2014 11:12:56 +0200 Subject: [PATCH 214/455] drm/nouveau/therm: fix a potential deadlock in the therm monitoring code Signed-off-by: Martin Peres Tested-by: Stefan Ringel Signed-off-by: Ben Skeggs --- drivers/gpu/drm/nouveau/core/subdev/therm/temp.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/nouveau/core/subdev/therm/temp.c b/drivers/gpu/drm/nouveau/core/subdev/therm/temp.c index cfde9eb44ad0..6212537b90c5 100644 --- a/drivers/gpu/drm/nouveau/core/subdev/therm/temp.c +++ b/drivers/gpu/drm/nouveau/core/subdev/therm/temp.c @@ -192,11 +192,11 @@ alarm_timer_callback(struct nouveau_alarm *alarm) nouveau_therm_threshold_hyst_polling(therm, &sensor->thrs_shutdown, NOUVEAU_THERM_THRS_SHUTDOWN); + spin_unlock_irqrestore(&priv->sensor.alarm_program_lock, flags); + /* schedule the next poll in one second */ if (therm->temp_get(therm) >= 0 && list_empty(&alarm->head)) - ptimer->alarm(ptimer, 1000 * 1000 * 1000, alarm); - - spin_unlock_irqrestore(&priv->sensor.alarm_program_lock, flags); + ptimer->alarm(ptimer, 1000000000ULL, alarm); } void From 4320f6b1d9db4ca912c5eb6ecb328b2e090e1586 Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Tue, 15 Jul 2014 08:51:27 +0200 Subject: [PATCH 215/455] PM / sleep: Fix request_firmware() error at resume The commit [247bc037: PM / Sleep: Mitigate race between the freezer and request_firmware()] introduced the finer state control, but it also leads to a new bug; for example, a bug report regarding the firmware loading of intel BT device at suspend/resume: https://bugzilla.novell.com/show_bug.cgi?id=873790 The root cause seems to be a small window between the process resume and the clear of usermodehelper lock. The request_firmware() function checks the UMH lock and gives up when it's in UMH_DISABLE state. This is for avoiding the invalid f/w loading during suspend/resume phase. The problem is, however, that usermodehelper_enable() is called at the end of thaw_processes(). Thus, a thawed process in between can kick off the f/w loader code path (in this case, via btusb_setup_intel()) even before the call of usermodehelper_enable(). Then usermodehelper_read_trylock() returns an error and request_firmware() spews WARN_ON() in the end. This oneliner patch fixes the issue just by setting to UMH_FREEZING state again before restarting tasks, so that the call of request_firmware() will be blocked until the end of this function instead of returning an error. Fixes: 247bc0374254 (PM / Sleep: Mitigate race between the freezer and request_firmware()) Link: https://bugzilla.novell.com/show_bug.cgi?id=873790 Cc: 3.4+ # 3.4+ Signed-off-by: Takashi Iwai Signed-off-by: Rafael J. Wysocki --- kernel/power/process.c | 1 + 1 file changed, 1 insertion(+) diff --git a/kernel/power/process.c b/kernel/power/process.c index 0ca8d83e2369..4ee194eb524b 100644 --- a/kernel/power/process.c +++ b/kernel/power/process.c @@ -186,6 +186,7 @@ void thaw_processes(void) printk("Restarting tasks ... "); + __usermodehelper_set_disable_depth(UMH_FREEZING); thaw_workqueues(); read_lock(&tasklist_lock); From 653a3538f865d26350727df25397bee2bacde773 Mon Sep 17 00:00:00 2001 From: Zhang Rui Date: Tue, 15 Jul 2014 14:26:13 +0200 Subject: [PATCH 216/455] PM / sleep: fix freeze_ops NULL pointer dereferences This patch fixes a NULL pointer dereference issue introduced by commit 1f0b63866fc1 (ACPI / PM: Hold ACPI scan lock over the "freeze" sleep state). Fixes: 1f0b63866fc1 (ACPI / PM: Hold ACPI scan lock over the "freeze" sleep state) Link: http://marc.info/?l=linux-pm&m=140541182017443&w=2 Reported-and-tested-by: Alexander Stein Signed-off-by: Zhang Rui Signed-off-by: Rafael J. Wysocki --- kernel/power/suspend.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c index 4dd8822f732a..ed35a4790afe 100644 --- a/kernel/power/suspend.c +++ b/kernel/power/suspend.c @@ -306,7 +306,7 @@ int suspend_devices_and_enter(suspend_state_t state) error = suspend_ops->begin(state); if (error) goto Close; - } else if (state == PM_SUSPEND_FREEZE && freeze_ops->begin) { + } else if (state == PM_SUSPEND_FREEZE && freeze_ops && freeze_ops->begin) { error = freeze_ops->begin(); if (error) goto Close; @@ -335,7 +335,7 @@ int suspend_devices_and_enter(suspend_state_t state) Close: if (need_suspend_ops(state) && suspend_ops->end) suspend_ops->end(); - else if (state == PM_SUSPEND_FREEZE && freeze_ops->end) + else if (state == PM_SUSPEND_FREEZE && freeze_ops && freeze_ops->end) freeze_ops->end(); return error; From 4da63c6fc426023d1a20e45508c47d7d68c6a53d Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Tue, 15 Jul 2014 15:19:43 +0200 Subject: [PATCH 217/455] ALSA: hda - Fix broken PM due to incomplete i915 initialization When the initialization of Intel HDMI controller fails due to missing i915 kernel symbols (e.g. HD-audio is built in while i915 is module), the driver discontinues the probe. However, since the probe was done asynchronously, the driver object still remains, thus the relevant PM ops are still called at suspend/resume. This results in the bad access to the incomplete audio card object, eventually leads to Oops or stall at PM. This patch adds the missing checks of chip->init_failed flag at each PM callback in order to fix the problem above. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=79561 Cc: Signed-off-by: Takashi Iwai --- sound/pci/hda/hda_intel.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c index d690c26a197c..83cd19017cf3 100644 --- a/sound/pci/hda/hda_intel.c +++ b/sound/pci/hda/hda_intel.c @@ -596,7 +596,7 @@ static int azx_suspend(struct device *dev) struct azx *chip = card->private_data; struct azx_pcm *p; - if (chip->disabled) + if (chip->disabled || chip->init_failed) return 0; snd_power_change_state(card, SNDRV_CTL_POWER_D3hot); @@ -628,7 +628,7 @@ static int azx_resume(struct device *dev) struct snd_card *card = dev_get_drvdata(dev); struct azx *chip = card->private_data; - if (chip->disabled) + if (chip->disabled || chip->init_failed) return 0; if (chip->driver_caps & AZX_DCAPS_I915_POWERWELL) { @@ -665,7 +665,7 @@ static int azx_runtime_suspend(struct device *dev) struct snd_card *card = dev_get_drvdata(dev); struct azx *chip = card->private_data; - if (chip->disabled) + if (chip->disabled || chip->init_failed) return 0; if (!(chip->driver_caps & AZX_DCAPS_PM_RUNTIME)) @@ -692,7 +692,7 @@ static int azx_runtime_resume(struct device *dev) struct hda_codec *codec; int status; - if (chip->disabled) + if (chip->disabled || chip->init_failed) return 0; if (!(chip->driver_caps & AZX_DCAPS_PM_RUNTIME)) @@ -729,7 +729,7 @@ static int azx_runtime_idle(struct device *dev) struct snd_card *card = dev_get_drvdata(dev); struct azx *chip = card->private_data; - if (chip->disabled) + if (chip->disabled || chip->init_failed) return 0; if (!power_save_controller || From 8abfb8727f4a724d31f9ccfd8013fbd16d539445 Mon Sep 17 00:00:00 2001 From: "zhangwei(Jovi)" Date: Thu, 18 Jul 2013 16:31:05 +0800 Subject: [PATCH 218/455] tracing: Add ftrace_trace_stack into __trace_puts/__trace_bputs Currently trace option stacktrace is not applicable for trace_printk with constant string argument, the reason is in __trace_puts/__trace_bputs ftrace_trace_stack is missing. In contrast, when using trace_printk with non constant string argument(will call into __trace_printk/__trace_bprintk), then trace option stacktrace is workable, this inconstant result will confuses users a lot. Link: http://lkml.kernel.org/p/51E7A7C9.9040401@huawei.com Cc: stable@vger.kernel.org # 3.10+ Signed-off-by: zhangwei(Jovi) Signed-off-by: Steven Rostedt --- kernel/trace/trace.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index f243444a3772..a6ffc8918dda 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -466,6 +466,9 @@ int __trace_puts(unsigned long ip, const char *str, int size) struct print_entry *entry; unsigned long irq_flags; int alloc; + int pc; + + pc = preempt_count(); if (unlikely(tracing_selftest_running || tracing_disabled)) return 0; @@ -475,7 +478,7 @@ int __trace_puts(unsigned long ip, const char *str, int size) local_save_flags(irq_flags); buffer = global_trace.trace_buffer.buffer; event = trace_buffer_lock_reserve(buffer, TRACE_PRINT, alloc, - irq_flags, preempt_count()); + irq_flags, pc); if (!event) return 0; @@ -492,6 +495,7 @@ int __trace_puts(unsigned long ip, const char *str, int size) entry->buf[size] = '\0'; __buffer_unlock_commit(buffer, event); + ftrace_trace_stack(buffer, irq_flags, 4, pc); return size; } @@ -509,6 +513,9 @@ int __trace_bputs(unsigned long ip, const char *str) struct bputs_entry *entry; unsigned long irq_flags; int size = sizeof(struct bputs_entry); + int pc; + + pc = preempt_count(); if (unlikely(tracing_selftest_running || tracing_disabled)) return 0; @@ -516,7 +523,7 @@ int __trace_bputs(unsigned long ip, const char *str) local_save_flags(irq_flags); buffer = global_trace.trace_buffer.buffer; event = trace_buffer_lock_reserve(buffer, TRACE_BPUTS, size, - irq_flags, preempt_count()); + irq_flags, pc); if (!event) return 0; @@ -525,6 +532,7 @@ int __trace_bputs(unsigned long ip, const char *str) entry->str = str; __buffer_unlock_commit(buffer, event); + ftrace_trace_stack(buffer, irq_flags, 4, pc); return 1; } From 5f8bf2d263a20b986225ae1ed7d6759dc4b93af9 Mon Sep 17 00:00:00 2001 From: "Steven Rostedt (Red Hat)" Date: Tue, 15 Jul 2014 11:05:12 -0400 Subject: [PATCH 219/455] tracing: Fix graph tracer with stack tracer on other archs Running my ftrace tests on PowerPC, it failed the test that checks if function_graph tracer is affected by the stack tracer. It was. Looking into this, I found that the update_function_graph_func() must be called even if the trampoline function is not changed. This is because archs like PowerPC do not support ftrace_ops being passed by assembly and instead uses a helper function (what the trampoline function points to). Since this function is not changed even when multiple ftrace_ops are added to the code, the test that falls out before calling update_function_graph_func() will miss that the update must still be done. Call update_function_graph_function() for all calls to update_ftrace_function() Cc: stable@vger.kernel.org # 3.3+ Signed-off-by: Steven Rostedt --- kernel/trace/ftrace.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c index 5b372e3ed675..ac9d1dad630b 100644 --- a/kernel/trace/ftrace.c +++ b/kernel/trace/ftrace.c @@ -265,12 +265,12 @@ static void update_ftrace_function(void) func = ftrace_ops_list_func; } + update_function_graph_func(); + /* If there's no change, then do nothing more here */ if (ftrace_trace_function == func) return; - update_function_graph_func(); - /* * If we are using the list function, it doesn't care * about the function_trace_ops. From eec7e1c16d2b65e38137686dd9b7e102c2150905 Mon Sep 17 00:00:00 2001 From: Alexey Asemov Date: Tue, 15 Jul 2014 10:28:42 +0400 Subject: [PATCH 220/455] libata: EH should handle AMNF error condition as a media error libata-eh.c should handle AMNF error condition (error byte bit 0, usually code 0x01) in libata-eh.c along with UNC as a media error so SCSI stack can handle it properly (translation code 0x01 is already present in libata-scsi.c) but was never passed down due to lack of handling in EH. While using linux-based machine (AMD 6550M-based notebook, PCI IDs for the controller are 1022:7801 subsys 1025:059d) and ddrescue to salvage data from failing hard drive (WD7500BPVT 2.5" 750G SATA2), I've found that pure AMNF 0x01 error code generates generic "device error" that is retried several times by SCSI stack instead of "media error" that is passed up to software. So we may assume deprecated AMNF error code is surely not dead yet, and it's better for it to be handled properly. As we may see it is used by modern enough devices, and used properly: drive returned AMNF only when IDs for track cannot be read completely due to dying head or positioning, otherwise it returned UNC(orrectables). Not handling it causes wrong generic error code ("device error") reporting down the stack, can damage failing drives further because of excessive retries, and slows salvaging down a lot. Also, there is handling code in libata-scsi.c for 0x01 AMNF error already. https://bugzilla.kernel.org/show_bug.cgi?id=80031 tj: Shortened $SUBJ and moved its content to the first paragraph. Signed-off-by: Alexey Asemov Signed-off-by: Tejun Heo --- drivers/ata/libata-eh.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c index 6760fc4e85b8..dad83df555c4 100644 --- a/drivers/ata/libata-eh.c +++ b/drivers/ata/libata-eh.c @@ -1811,7 +1811,7 @@ static unsigned int ata_eh_analyze_tf(struct ata_queued_cmd *qc, case ATA_DEV_ATA: if (err & ATA_ICRC) qc->err_mask |= AC_ERR_ATA_BUS; - if (err & ATA_UNC) + if (err & (ATA_UNC | ATA_AMNF)) qc->err_mask |= AC_ERR_MEDIA; if (err & ATA_IDNF) qc->err_mask |= AC_ERR_INVALID; @@ -2556,11 +2556,12 @@ static void ata_eh_link_report(struct ata_link *link) } if (cmd->command != ATA_CMD_PACKET && - (res->feature & (ATA_ICRC | ATA_UNC | ATA_IDNF | - ATA_ABORTED))) - ata_dev_err(qc->dev, "error: { %s%s%s%s}\n", + (res->feature & (ATA_ICRC | ATA_UNC | ATA_AMNF | + ATA_IDNF | ATA_ABORTED))) + ata_dev_err(qc->dev, "error: { %s%s%s%s%s}\n", res->feature & ATA_ICRC ? "ICRC " : "", res->feature & ATA_UNC ? "UNC " : "", + res->feature & ATA_AMNF ? "AMNF " : "", res->feature & ATA_IDNF ? "IDNF " : "", res->feature & ATA_ABORTED ? "ABRT " : ""); #endif From f0160a5a2912267c02cfe692eac955c360de5fdf Mon Sep 17 00:00:00 2001 From: "zhangwei(Jovi)" Date: Thu, 18 Jul 2013 16:31:18 +0800 Subject: [PATCH 221/455] tracing: Add TRACE_ITER_PRINTK flag check in __trace_puts/__trace_bputs The TRACE_ITER_PRINTK check in __trace_puts/__trace_bputs is missing, so add it, to be consistent with __trace_printk/__trace_bprintk. Those functions are all called by the same function: trace_printk(). Link: http://lkml.kernel.org/p/51E7A7D6.8090900@huawei.com Cc: stable@vger.kernel.org # 3.11+ Signed-off-by: zhangwei(Jovi) Signed-off-by: Steven Rostedt --- kernel/trace/trace.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index a6ffc8918dda..bda9621638cc 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -468,6 +468,9 @@ int __trace_puts(unsigned long ip, const char *str, int size) int alloc; int pc; + if (!(trace_flags & TRACE_ITER_PRINTK)) + return 0; + pc = preempt_count(); if (unlikely(tracing_selftest_running || tracing_disabled)) @@ -515,6 +518,9 @@ int __trace_bputs(unsigned long ip, const char *str) int size = sizeof(struct bputs_entry); int pc; + if (!(trace_flags & TRACE_ITER_PRINTK)) + return 0; + pc = preempt_count(); if (unlikely(tracing_selftest_running || tracing_disabled)) From 9aec8629ec829fc9403788cd959e05dd87988bd1 Mon Sep 17 00:00:00 2001 From: Mike Snitzer Date: Mon, 14 Jul 2014 16:35:54 -0400 Subject: [PATCH 222/455] dm thin metadata: do not allow the data block size to change The block size for the thin-pool's data device must remained fixed for the life of the thin-pool. Disallow any attempt to change the thin-pool's data block size. It should be noted that attempting to change the data block size via thin-pool table reload will be ignored as a side-effect of the thin-pool handover that the thin-pool target does during thin-pool table reload. Here is an example outcome of attempting to load a thin-pool table that reduced the thin-pool's data block size from 1024K to 512K. Before: kernel: device-mapper: thin: 253:4: growing the data device from 204800 to 409600 blocks After: kernel: device-mapper: thin metadata: changing the data block size (from 2048 to 1024) is not supported kernel: device-mapper: table: 253:4: thin-pool: Error creating metadata object kernel: device-mapper: ioctl: error adding target to table Signed-off-by: Mike Snitzer Acked-by: Joe Thornber Cc: stable@vger.kernel.org --- drivers/md/dm-thin-metadata.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/drivers/md/dm-thin-metadata.c b/drivers/md/dm-thin-metadata.c index b086a945edcb..e9d33ad59df5 100644 --- a/drivers/md/dm-thin-metadata.c +++ b/drivers/md/dm-thin-metadata.c @@ -613,6 +613,15 @@ static int __open_metadata(struct dm_pool_metadata *pmd) disk_super = dm_block_data(sblock); + /* Verify the data block size hasn't changed */ + if (le32_to_cpu(disk_super->data_block_size) != pmd->data_block_size) { + DMERR("changing the data block size (from %u to %llu) is not supported", + le32_to_cpu(disk_super->data_block_size), + (unsigned long long)pmd->data_block_size); + r = -EINVAL; + goto bad_unlock_sblock; + } + r = __check_incompat_features(disk_super, pmd); if (r < 0) goto bad_unlock_sblock; From 048e5a07f282c57815b3901d4a68a77fa131ce0a Mon Sep 17 00:00:00 2001 From: Mike Snitzer Date: Mon, 14 Jul 2014 16:59:39 -0400 Subject: [PATCH 223/455] dm cache metadata: do not allow the data block size to change The block size for the dm-cache's data device must remained fixed for the life of the cache. Disallow any attempt to change the cache's data block size. Signed-off-by: Mike Snitzer Acked-by: Joe Thornber Cc: stable@vger.kernel.org --- drivers/md/dm-cache-metadata.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/drivers/md/dm-cache-metadata.c b/drivers/md/dm-cache-metadata.c index 4ead4ba60656..d2899e7eb3aa 100644 --- a/drivers/md/dm-cache-metadata.c +++ b/drivers/md/dm-cache-metadata.c @@ -425,6 +425,15 @@ static int __open_metadata(struct dm_cache_metadata *cmd) disk_super = dm_block_data(sblock); + /* Verify the data block size hasn't changed */ + if (le32_to_cpu(disk_super->data_block_size) != cmd->data_block_size) { + DMERR("changing the data block size (from %u to %llu) is not supported", + le32_to_cpu(disk_super->data_block_size), + (unsigned long long)cmd->data_block_size); + r = -EINVAL; + goto bad; + } + r = __check_incompat_features(disk_super, cmd); if (r < 0) goto bad; From 7a2deccf0ef12f7f6e33150d5875020c0c94fa94 Mon Sep 17 00:00:00 2001 From: Maxime COQUELIN Date: Fri, 20 Jun 2014 13:34:54 +0200 Subject: [PATCH 224/455] pinctrl: st: Fix irqmux handler st_gpio_irqmux_handler() reads the status register to find out which banks inside the controller have pending IRQs. For each banks having pending IRQs, it calls the corresponding handler. Problem is that current code restricts the number of possible banks inside the controller to ST_GPIO_PINS_PER_BANK. This define represents the number of pins inside a bank, so it shouldn't be used here. On STiH407, PIO_FRONT0 controller has 10 banks, so IRQs pending in the two last banks (PIO18 & PIO19) aren't handled. This patch replace ST_GPIO_PINS_PER_BANK by the number of banks inside the controller. Cc: Linus Walleij Cc: #v3.15+ Acked-by: Srinivas Kandagatla Signed-off-by: Maxime Coquelin Signed-off-by: Linus Walleij --- drivers/pinctrl/pinctrl-st.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/pinctrl/pinctrl-st.c b/drivers/pinctrl/pinctrl-st.c index 1bd6363bc95e..9f43916637ca 100644 --- a/drivers/pinctrl/pinctrl-st.c +++ b/drivers/pinctrl/pinctrl-st.c @@ -1431,7 +1431,7 @@ static void st_gpio_irqmux_handler(unsigned irq, struct irq_desc *desc) status = readl(info->irqmux_base); - for_each_set_bit(n, &status, ST_GPIO_PINS_PER_BANK) + for_each_set_bit(n, &status, info->nbanks) __gpio_irq_handler(&info->banks[n]); chained_irq_exit(chip, desc); From 9963b53693c5fd20405ea8feb07c5a8626380d52 Mon Sep 17 00:00:00 2001 From: Laurent Pinchart Date: Wed, 2 Jul 2014 16:33:31 +0200 Subject: [PATCH 225/455] MAINTAINERS: Add entry for the Renesas pin controller driver I'm actively maintaining the driver, let's document that. Signed-off-by: Laurent Pinchart Signed-off-by: Linus Walleij --- MAINTAINERS | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index e31c87474739..b6ffdda2c9e2 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -6958,6 +6958,12 @@ L: linux-arm-kernel@lists.infradead.org (moderated for non-subscribers) S: Maintained F: drivers/pinctrl/pinctrl-at91.c +PIN CONTROLLER - RENESAS +M: Laurent Pinchart +L: linux-sh@vger.kernel.org +S: Maintained +F: drivers/pinctrl/sh-pfc/ + PIN CONTROLLER - SAMSUNG M: Tomasz Figa M: Thomas Abraham From fe132649b5b28c19bc657d167c232180774739f8 Mon Sep 17 00:00:00 2001 From: Laurent Pinchart Date: Tue, 8 Jul 2014 12:46:46 +0200 Subject: [PATCH 226/455] gpio: rcar: Add support for DT IRQ flags The gpio-rcar driver has no IRQ domain OF xlate function and thus ignores IRQ flags specified in DT. Fix this by using the two-cell xlate function. Signed-off-by: Laurent Pinchart Signed-off-by: Linus Walleij --- drivers/gpio/gpio-rcar.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpio/gpio-rcar.c b/drivers/gpio/gpio-rcar.c index 0c9f803fc1ac..b6ae89ea8811 100644 --- a/drivers/gpio/gpio-rcar.c +++ b/drivers/gpio/gpio-rcar.c @@ -284,6 +284,7 @@ static int gpio_rcar_irq_domain_map(struct irq_domain *h, unsigned int irq, static struct irq_domain_ops gpio_rcar_irq_domain_ops = { .map = gpio_rcar_irq_domain_map, + .xlate = irq_domain_xlate_twocell, }; struct gpio_rcar_info { From d68aab6b8f572406aa93b45ef6483934dd3b54a6 Mon Sep 17 00:00:00 2001 From: Niu Yawei Date: Wed, 4 Jun 2014 12:22:13 +0800 Subject: [PATCH 227/455] quota: missing lock in dqcache_shrink_scan() Commit 1ab6c4997e04 (fs: convert fs shrinkers to new scan/count API) accidentally removed locking from quota shrinker. Fix it - dqcache_shrink_scan() should use dq_list_lock to protect the scan on free_dquots list. CC: stable@vger.kernel.org Fixes: 1ab6c4997e04a00c50c6d786c2f046adc0d1f5de Signed-off-by: Niu Yawei Signed-off-by: Jan Kara --- fs/quota/dquot.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c index 9cd5f63715c0..7f30bdc57d13 100644 --- a/fs/quota/dquot.c +++ b/fs/quota/dquot.c @@ -702,6 +702,7 @@ dqcache_shrink_scan(struct shrinker *shrink, struct shrink_control *sc) struct dquot *dquot; unsigned long freed = 0; + spin_lock(&dq_list_lock); head = free_dquots.prev; while (head != &free_dquots && sc->nr_to_scan) { dquot = list_entry(head, struct dquot, dq_free); @@ -713,6 +714,7 @@ dqcache_shrink_scan(struct shrinker *shrink, struct shrink_control *sc) freed++; head = free_dquots.prev; } + spin_unlock(&dq_list_lock); return freed; } From 97b8ee845393701edc06e27ccec2876ff9596019 Mon Sep 17 00:00:00 2001 From: Martin Lau Date: Mon, 9 Jun 2014 23:06:42 -0700 Subject: [PATCH 228/455] ring-buffer: Fix polling on trace_pipe ring_buffer_poll_wait() should always put the poll_table to its wait_queue even there is immediate data available. Otherwise, the following epoll and read sequence will eventually hang forever: 1. Put some data to make the trace_pipe ring_buffer read ready first 2. epoll_ctl(efd, EPOLL_CTL_ADD, trace_pipe_fd, ee) 3. epoll_wait() 4. read(trace_pipe_fd) till EAGAIN 5. Add some more data to the trace_pipe ring_buffer 6. epoll_wait() -> this epoll_wait() will block forever ~ During the epoll_ctl(efd, EPOLL_CTL_ADD,...) call in step 2, ring_buffer_poll_wait() returns immediately without adding poll_table, which has poll_table->_qproc pointing to ep_poll_callback(), to its wait_queue. ~ During the epoll_wait() call in step 3 and step 6, ring_buffer_poll_wait() cannot add ep_poll_callback() to its wait_queue because the poll_table->_qproc is NULL and it is how epoll works. ~ When there is new data available in step 6, ring_buffer does not know it has to call ep_poll_callback() because it is not in its wait queue. Hence, block forever. Other poll implementation seems to call poll_wait() unconditionally as the very first thing to do. For example, tcp_poll() in tcp.c. Link: http://lkml.kernel.org/p/20140610060637.GA14045@devbig242.prn2.facebook.com Cc: stable@vger.kernel.org # 2.6.27 Fixes: 2a2cc8f7c4d0 "ftrace: allow the event pipe to be polled" Reviewed-by: Chris Mason Signed-off-by: Martin Lau Signed-off-by: Steven Rostedt --- kernel/trace/ring_buffer.c | 4 ---- 1 file changed, 4 deletions(-) diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index 7c56c3d06943..ff7027199a9a 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -616,10 +616,6 @@ int ring_buffer_poll_wait(struct ring_buffer *buffer, int cpu, struct ring_buffer_per_cpu *cpu_buffer; struct rb_irq_work *work; - if ((cpu == RING_BUFFER_ALL_CPUS && !ring_buffer_empty(buffer)) || - (cpu != RING_BUFFER_ALL_CPUS && !ring_buffer_empty_cpu(buffer, cpu))) - return POLLIN | POLLRDNORM; - if (cpu == RING_BUFFER_ALL_CPUS) work = &buffer->irq_work; else { From 2627b7e15c5064ddd5e578e4efd948d48d531a3f Mon Sep 17 00:00:00 2001 From: Julian Anastasov Date: Thu, 10 Jul 2014 09:24:01 +0300 Subject: [PATCH 229/455] ipvs: avoid netns exit crash on ip_vs_conn_drop_conntrack commit 8f4e0a18682d91 ("IPVS netns exit causes crash in conntrack") added second ip_vs_conn_drop_conntrack call instead of just adding the needed check. As result, the first call still can cause crash on netns exit. Remove it. Signed-off-by: Julian Anastasov Signed-off-by: Hans Schillstrom Signed-off-by: Simon Horman --- net/netfilter/ipvs/ip_vs_conn.c | 1 - 1 file changed, 1 deletion(-) diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c index a8eb0a89326a..610e19c0e13f 100644 --- a/net/netfilter/ipvs/ip_vs_conn.c +++ b/net/netfilter/ipvs/ip_vs_conn.c @@ -797,7 +797,6 @@ static void ip_vs_conn_expire(unsigned long data) ip_vs_control_del(cp); if (cp->flags & IP_VS_CONN_F_NFCT) { - ip_vs_conn_drop_conntrack(cp); /* Do not access conntracks during subsys cleanup * because nf_conntrack_find_get can not be used after * conntrack cleanup for the net. From 44305ebde243a7cce2c592cc89afe5041d8bf884 Mon Sep 17 00:00:00 2001 From: Brian Norris Date: Tue, 20 May 2014 22:35:38 -0700 Subject: [PATCH 230/455] UBI: fastmap: do not miss bit-flips The return value from 'ubi_io_read_ec_hdr()' was stored in 'err', not in 'ret'. This fix makes sure Fastmap-enabled UBI does not miss bit-flip while reading EC headers, events and scrubs the affected PEBs. This issue was reported by Coverity Scan. Artem: improved the commit message. Signed-off-by: Brian Norris Acked-by: Richard Weinberger Signed-off-by: Artem Bityutskiy --- drivers/mtd/ubi/fastmap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/mtd/ubi/fastmap.c b/drivers/mtd/ubi/fastmap.c index 72f39daee649..0431b46d9fd9 100644 --- a/drivers/mtd/ubi/fastmap.c +++ b/drivers/mtd/ubi/fastmap.c @@ -423,7 +423,7 @@ static int scan_pool(struct ubi_device *ubi, struct ubi_attach_info *ai, pnum, err); ret = err > 0 ? UBI_BAD_FASTMAP : err; goto out; - } else if (ret == UBI_IO_BITFLIPS) + } else if (err == UBI_IO_BITFLIPS) scrub = 1; /* From 4a36b44c77515ca1ad799577d3f9e2fa4d68bffa Mon Sep 17 00:00:00 2001 From: David Hildenbrand Date: Wed, 18 Jun 2014 12:32:19 +0200 Subject: [PATCH 231/455] s390: require mvcos facility, not tod clock steering facility Inlined uaccess functions require the mvcos facility (bit 27), not the tod clock steering facility (bit 28) for z10 and newer machines. Signed-off-by: David Hildenbrand Signed-off-by: Heiko Carstens Signed-off-by: Martin Schwidefsky --- arch/s390/kernel/head.S | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/s390/kernel/head.S b/arch/s390/kernel/head.S index 7ba7d6784510..e88d35d74950 100644 --- a/arch/s390/kernel/head.S +++ b/arch/s390/kernel/head.S @@ -437,11 +437,11 @@ ENTRY(startup_kdump) #if defined(CONFIG_64BIT) #if defined(CONFIG_MARCH_ZEC12) - .long 3, 0xc100efea, 0xf46ce800, 0x00400000 + .long 3, 0xc100eff2, 0xf46ce800, 0x00400000 #elif defined(CONFIG_MARCH_Z196) - .long 2, 0xc100efea, 0xf46c0000 + .long 2, 0xc100eff2, 0xf46c0000 #elif defined(CONFIG_MARCH_Z10) - .long 2, 0xc100efea, 0xf0680000 + .long 2, 0xc100eff2, 0xf0680000 #elif defined(CONFIG_MARCH_Z9_109) .long 1, 0xc100efc2 #elif defined(CONFIG_MARCH_Z990) From 7cbe4afe854a9a352d9562106449bc55d17d5e5b Mon Sep 17 00:00:00 2001 From: Martin Schwidefsky Date: Thu, 26 Jun 2014 10:48:56 +0200 Subject: [PATCH 232/455] s390/3270: correct size detection with the read-partition command The size detection for 3270 terminals with the read-partition command is broken. The raw3270_reset_device_cb function clears the init_data array, but if raw3270_writesf_readpart has been called the read-partition command is queued which needs the init_data array. In this case the size detection will fail and the invalid command does funny things to the terminal. Signed-off-by: Martin Schwidefsky --- drivers/s390/char/raw3270.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/s390/char/raw3270.c b/drivers/s390/char/raw3270.c index 15b3459f8656..220acb4cbee5 100644 --- a/drivers/s390/char/raw3270.c +++ b/drivers/s390/char/raw3270.c @@ -633,7 +633,6 @@ raw3270_reset_device_cb(struct raw3270_request *rq, void *data) } else raw3270_writesf_readpart(rp); memset(&rp->init_reset, 0, sizeof(rp->init_reset)); - memset(&rp->init_data, 0, sizeof(rp->init_data)); } static int From 8fb878c5f12bf7fd6099d466139bd4564418e583 Mon Sep 17 00:00:00 2001 From: Yijing Wang Date: Tue, 8 Jul 2014 10:08:05 +0800 Subject: [PATCH 233/455] s390/MSI: Use standard mask and unmask funtions MSI irqchip in s390 has its own mask and unmask MSI irq functions, zpci_enable_irq() and zpci_disable_irq(). They mask and unmask MSI irq in standard ways, no arch special. MSI driver provides two global standard functions mask_msi_irq() and unmask_msi_irq(). Local zpci_enable_irq() and zpci_disable_irq() are almost the same as the standard two. the difference is local mask/unmask functions read the mask status before mask and unmask everytime. Then change the value and rewrite to hardware. In standard functions, save the mask status after mask and unmask msi irq, and use the cached status to change the mask status. When we mask or unmask a MSI irq, we always cache its mask status except we know need not to cache it, like in pci_msi_shutdown. So use the standard functions to replace the local is safe. Signed-off-by: Yijing Wang [sebott: fixed inverted function pointers] Signed-off-by: Sebastian Ott Signed-off-by: Martin Schwidefsky --- arch/s390/pci/pci.c | 49 ++++++--------------------------------------- 1 file changed, 6 insertions(+), 43 deletions(-) diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c index 9ddc51eeb8d6..30de42730b2f 100644 --- a/arch/s390/pci/pci.c +++ b/arch/s390/pci/pci.c @@ -48,13 +48,10 @@ static LIST_HEAD(zpci_list); static DEFINE_SPINLOCK(zpci_list_lock); -static void zpci_enable_irq(struct irq_data *data); -static void zpci_disable_irq(struct irq_data *data); - static struct irq_chip zpci_irq_chip = { .name = "zPCI", - .irq_unmask = zpci_enable_irq, - .irq_mask = zpci_disable_irq, + .irq_unmask = unmask_msi_irq, + .irq_mask = mask_msi_irq, }; static DECLARE_BITMAP(zpci_domain, ZPCI_NR_DEVICES); @@ -244,43 +241,6 @@ static int zpci_cfg_store(struct zpci_dev *zdev, int offset, u32 val, u8 len) return rc; } -static int zpci_msi_set_mask_bits(struct msi_desc *msi, u32 mask, u32 flag) -{ - int offset, pos; - u32 mask_bits; - - if (msi->msi_attrib.is_msix) { - offset = msi->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE + - PCI_MSIX_ENTRY_VECTOR_CTRL; - msi->masked = readl(msi->mask_base + offset); - writel(flag, msi->mask_base + offset); - } else if (msi->msi_attrib.maskbit) { - pos = (long) msi->mask_base; - pci_read_config_dword(msi->dev, pos, &mask_bits); - mask_bits &= ~(mask); - mask_bits |= flag & mask; - pci_write_config_dword(msi->dev, pos, mask_bits); - } else - return 0; - - msi->msi_attrib.maskbit = !!flag; - return 1; -} - -static void zpci_enable_irq(struct irq_data *data) -{ - struct msi_desc *msi = irq_get_msi_desc(data->irq); - - zpci_msi_set_mask_bits(msi, 1, 0); -} - -static void zpci_disable_irq(struct irq_data *data) -{ - struct msi_desc *msi = irq_get_msi_desc(data->irq); - - zpci_msi_set_mask_bits(msi, 1, 1); -} - void pcibios_fixup_bus(struct pci_bus *bus) { } @@ -487,7 +447,10 @@ void arch_teardown_msi_irqs(struct pci_dev *pdev) /* Release MSI interrupts */ list_for_each_entry(msi, &pdev->msi_list, list) { - zpci_msi_set_mask_bits(msi, 1, 1); + if (msi->msi_attrib.is_msix) + default_msix_mask_irq(msi, 1); + else + default_msi_mask_irq(msi, 1, 1); irq_set_msi_desc(msi->irq, NULL); irq_free_desc(msi->irq); msi->msg.address_lo = 0; From dab6cf55f81a6e16b8147aed9a843e1691dcd318 Mon Sep 17 00:00:00 2001 From: Martin Schwidefsky Date: Mon, 23 Jun 2014 15:29:40 +0200 Subject: [PATCH 234/455] s390/ptrace: fix PSW mask check The PSW mask check of the PTRACE_POKEUSR_AREA command is incorrect. The PSW_MASK_USER define contains the PSW_MASK_ASC bits, the ptrace interface accepts all combinations for the address-space-control bits. To protect the kernel space the PSW mask check in ptrace needs to reject the address-space-control bit combination for home space. Fixes CVE-2014-3534 Cc: stable@vger.kernel.org Signed-off-by: Martin Schwidefsky --- arch/s390/kernel/ptrace.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/arch/s390/kernel/ptrace.c b/arch/s390/kernel/ptrace.c index 2d716734b5b1..5dc7ad9e2fbf 100644 --- a/arch/s390/kernel/ptrace.c +++ b/arch/s390/kernel/ptrace.c @@ -334,9 +334,14 @@ static int __poke_user(struct task_struct *child, addr_t addr, addr_t data) unsigned long mask = PSW_MASK_USER; mask |= is_ri_task(child) ? PSW_MASK_RI : 0; - if ((data & ~mask) != PSW_USER_BITS) + if ((data ^ PSW_USER_BITS) & ~mask) + /* Invalid psw mask. */ + return -EINVAL; + if ((data & PSW_MASK_ASC) == PSW_ASC_HOME) + /* Invalid address-space-control bits */ return -EINVAL; if ((data & PSW_MASK_EA) && !(data & PSW_MASK_BA)) + /* Invalid addressing mode bits */ return -EINVAL; } *(addr_t *)((addr_t) &task_pt_regs(child)->psw + addr) = data; @@ -672,9 +677,12 @@ static int __poke_user_compat(struct task_struct *child, mask |= is_ri_task(child) ? PSW32_MASK_RI : 0; /* Build a 64 bit psw mask from 31 bit mask. */ - if ((tmp & ~mask) != PSW32_USER_BITS) + if ((tmp ^ PSW32_USER_BITS) & ~mask) /* Invalid psw mask. */ return -EINVAL; + if ((data & PSW32_MASK_ASC) == PSW32_ASC_HOME) + /* Invalid address-space-control bits */ + return -EINVAL; regs->psw.mask = (regs->psw.mask & ~PSW_MASK_USER) | (regs->psw.mask & PSW_MASK_BA) | (__u64)(tmp & mask) << 32; From 666e68e0dde826ae146b980099f1719f74fa968c Mon Sep 17 00:00:00 2001 From: Ingo Tuchscherer Date: Mon, 14 Jul 2014 19:11:48 +0200 Subject: [PATCH 235/455] s390/zcrypt: improve device probing for zcrypt adapter cards Improve device probing process for zcrypt adapters to transmit service request during registration process. Signed-off-by: Ingo Tuchscherer Signed-off-by: Martin Schwidefsky --- drivers/s390/crypto/ap_bus.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/drivers/s390/crypto/ap_bus.c b/drivers/s390/crypto/ap_bus.c index 69ef4f8cfac8..4038437ff033 100644 --- a/drivers/s390/crypto/ap_bus.c +++ b/drivers/s390/crypto/ap_bus.c @@ -901,10 +901,15 @@ static int ap_device_probe(struct device *dev) int rc; ap_dev->drv = ap_drv; + + spin_lock_bh(&ap_device_list_lock); + list_add(&ap_dev->list, &ap_device_list); + spin_unlock_bh(&ap_device_list_lock); + rc = ap_drv->probe ? ap_drv->probe(ap_dev) : -ENODEV; - if (!rc) { + if (rc) { spin_lock_bh(&ap_device_list_lock); - list_add(&ap_dev->list, &ap_device_list); + list_del_init(&ap_dev->list); spin_unlock_bh(&ap_device_list_lock); } return rc; From 9f86745722d95bc7f855069bd82285bd10dc97ff Mon Sep 17 00:00:00 2001 From: Martin Schwidefsky Date: Tue, 15 Jul 2014 10:41:37 +0200 Subject: [PATCH 236/455] s390: fix restore of invalid floating-point-control The fixup of the inline assembly to restore the floating-point-control register needs to check for instruction address *after* the lfcp instruction as the specification and data exceptions are suppresssing. Signed-off-by: Martin Schwidefsky --- arch/s390/include/asm/switch_to.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/s390/include/asm/switch_to.h b/arch/s390/include/asm/switch_to.h index df38c70cd59e..18ea9e3f8142 100644 --- a/arch/s390/include/asm/switch_to.h +++ b/arch/s390/include/asm/switch_to.h @@ -51,8 +51,8 @@ static inline int restore_fp_ctl(u32 *fpc) return 0; asm volatile( - "0: lfpc %1\n" - " la %0,0\n" + " lfpc %1\n" + "0: la %0,0\n" "1:\n" EX_TABLE(0b,1b) : "=d" (rc) : "Q" (*fpc), "0" (-EINVAL)); From d3f44fbabe55132832e152606365adb640296378 Mon Sep 17 00:00:00 2001 From: Paul Bolle Date: Mon, 30 Jun 2014 16:32:29 +0200 Subject: [PATCH 237/455] x86: Remove unused variable "polling" Compile tested. "polling" is unused since commit f80c5b39b80a ("sched/idle, x86: Switch from TS_POLLING to TIF_POLLING_NRFLAG"). Signed-off-by: Paul Bolle Signed-off-by: Peter Zijlstra Cc: Jiri Kosina Link: http://lkml.kernel.org/r/1404138749.2978.6.camel@x41 Signed-off-by: Ingo Molnar --- arch/x86/kernel/apm_32.c | 1 - 1 file changed, 1 deletion(-) diff --git a/arch/x86/kernel/apm_32.c b/arch/x86/kernel/apm_32.c index f3a1f04ed4cb..584874451414 100644 --- a/arch/x86/kernel/apm_32.c +++ b/arch/x86/kernel/apm_32.c @@ -841,7 +841,6 @@ static int apm_do_idle(void) u32 eax; u8 ret = 0; int idled = 0; - int polling; int err = 0; if (!need_resched()) { From 1903d50cba54261a6562a476c05085f3d7a54097 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Tue, 15 Jul 2014 17:27:27 +0200 Subject: [PATCH 238/455] perf: Revert ("perf: Always destroy groups on exit") Vince reported that commit 15a2d4de0eab5 ("perf: Always destroy groups on exit") causes a regression with grouped events. In particular his read_group_attached.c test fails. https://github.com/deater/perf_event_tests/blob/master/tests/bugs/read_group_attached.c Because of the context switch optimization in perf_event_context_sched_out() the 'original' event may end up in the child process and when that exits the change in the patch in question destroys the actual grouping. Therefore revert that change and only destroy inherited groups. Reported-by: Vince Weaver Signed-off-by: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Linus Torvalds Link: http://lkml.kernel.org/n/tip-zedy3uktcp753q8fw8dagx7a@git.kernel.org Signed-off-by: Ingo Molnar --- kernel/events/core.c | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/kernel/events/core.c b/kernel/events/core.c index b0c95f0f06fd..c46b02bfe179 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -7458,7 +7458,19 @@ __perf_event_exit_task(struct perf_event *child_event, struct perf_event_context *child_ctx, struct task_struct *child) { - perf_remove_from_context(child_event, true); + /* + * Do not destroy the 'original' grouping; because of the context + * switch optimization the original events could've ended up in a + * random child task. + * + * If we were to destroy the original group, all group related + * operations would cease to function properly after this random + * child dies. + * + * Do destroy all inherited groups, we don't care about those + * and being thorough is better. + */ + perf_remove_from_context(child_event, !!child_event->parent); /* * It can happen that the parent exits first, and has events From 1996388e9f4e3444db8273bc08d25164d2967c21 Mon Sep 17 00:00:00 2001 From: Vince Weaver Date: Mon, 14 Jul 2014 15:33:25 -0400 Subject: [PATCH 239/455] perf/x86/intel: Use proper dTLB-load-misses event on IvyBridge This was discussed back in February: https://lkml.org/lkml/2014/2/18/956 But I never saw a patch come out of it. On IvyBridge we share the SandyBridge cache event tables, but the dTLB-load-miss event is not compatible. Patch it up after the fact to the proper DTLB_LOAD_MISSES.DEMAND_LD_MISS_CAUSES_A_WALK Signed-off-by: Vince Weaver Signed-off-by: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Linus Torvalds Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1407141528200.17214@vincent-weaver-1.umelst.maine.edu Signed-off-by: Ingo Molnar --- arch/x86/kernel/cpu/perf_event_intel.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c index 07846d738bdb..c206815b9556 100644 --- a/arch/x86/kernel/cpu/perf_event_intel.c +++ b/arch/x86/kernel/cpu/perf_event_intel.c @@ -2474,6 +2474,9 @@ __init int intel_pmu_init(void) case 62: /* IvyBridge EP */ memcpy(hw_cache_event_ids, snb_hw_cache_event_ids, sizeof(hw_cache_event_ids)); + /* dTLB-load-misses on IVB is different than SNB */ + hw_cache_event_ids[C(DTLB)][C(OP_READ)][C(RESULT_MISS)] = 0x8108; /* DTLB_LOAD_MISSES.DEMAND_LD_MISS_CAUSES_A_WALK */ + memcpy(hw_cache_extra_regs, snb_hw_cache_extra_regs, sizeof(hw_cache_extra_regs)); From 7711fe4fc2606712125cff1a55ce00df2ae0f1fb Mon Sep 17 00:00:00 2001 From: Stephane Eranian Date: Mon, 30 Jun 2014 16:46:24 +0200 Subject: [PATCH 240/455] perf/x86/intel/uncore: Fix SNB-EP/IVT Cbox filter mappings This patch fixes the SNB-EP and IVT Cbox filter mapping table. The table controls which filters are supported by which events. There were several mistakes in those tables causing some filters to be ignored, such as NID on TOR_INSERTS. Signed-off-by: Stephane Eranian Signed-off-by: Peter Zijlstra Cc: zheng.z.yan@intel.com Cc: Arnaldo Carvalho de Melo Cc: Linus Torvalds Link: http://lkml.kernel.org/r/20140630144624.GA2604@quad Signed-off-by: Ingo Molnar --- arch/x86/kernel/cpu/perf_event_intel_uncore.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.c b/arch/x86/kernel/cpu/perf_event_intel_uncore.c index 65bbbea38b9c..ae6552a0701f 100644 --- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c @@ -550,16 +550,16 @@ static struct extra_reg snbep_uncore_cbox_extra_regs[] = { SNBEP_CBO_EVENT_EXTRA_REG(0x4134, 0xffff, 0x6), SNBEP_CBO_EVENT_EXTRA_REG(0x0135, 0xffff, 0x8), SNBEP_CBO_EVENT_EXTRA_REG(0x0335, 0xffff, 0x8), - SNBEP_CBO_EVENT_EXTRA_REG(0x4135, 0xffff, 0xc), - SNBEP_CBO_EVENT_EXTRA_REG(0x4335, 0xffff, 0xc), + SNBEP_CBO_EVENT_EXTRA_REG(0x4135, 0xffff, 0xa), + SNBEP_CBO_EVENT_EXTRA_REG(0x4335, 0xffff, 0xa), SNBEP_CBO_EVENT_EXTRA_REG(0x4435, 0xffff, 0x2), SNBEP_CBO_EVENT_EXTRA_REG(0x4835, 0xffff, 0x2), SNBEP_CBO_EVENT_EXTRA_REG(0x4a35, 0xffff, 0x2), SNBEP_CBO_EVENT_EXTRA_REG(0x5035, 0xffff, 0x2), SNBEP_CBO_EVENT_EXTRA_REG(0x0136, 0xffff, 0x8), SNBEP_CBO_EVENT_EXTRA_REG(0x0336, 0xffff, 0x8), - SNBEP_CBO_EVENT_EXTRA_REG(0x4136, 0xffff, 0xc), - SNBEP_CBO_EVENT_EXTRA_REG(0x4336, 0xffff, 0xc), + SNBEP_CBO_EVENT_EXTRA_REG(0x4136, 0xffff, 0xa), + SNBEP_CBO_EVENT_EXTRA_REG(0x4336, 0xffff, 0xa), SNBEP_CBO_EVENT_EXTRA_REG(0x4436, 0xffff, 0x2), SNBEP_CBO_EVENT_EXTRA_REG(0x4836, 0xffff, 0x2), SNBEP_CBO_EVENT_EXTRA_REG(0x4a36, 0xffff, 0x2), @@ -1222,6 +1222,7 @@ static struct extra_reg ivt_uncore_cbox_extra_regs[] = { SNBEP_CBO_EVENT_EXTRA_REG(SNBEP_CBO_PMON_CTL_TID_EN, SNBEP_CBO_PMON_CTL_TID_EN, 0x1), SNBEP_CBO_EVENT_EXTRA_REG(0x1031, 0x10ff, 0x2), + SNBEP_CBO_EVENT_EXTRA_REG(0x1134, 0xffff, 0x4), SNBEP_CBO_EVENT_EXTRA_REG(0x4134, 0xffff, 0xc), SNBEP_CBO_EVENT_EXTRA_REG(0x5134, 0xffff, 0xc), @@ -1245,7 +1246,7 @@ static struct extra_reg ivt_uncore_cbox_extra_regs[] = { SNBEP_CBO_EVENT_EXTRA_REG(0x8335, 0xffff, 0x10), SNBEP_CBO_EVENT_EXTRA_REG(0x0136, 0xffff, 0x10), SNBEP_CBO_EVENT_EXTRA_REG(0x0336, 0xffff, 0x10), - SNBEP_CBO_EVENT_EXTRA_REG(0x2336, 0xffff, 0x10), + SNBEP_CBO_EVENT_EXTRA_REG(0x2136, 0xffff, 0x10), SNBEP_CBO_EVENT_EXTRA_REG(0x2336, 0xffff, 0x10), SNBEP_CBO_EVENT_EXTRA_REG(0x4136, 0xffff, 0x18), SNBEP_CBO_EVENT_EXTRA_REG(0x4336, 0xffff, 0x18), From 4a1c0f262f88e2676fda80a6bf80e7dbccae1dcb Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Mon, 23 Jun 2014 16:12:42 +0200 Subject: [PATCH 241/455] perf: Fix lockdep warning on process exit Sasha Levin reported: > While fuzzing with trinity inside a KVM tools guest running the latest -next > kernel I've stumbled on the following spew: > > ====================================================== > [ INFO: possible circular locking dependency detected ] > 3.15.0-next-20140613-sasha-00026-g6dd125d-dirty #654 Not tainted > ------------------------------------------------------- > trinity-c578/9725 is trying to acquire lock: > (&(&pool->lock)->rlock){-.-...}, at: __queue_work (kernel/workqueue.c:1346) > > but task is already holding lock: > (&ctx->lock){-.....}, at: perf_event_exit_task (kernel/events/core.c:7471 kernel/events/core.c:7533) > > which lock already depends on the new lock. > 1 lock held by trinity-c578/9725: > #0: (&ctx->lock){-.....}, at: perf_event_exit_task (kernel/events/core.c:7471 kernel/events/core.c:7533) > > Call Trace: > dump_stack (lib/dump_stack.c:52) > print_circular_bug (kernel/locking/lockdep.c:1216) > __lock_acquire (kernel/locking/lockdep.c:1840 kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131 kernel/locking/lockdep.c:3182) > lock_acquire (./arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602) > _raw_spin_lock (include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:151) > __queue_work (kernel/workqueue.c:1346) > queue_work_on (kernel/workqueue.c:1424) > free_object (lib/debugobjects.c:209) > __debug_check_no_obj_freed (lib/debugobjects.c:715) > debug_check_no_obj_freed (lib/debugobjects.c:727) > kmem_cache_free (mm/slub.c:2683 mm/slub.c:2711) > free_task (kernel/fork.c:221) > __put_task_struct (kernel/fork.c:250) > put_ctx (include/linux/sched.h:1855 kernel/events/core.c:898) > perf_event_exit_task (kernel/events/core.c:907 kernel/events/core.c:7478 kernel/events/core.c:7533) > do_exit (kernel/exit.c:766) > do_group_exit (kernel/exit.c:884) > get_signal_to_deliver (kernel/signal.c:2347) > do_signal (arch/x86/kernel/signal.c:698) > do_notify_resume (arch/x86/kernel/signal.c:751) > int_signal (arch/x86/kernel/entry_64.S:600) Urgh.. so the only way I can make that happen is through: perf_event_exit_task_context() raw_spin_lock(&child_ctx->lock); unclone_ctx(child_ctx) put_ctx(ctx->parent_ctx); raw_spin_unlock_irqrestore(&child_ctx->lock); And we can avoid this by doing the change below. I can't immediately see how this changed recently, but given that you say it's easy to reproduce, lets fix this. Reported-by: Sasha Levin Signed-off-by: Peter Zijlstra Cc: Tejun Heo Cc: Dave Jones Cc: Arnaldo Carvalho de Melo Cc: Linus Torvalds Link: http://lkml.kernel.org/r/20140623141242.GB19860@laptop.programming.kicks-ass.net Signed-off-by: Ingo Molnar --- kernel/events/core.c | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-) diff --git a/kernel/events/core.c b/kernel/events/core.c index c46b02bfe179..6b17ac1b0c2a 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -7486,7 +7486,7 @@ __perf_event_exit_task(struct perf_event *child_event, static void perf_event_exit_task_context(struct task_struct *child, int ctxn) { struct perf_event *child_event, *next; - struct perf_event_context *child_ctx; + struct perf_event_context *child_ctx, *parent_ctx; unsigned long flags; if (likely(!child->perf_event_ctxp[ctxn])) { @@ -7511,6 +7511,15 @@ static void perf_event_exit_task_context(struct task_struct *child, int ctxn) raw_spin_lock(&child_ctx->lock); task_ctx_sched_out(child_ctx); child->perf_event_ctxp[ctxn] = NULL; + + /* + * In order to avoid freeing: child_ctx->parent_ctx->task + * under perf_event_context::lock, grab another reference. + */ + parent_ctx = child_ctx->parent_ctx; + if (parent_ctx) + get_ctx(parent_ctx); + /* * If this context is a clone; unclone it so it can't get * swapped to another process while we're removing all @@ -7520,6 +7529,13 @@ static void perf_event_exit_task_context(struct task_struct *child, int ctxn) update_context_time(child_ctx); raw_spin_unlock_irqrestore(&child_ctx->lock, flags); + /* + * Now that we no longer hold perf_event_context::lock, drop + * our extra child_ctx->parent_ctx reference. + */ + if (parent_ctx) + put_ctx(parent_ctx); + /* * Report the task dead after unscheduling the events so that we * won't get any samples after PERF_RECORD_EXIT. We can however still From 338b522ca43cfd32d11a370f4203bcd089c6c877 Mon Sep 17 00:00:00 2001 From: Kan Liang Date: Mon, 14 Jul 2014 12:25:56 -0700 Subject: [PATCH 242/455] perf/x86/intel: Protect LBR and extra_regs against KVM lying With -cpu host, KVM reports LBR and extra_regs support, if the host has support. When the guest perf driver tries to access LBR or extra_regs MSR, it #GPs all MSR accesses,since KVM doesn't handle LBR and extra_regs support. So check the related MSRs access right once at initialization time to avoid the error access at runtime. For reproducing the issue, please build the kernel with CONFIG_KVM_INTEL = y (for host kernel). And CONFIG_PARAVIRT = n and CONFIG_KVM_GUEST = n (for guest kernel). Start the guest with -cpu host. Run perf record with --branch-any or --branch-filter in guest to trigger LBR Run perf stat offcore events (E.g. LLC-loads/LLC-load-misses ...) in guest to trigger offcore_rsp #GP Signed-off-by: Kan Liang Signed-off-by: Peter Zijlstra Cc: Andi Kleen Cc: Arnaldo Carvalho de Melo Cc: Linus Torvalds Cc: Maria Dimakopoulou Cc: Mark Davies Cc: Paul Mackerras Cc: Stephane Eranian Cc: Yan, Zheng Link: http://lkml.kernel.org/r/1405365957-20202-1-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar --- arch/x86/kernel/cpu/perf_event.c | 3 ++ arch/x86/kernel/cpu/perf_event.h | 12 +++-- arch/x86/kernel/cpu/perf_event_intel.c | 66 +++++++++++++++++++++++++- 3 files changed, 75 insertions(+), 6 deletions(-) diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c index 2bdfbff8a4f6..2879ecdaac43 100644 --- a/arch/x86/kernel/cpu/perf_event.c +++ b/arch/x86/kernel/cpu/perf_event.c @@ -118,6 +118,9 @@ static int x86_pmu_extra_regs(u64 config, struct perf_event *event) continue; if (event->attr.config1 & ~er->valid_mask) return -EINVAL; + /* Check if the extra msrs can be safely accessed*/ + if (!er->extra_msr_access) + return -ENXIO; reg->idx = er->idx; reg->config = event->attr.config1; diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h index 3b2f9bdd974b..8ade93111e03 100644 --- a/arch/x86/kernel/cpu/perf_event.h +++ b/arch/x86/kernel/cpu/perf_event.h @@ -295,14 +295,16 @@ struct extra_reg { u64 config_mask; u64 valid_mask; int idx; /* per_xxx->regs[] reg index */ + bool extra_msr_access; }; #define EVENT_EXTRA_REG(e, ms, m, vm, i) { \ - .event = (e), \ - .msr = (ms), \ - .config_mask = (m), \ - .valid_mask = (vm), \ - .idx = EXTRA_REG_##i, \ + .event = (e), \ + .msr = (ms), \ + .config_mask = (m), \ + .valid_mask = (vm), \ + .idx = EXTRA_REG_##i, \ + .extra_msr_access = true, \ } #define INTEL_EVENT_EXTRA_REG(event, msr, vm, idx) \ diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c index c206815b9556..2502d0d9d246 100644 --- a/arch/x86/kernel/cpu/perf_event_intel.c +++ b/arch/x86/kernel/cpu/perf_event_intel.c @@ -2182,6 +2182,41 @@ static void intel_snb_check_microcode(void) } } +/* + * Under certain circumstances, access certain MSR may cause #GP. + * The function tests if the input MSR can be safely accessed. + */ +static bool check_msr(unsigned long msr, u64 mask) +{ + u64 val_old, val_new, val_tmp; + + /* + * Read the current value, change it and read it back to see if it + * matches, this is needed to detect certain hardware emulators + * (qemu/kvm) that don't trap on the MSR access and always return 0s. + */ + if (rdmsrl_safe(msr, &val_old)) + return false; + + /* + * Only change the bits which can be updated by wrmsrl. + */ + val_tmp = val_old ^ mask; + if (wrmsrl_safe(msr, val_tmp) || + rdmsrl_safe(msr, &val_new)) + return false; + + if (val_new != val_tmp) + return false; + + /* Here it's sure that the MSR can be safely accessed. + * Restore the old value and return. + */ + wrmsrl(msr, val_old); + + return true; +} + static __init void intel_sandybridge_quirk(void) { x86_pmu.check_microcode = intel_snb_check_microcode; @@ -2271,7 +2306,8 @@ __init int intel_pmu_init(void) union cpuid10_ebx ebx; struct event_constraint *c; unsigned int unused; - int version; + struct extra_reg *er; + int version, i; if (!cpu_has(&boot_cpu_data, X86_FEATURE_ARCH_PERFMON)) { switch (boot_cpu_data.x86) { @@ -2577,6 +2613,34 @@ __init int intel_pmu_init(void) } } + /* + * Access LBR MSR may cause #GP under certain circumstances. + * E.g. KVM doesn't support LBR MSR + * Check all LBT MSR here. + * Disable LBR access if any LBR MSRs can not be accessed. + */ + if (x86_pmu.lbr_nr && !check_msr(x86_pmu.lbr_tos, 0x3UL)) + x86_pmu.lbr_nr = 0; + for (i = 0; i < x86_pmu.lbr_nr; i++) { + if (!(check_msr(x86_pmu.lbr_from + i, 0xffffUL) && + check_msr(x86_pmu.lbr_to + i, 0xffffUL))) + x86_pmu.lbr_nr = 0; + } + + /* + * Access extra MSR may cause #GP under certain circumstances. + * E.g. KVM doesn't support offcore event + * Check all extra_regs here. + */ + if (x86_pmu.extra_regs) { + for (er = x86_pmu.extra_regs; er->msr; er++) { + er->extra_msr_access = check_msr(er->msr, 0x1ffUL); + /* Disable LBR select mapping */ + if ((er->idx == EXTRA_REG_LBR) && !er->extra_msr_access) + x86_pmu.lbr_sel_map = NULL; + } + } + /* Support full width counters using alternative MSR range */ if (x86_pmu.intel_cap.full_width_write) { x86_pmu.max_period = x86_pmu.cntval_mask; From 37e9562453b813d2ea527bd9531fef2c3c592847 Mon Sep 17 00:00:00 2001 From: Jason Low Date: Fri, 4 Jul 2014 20:49:32 -0700 Subject: [PATCH 243/455] locking/rwsem: Allow conservative optimistic spinning when readers have lock Commit 4fc828e24cd9 ("locking/rwsem: Support optimistic spinning") introduced a major performance regression for workloads such as xfs_repair which mix read and write locking of the mmap_sem across many threads. The result was xfs_repair ran 5x slower on 3.16-rc2 than on 3.15 and using 20x more system CPU time. Perf profiles indicate in some workloads that significant time can be spent spinning on !owner. This is because we don't set the lock owner when readers(s) obtain the rwsem. In this patch, we'll modify rwsem_can_spin_on_owner() such that we'll return false if there is no lock owner. The rationale is that if we just entered the slowpath, yet there is no lock owner, then there is a possibility that a reader has the lock. To be conservative, we'll avoid spinning in these situations. This patch reduced the total run time of the xfs_repair workload from about 4 minutes 24 seconds down to approximately 1 minute 26 seconds, back to close to the same performance as on 3.15. Retesting of AIM7, which were some of the workloads used to test the original optimistic spinning code, confirmed that we still get big performance gains with optimistic spinning, even with this additional regression fix. Davidlohr found that while the 'custom' workload took a performance hit of ~-14% to throughput for >300 users with this additional patch, the overall gain with optimistic spinning is still ~+45%. The 'disk' workload even improved by ~+15% at >1000 users. Tested-by: Dave Chinner Acked-by: Davidlohr Bueso Signed-off-by: Jason Low Signed-off-by: Peter Zijlstra Cc: Tim Chen Cc: Linus Torvalds Link: http://lkml.kernel.org/r/1404532172.2572.30.camel@j-VirtualBox Signed-off-by: Ingo Molnar --- kernel/locking/rwsem-xadd.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c index dacc32142fcc..c40c7d28661d 100644 --- a/kernel/locking/rwsem-xadd.c +++ b/kernel/locking/rwsem-xadd.c @@ -285,10 +285,10 @@ static inline bool rwsem_try_write_lock_unqueued(struct rw_semaphore *sem) static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem) { struct task_struct *owner; - bool on_cpu = true; + bool on_cpu = false; if (need_resched()) - return 0; + return false; rcu_read_lock(); owner = ACCESS_ONCE(sem->owner); @@ -297,9 +297,9 @@ static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem) rcu_read_unlock(); /* - * If sem->owner is not set, the rwsem owner may have - * just acquired it and not set the owner yet or the rwsem - * has been released. + * If sem->owner is not set, yet we have just recently entered the + * slowpath, then there is a possibility reader(s) may have the lock. + * To be safe, avoid spinning in these situations. */ return on_cpu; } From 046a619d8e9746fa4c0e29e8c6b78e16efc008a8 Mon Sep 17 00:00:00 2001 From: Jason Low Date: Mon, 14 Jul 2014 10:27:48 -0700 Subject: [PATCH 244/455] locking/spinlocks/mcs: Rename optimistic_spin_queue() to optimistic_spin_node() Currently, the per-cpu nodes structure for the cancellable MCS spinlock is named "optimistic_spin_queue". However, in a follow up patch in the series we will be introducing a new structure that serves as the new "handle" for the lock. It would make more sense if that structure is named "optimistic_spin_queue". Additionally, since the current use of the "optimistic_spin_queue" structure are "nodes", it might be better if we rename them to "node" anyway. This preparatory patch renames all current "optimistic_spin_queue" to "optimistic_spin_node". Signed-off-by: Jason Low Signed-off-by: Peter Zijlstra Cc: Scott Norton Cc: "Paul E. McKenney" Cc: Dave Chinner Cc: Waiman Long Cc: Davidlohr Bueso Cc: Rik van Riel Cc: Andrew Morton Cc: "H. Peter Anvin" Cc: Steven Rostedt Cc: Tim Chen Cc: Konrad Rzeszutek Wilk Cc: Aswin Chandramouleeswaran Cc: Linus Torvalds Cc: Chris Mason Cc: Heiko Carstens Cc: Josef Bacik Link: http://lkml.kernel.org/r/1405358872-3732-2-git-send-email-jason.low2@hp.com Signed-off-by: Ingo Molnar --- include/linux/mutex.h | 4 ++-- include/linux/rwsem.h | 4 ++-- kernel/locking/mcs_spinlock.c | 24 ++++++++++++------------ kernel/locking/mcs_spinlock.h | 8 ++++---- 4 files changed, 20 insertions(+), 20 deletions(-) diff --git a/include/linux/mutex.h b/include/linux/mutex.h index 11692dea18aa..885f3f56a77f 100644 --- a/include/linux/mutex.h +++ b/include/linux/mutex.h @@ -46,7 +46,7 @@ * - detects multi-task circular deadlocks and prints out all affected * locks and tasks (and only those tasks) */ -struct optimistic_spin_queue; +struct optimistic_spin_node; struct mutex { /* 1: unlocked, 0: locked, negative: locked, possible waiters */ atomic_t count; @@ -56,7 +56,7 @@ struct mutex { struct task_struct *owner; #endif #ifdef CONFIG_MUTEX_SPIN_ON_OWNER - struct optimistic_spin_queue *osq; /* Spinner MCS lock */ + struct optimistic_spin_node *osq; /* Spinner MCS lock */ #endif #ifdef CONFIG_DEBUG_MUTEXES const char *name; diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h index 8d79708146aa..ba3f108ddea1 100644 --- a/include/linux/rwsem.h +++ b/include/linux/rwsem.h @@ -16,7 +16,7 @@ #include -struct optimistic_spin_queue; +struct optimistic_spin_node; struct rw_semaphore; #ifdef CONFIG_RWSEM_GENERIC_SPINLOCK @@ -33,7 +33,7 @@ struct rw_semaphore { * if the owner is running on the cpu. */ struct task_struct *owner; - struct optimistic_spin_queue *osq; /* spinner MCS lock */ + struct optimistic_spin_node *osq; /* spinner MCS lock */ #endif #ifdef CONFIG_DEBUG_LOCK_ALLOC struct lockdep_map dep_map; diff --git a/kernel/locking/mcs_spinlock.c b/kernel/locking/mcs_spinlock.c index 838dc9e00669..e9866f70e828 100644 --- a/kernel/locking/mcs_spinlock.c +++ b/kernel/locking/mcs_spinlock.c @@ -14,18 +14,18 @@ * called from interrupt context and we have preemption disabled while * spinning. */ -static DEFINE_PER_CPU_SHARED_ALIGNED(struct optimistic_spin_queue, osq_node); +static DEFINE_PER_CPU_SHARED_ALIGNED(struct optimistic_spin_node, osq_node); /* * Get a stable @node->next pointer, either for unlock() or unqueue() purposes. * Can return NULL in case we were the last queued and we updated @lock instead. */ -static inline struct optimistic_spin_queue * -osq_wait_next(struct optimistic_spin_queue **lock, - struct optimistic_spin_queue *node, - struct optimistic_spin_queue *prev) +static inline struct optimistic_spin_node * +osq_wait_next(struct optimistic_spin_node **lock, + struct optimistic_spin_node *node, + struct optimistic_spin_node *prev) { - struct optimistic_spin_queue *next = NULL; + struct optimistic_spin_node *next = NULL; for (;;) { if (*lock == node && cmpxchg(lock, node, prev) == node) { @@ -59,10 +59,10 @@ osq_wait_next(struct optimistic_spin_queue **lock, return next; } -bool osq_lock(struct optimistic_spin_queue **lock) +bool osq_lock(struct optimistic_spin_node **lock) { - struct optimistic_spin_queue *node = this_cpu_ptr(&osq_node); - struct optimistic_spin_queue *prev, *next; + struct optimistic_spin_node *node = this_cpu_ptr(&osq_node); + struct optimistic_spin_node *prev, *next; node->locked = 0; node->next = NULL; @@ -149,10 +149,10 @@ unqueue: return false; } -void osq_unlock(struct optimistic_spin_queue **lock) +void osq_unlock(struct optimistic_spin_node **lock) { - struct optimistic_spin_queue *node = this_cpu_ptr(&osq_node); - struct optimistic_spin_queue *next; + struct optimistic_spin_node *node = this_cpu_ptr(&osq_node); + struct optimistic_spin_node *next; /* * Fast path for the uncontended case. diff --git a/kernel/locking/mcs_spinlock.h b/kernel/locking/mcs_spinlock.h index a2dbac4aca6b..c99dc0052f49 100644 --- a/kernel/locking/mcs_spinlock.h +++ b/kernel/locking/mcs_spinlock.h @@ -118,12 +118,12 @@ void mcs_spin_unlock(struct mcs_spinlock **lock, struct mcs_spinlock *node) * mutex_lock()/rwsem_down_{read,write}() etc. */ -struct optimistic_spin_queue { - struct optimistic_spin_queue *next, *prev; +struct optimistic_spin_node { + struct optimistic_spin_node *next, *prev; int locked; /* 1 if lock acquired */ }; -extern bool osq_lock(struct optimistic_spin_queue **lock); -extern void osq_unlock(struct optimistic_spin_queue **lock); +extern bool osq_lock(struct optimistic_spin_node **lock); +extern void osq_unlock(struct optimistic_spin_node **lock); #endif /* __LINUX_MCS_SPINLOCK_H */ From 90631822c5d307b5410500806e8ac3e63928aa3e Mon Sep 17 00:00:00 2001 From: Jason Low Date: Mon, 14 Jul 2014 10:27:49 -0700 Subject: [PATCH 245/455] locking/spinlocks/mcs: Convert osq lock to atomic_t to reduce overhead The cancellable MCS spinlock is currently used to queue threads that are doing optimistic spinning. It uses per-cpu nodes, where a thread obtaining the lock would access and queue the local node corresponding to the CPU that it's running on. Currently, the cancellable MCS lock is implemented by using pointers to these nodes. In this patch, instead of operating on pointers to the per-cpu nodes, we store the CPU numbers in which the per-cpu nodes correspond to in atomic_t. A similar concept is used with the qspinlock. By operating on the CPU # of the nodes using atomic_t instead of pointers to those nodes, this can reduce the overhead of the cancellable MCS spinlock by 32 bits (on 64 bit systems). Signed-off-by: Jason Low Signed-off-by: Peter Zijlstra Cc: Scott Norton Cc: "Paul E. McKenney" Cc: Dave Chinner Cc: Waiman Long Cc: Davidlohr Bueso Cc: Rik van Riel Cc: Andrew Morton Cc: "H. Peter Anvin" Cc: Steven Rostedt Cc: Tim Chen Cc: Konrad Rzeszutek Wilk Cc: Aswin Chandramouleeswaran Cc: Linus Torvalds Cc: Chris Mason Cc: Heiko Carstens Cc: Josef Bacik Link: http://lkml.kernel.org/r/1405358872-3732-3-git-send-email-jason.low2@hp.com Signed-off-by: Ingo Molnar --- include/linux/mutex.h | 4 +-- include/linux/osq_lock.h | 19 +++++++++++++++ include/linux/rwsem.h | 7 +++--- kernel/locking/mcs_spinlock.c | 46 +++++++++++++++++++++++++++++------ kernel/locking/mcs_spinlock.h | 5 ++-- kernel/locking/mutex.c | 2 +- kernel/locking/rwsem-xadd.c | 2 +- 7 files changed, 68 insertions(+), 17 deletions(-) create mode 100644 include/linux/osq_lock.h diff --git a/include/linux/mutex.h b/include/linux/mutex.h index 885f3f56a77f..42aa9b9ecd5f 100644 --- a/include/linux/mutex.h +++ b/include/linux/mutex.h @@ -17,6 +17,7 @@ #include #include #include +#include /* * Simple, straightforward mutexes with strict semantics: @@ -46,7 +47,6 @@ * - detects multi-task circular deadlocks and prints out all affected * locks and tasks (and only those tasks) */ -struct optimistic_spin_node; struct mutex { /* 1: unlocked, 0: locked, negative: locked, possible waiters */ atomic_t count; @@ -56,7 +56,7 @@ struct mutex { struct task_struct *owner; #endif #ifdef CONFIG_MUTEX_SPIN_ON_OWNER - struct optimistic_spin_node *osq; /* Spinner MCS lock */ + struct optimistic_spin_queue osq; /* Spinner MCS lock */ #endif #ifdef CONFIG_DEBUG_MUTEXES const char *name; diff --git a/include/linux/osq_lock.h b/include/linux/osq_lock.h new file mode 100644 index 000000000000..b001682bf7cb --- /dev/null +++ b/include/linux/osq_lock.h @@ -0,0 +1,19 @@ +#ifndef __LINUX_OSQ_LOCK_H +#define __LINUX_OSQ_LOCK_H + +/* + * An MCS like lock especially tailored for optimistic spinning for sleeping + * lock implementations (mutex, rwsem, etc). + */ + +#define OSQ_UNLOCKED_VAL (0) + +struct optimistic_spin_queue { + /* + * Stores an encoded value of the CPU # of the tail node in the queue. + * If the queue is empty, then it's set to OSQ_UNLOCKED_VAL. + */ + atomic_t tail; +}; + +#endif diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h index ba3f108ddea1..9fdcdd03507d 100644 --- a/include/linux/rwsem.h +++ b/include/linux/rwsem.h @@ -13,10 +13,9 @@ #include #include #include - #include +#include -struct optimistic_spin_node; struct rw_semaphore; #ifdef CONFIG_RWSEM_GENERIC_SPINLOCK @@ -33,7 +32,7 @@ struct rw_semaphore { * if the owner is running on the cpu. */ struct task_struct *owner; - struct optimistic_spin_node *osq; /* spinner MCS lock */ + struct optimistic_spin_queue osq; /* spinner MCS lock */ #endif #ifdef CONFIG_DEBUG_LOCK_ALLOC struct lockdep_map dep_map; @@ -70,7 +69,7 @@ static inline int rwsem_is_locked(struct rw_semaphore *sem) __RAW_SPIN_LOCK_UNLOCKED(name.wait_lock), \ LIST_HEAD_INIT((name).wait_list), \ NULL, /* owner */ \ - NULL /* mcs lock */ \ + { ATOMIC_INIT(OSQ_UNLOCKED_VAL) } /* osq */ \ __RWSEM_DEP_MAP_INIT(name) } #else #define __RWSEM_INITIALIZER(name) \ diff --git a/kernel/locking/mcs_spinlock.c b/kernel/locking/mcs_spinlock.c index e9866f70e828..32fc16c0a545 100644 --- a/kernel/locking/mcs_spinlock.c +++ b/kernel/locking/mcs_spinlock.c @@ -16,19 +16,45 @@ */ static DEFINE_PER_CPU_SHARED_ALIGNED(struct optimistic_spin_node, osq_node); +/* + * We use the value 0 to represent "no CPU", thus the encoded value + * will be the CPU number incremented by 1. + */ +static inline int encode_cpu(int cpu_nr) +{ + return cpu_nr + 1; +} + +static inline struct optimistic_spin_node *decode_cpu(int encoded_cpu_val) +{ + int cpu_nr = encoded_cpu_val - 1; + + return per_cpu_ptr(&osq_node, cpu_nr); +} + /* * Get a stable @node->next pointer, either for unlock() or unqueue() purposes. * Can return NULL in case we were the last queued and we updated @lock instead. */ static inline struct optimistic_spin_node * -osq_wait_next(struct optimistic_spin_node **lock, +osq_wait_next(struct optimistic_spin_queue *lock, struct optimistic_spin_node *node, struct optimistic_spin_node *prev) { struct optimistic_spin_node *next = NULL; + int curr = encode_cpu(smp_processor_id()); + int old; + + /* + * If there is a prev node in queue, then the 'old' value will be + * the prev node's CPU #, else it's set to OSQ_UNLOCKED_VAL since if + * we're currently last in queue, then the queue will then become empty. + */ + old = prev ? prev->cpu : OSQ_UNLOCKED_VAL; for (;;) { - if (*lock == node && cmpxchg(lock, node, prev) == node) { + if (atomic_read(&lock->tail) == curr && + atomic_cmpxchg(&lock->tail, curr, old) == curr) { /* * We were the last queued, we moved @lock back. @prev * will now observe @lock and will complete its @@ -59,18 +85,23 @@ osq_wait_next(struct optimistic_spin_node **lock, return next; } -bool osq_lock(struct optimistic_spin_node **lock) +bool osq_lock(struct optimistic_spin_queue *lock) { struct optimistic_spin_node *node = this_cpu_ptr(&osq_node); struct optimistic_spin_node *prev, *next; + int curr = encode_cpu(smp_processor_id()); + int old; node->locked = 0; node->next = NULL; + node->cpu = curr; - node->prev = prev = xchg(lock, node); - if (likely(prev == NULL)) + old = atomic_xchg(&lock->tail, curr); + if (old == OSQ_UNLOCKED_VAL) return true; + prev = decode_cpu(old); + node->prev = prev; ACCESS_ONCE(prev->next) = node; /* @@ -149,15 +180,16 @@ unqueue: return false; } -void osq_unlock(struct optimistic_spin_node **lock) +void osq_unlock(struct optimistic_spin_queue *lock) { struct optimistic_spin_node *node = this_cpu_ptr(&osq_node); struct optimistic_spin_node *next; + int curr = encode_cpu(smp_processor_id()); /* * Fast path for the uncontended case. */ - if (likely(cmpxchg(lock, node, NULL) == node)) + if (likely(atomic_cmpxchg(&lock->tail, curr, OSQ_UNLOCKED_VAL) == curr)) return; /* diff --git a/kernel/locking/mcs_spinlock.h b/kernel/locking/mcs_spinlock.h index c99dc0052f49..74356dc0ce29 100644 --- a/kernel/locking/mcs_spinlock.h +++ b/kernel/locking/mcs_spinlock.h @@ -121,9 +121,10 @@ void mcs_spin_unlock(struct mcs_spinlock **lock, struct mcs_spinlock *node) struct optimistic_spin_node { struct optimistic_spin_node *next, *prev; int locked; /* 1 if lock acquired */ + int cpu; /* encoded CPU # value */ }; -extern bool osq_lock(struct optimistic_spin_node **lock); -extern void osq_unlock(struct optimistic_spin_node **lock); +extern bool osq_lock(struct optimistic_spin_queue *lock); +extern void osq_unlock(struct optimistic_spin_queue *lock); #endif /* __LINUX_MCS_SPINLOCK_H */ diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c index bc73d33c6760..d9b313906caa 100644 --- a/kernel/locking/mutex.c +++ b/kernel/locking/mutex.c @@ -60,7 +60,7 @@ __mutex_init(struct mutex *lock, const char *name, struct lock_class_key *key) INIT_LIST_HEAD(&lock->wait_list); mutex_clear_owner(lock); #ifdef CONFIG_MUTEX_SPIN_ON_OWNER - lock->osq = NULL; + atomic_set(&lock->osq.tail, OSQ_UNLOCKED_VAL); #endif debug_mutex_init(lock, name, key); diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c index c40c7d28661d..b77a6230bbf6 100644 --- a/kernel/locking/rwsem-xadd.c +++ b/kernel/locking/rwsem-xadd.c @@ -84,7 +84,7 @@ void __init_rwsem(struct rw_semaphore *sem, const char *name, INIT_LIST_HEAD(&sem->wait_list); #ifdef CONFIG_SMP sem->owner = NULL; - sem->osq = NULL; + atomic_set(&sem->osq.tail, OSQ_UNLOCKED_VAL); #endif } From 4d9d951e6b5df85ccfca2c5bd8b4f5c71d256b65 Mon Sep 17 00:00:00 2001 From: Jason Low Date: Mon, 14 Jul 2014 10:27:50 -0700 Subject: [PATCH 246/455] locking/spinlocks/mcs: Introduce and use init macro and function for osq locks Currently, we initialize the osq lock by directly setting the lock's values. It would be preferable if we use an init macro to do the initialization like we do with other locks. This patch introduces and uses a macro and function for initializing the osq lock. Signed-off-by: Jason Low Signed-off-by: Peter Zijlstra Cc: Scott Norton Cc: "Paul E. McKenney" Cc: Dave Chinner Cc: Waiman Long Cc: Davidlohr Bueso Cc: Rik van Riel Cc: Andrew Morton Cc: "H. Peter Anvin" Cc: Steven Rostedt Cc: Tim Chen Cc: Konrad Rzeszutek Wilk Cc: Aswin Chandramouleeswaran Cc: Linus Torvalds Cc: Chris Mason Cc: Josef Bacik Link: http://lkml.kernel.org/r/1405358872-3732-4-git-send-email-jason.low2@hp.com Signed-off-by: Ingo Molnar --- include/linux/osq_lock.h | 8 ++++++++ include/linux/rwsem.h | 2 +- kernel/locking/mutex.c | 2 +- kernel/locking/rwsem-xadd.c | 2 +- 4 files changed, 11 insertions(+), 3 deletions(-) diff --git a/include/linux/osq_lock.h b/include/linux/osq_lock.h index b001682bf7cb..90230d5811c5 100644 --- a/include/linux/osq_lock.h +++ b/include/linux/osq_lock.h @@ -16,4 +16,12 @@ struct optimistic_spin_queue { atomic_t tail; }; +/* Init macro and function. */ +#define OSQ_LOCK_UNLOCKED { ATOMIC_INIT(OSQ_UNLOCKED_VAL) } + +static inline void osq_lock_init(struct optimistic_spin_queue *lock) +{ + atomic_set(&lock->tail, OSQ_UNLOCKED_VAL); +} + #endif diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h index 9fdcdd03507d..25cd9aa2f3d7 100644 --- a/include/linux/rwsem.h +++ b/include/linux/rwsem.h @@ -69,7 +69,7 @@ static inline int rwsem_is_locked(struct rw_semaphore *sem) __RAW_SPIN_LOCK_UNLOCKED(name.wait_lock), \ LIST_HEAD_INIT((name).wait_list), \ NULL, /* owner */ \ - { ATOMIC_INIT(OSQ_UNLOCKED_VAL) } /* osq */ \ + OSQ_LOCK_UNLOCKED /* osq */ \ __RWSEM_DEP_MAP_INIT(name) } #else #define __RWSEM_INITIALIZER(name) \ diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c index d9b313906caa..acca2c1a3c5e 100644 --- a/kernel/locking/mutex.c +++ b/kernel/locking/mutex.c @@ -60,7 +60,7 @@ __mutex_init(struct mutex *lock, const char *name, struct lock_class_key *key) INIT_LIST_HEAD(&lock->wait_list); mutex_clear_owner(lock); #ifdef CONFIG_MUTEX_SPIN_ON_OWNER - atomic_set(&lock->osq.tail, OSQ_UNLOCKED_VAL); + osq_lock_init(&lock->osq); #endif debug_mutex_init(lock, name, key); diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c index b77a6230bbf6..7190592c2645 100644 --- a/kernel/locking/rwsem-xadd.c +++ b/kernel/locking/rwsem-xadd.c @@ -84,7 +84,7 @@ void __init_rwsem(struct rw_semaphore *sem, const char *name, INIT_LIST_HEAD(&sem->wait_list); #ifdef CONFIG_SMP sem->owner = NULL; - atomic_set(&sem->osq.tail, OSQ_UNLOCKED_VAL); + osq_lock_init(&sem->osq); #endif } From 33ecd2083a9560fbc1ef1b1279ef3ecb4c012a4f Mon Sep 17 00:00:00 2001 From: Jason Low Date: Mon, 14 Jul 2014 10:27:51 -0700 Subject: [PATCH 247/455] locking/spinlocks/mcs: Micro-optimize osq_unlock() In the unlock function of the cancellable MCS spinlock, the first thing we do is to retrive the current CPU's osq node. However, due to the changes made in the previous patch, in the common case where the lock is not contended, we wouldn't need to access the current CPU's osq node anymore. This patch optimizes this by only retriving this CPU's osq node after we attempt the initial cmpxchg to unlock the osq and found that its contended. Signed-off-by: Jason Low Signed-off-by: Peter Zijlstra Cc: Scott Norton Cc: "Paul E. McKenney" Cc: Dave Chinner Cc: Waiman Long Cc: Davidlohr Bueso Cc: Rik van Riel Cc: Andrew Morton Cc: "H. Peter Anvin" Cc: Steven Rostedt Cc: Tim Chen Cc: Konrad Rzeszutek Wilk Cc: Aswin Chandramouleeswaran Cc: Linus Torvalds Link: http://lkml.kernel.org/r/1405358872-3732-5-git-send-email-jason.low2@hp.com Signed-off-by: Ingo Molnar --- kernel/locking/mcs_spinlock.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/locking/mcs_spinlock.c b/kernel/locking/mcs_spinlock.c index 32fc16c0a545..be9ee1559fca 100644 --- a/kernel/locking/mcs_spinlock.c +++ b/kernel/locking/mcs_spinlock.c @@ -182,8 +182,7 @@ unqueue: void osq_unlock(struct optimistic_spin_queue *lock) { - struct optimistic_spin_node *node = this_cpu_ptr(&osq_node); - struct optimistic_spin_node *next; + struct optimistic_spin_node *node, *next; int curr = encode_cpu(smp_processor_id()); /* @@ -195,6 +194,7 @@ void osq_unlock(struct optimistic_spin_queue *lock) /* * Second most likely case. */ + node = this_cpu_ptr(&osq_node); next = xchg(&node->next, NULL); if (next) { ACCESS_ONCE(next->locked) = 1; From 4485154138f6ffa5b252cb490aba3e8eb30124e4 Mon Sep 17 00:00:00 2001 From: David Rientjes Date: Mon, 30 Jun 2014 16:04:08 -0700 Subject: [PATCH 248/455] perf/x86/intel: Avoid spamming kernel log for BTS buffer failure It's unnecessary to excessively spam the kernel log anytime the BTS buffer cannot be allocated, so make this allocation __GFP_NOWARN. The user probably will want to at least find some artifact that the allocation has failed in the past, probably due to fragmentation because of its large size, when it's not allocated at bootstrap. Thus, add a WARN_ONCE() so something is left behind for them to understand why perf commnads that require PEBS is not working properly. Signed-off-by: David Rientjes Signed-off-by: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Linus Torvalds Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1406301600460.26302@chino.kir.corp.google.com Signed-off-by: Ingo Molnar --- arch/x86/kernel/cpu/perf_event_intel_ds.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c index 980970cb744d..696ade311ded 100644 --- a/arch/x86/kernel/cpu/perf_event_intel_ds.c +++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c @@ -311,9 +311,11 @@ static int alloc_bts_buffer(int cpu) if (!x86_pmu.bts) return 0; - buffer = kzalloc_node(BTS_BUFFER_SIZE, GFP_KERNEL, node); - if (unlikely(!buffer)) + buffer = kzalloc_node(BTS_BUFFER_SIZE, GFP_KERNEL | __GFP_NOWARN, node); + if (unlikely(!buffer)) { + WARN_ONCE(1, "%s: BTS buffer allocation failure\n", __func__); return -ENOMEM; + } max = BTS_BUFFER_SIZE / BTS_RECORD_SIZE; thresh = max / 16; From b0ab99e7736af88b8ac1b7ae50ea287fffa2badc Mon Sep 17 00:00:00 2001 From: Mateusz Guzik Date: Sat, 14 Jun 2014 15:00:09 +0200 Subject: [PATCH 249/455] sched: Fix possible divide by zero in avg_atom() calculation proc_sched_show_task() does: if (nr_switches) do_div(avg_atom, nr_switches); nr_switches is unsigned long and do_div truncates it to 32 bits, which means it can test non-zero on e.g. x86-64 and be truncated to zero for division. Fix the problem by using div64_ul() instead. As a side effect calculations of avg_atom for big nr_switches are now correct. Signed-off-by: Mateusz Guzik Signed-off-by: Peter Zijlstra Cc: stable@vger.kernel.org Cc: Linus Torvalds Link: http://lkml.kernel.org/r/1402750809-31991-1-git-send-email-mguzik@redhat.com Signed-off-by: Ingo Molnar --- kernel/sched/debug.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c index 695f9773bb60..627b3c34b821 100644 --- a/kernel/sched/debug.c +++ b/kernel/sched/debug.c @@ -608,7 +608,7 @@ void proc_sched_show_task(struct task_struct *p, struct seq_file *m) avg_atom = p->se.sum_exec_runtime; if (nr_switches) - do_div(avg_atom, nr_switches); + avg_atom = div64_ul(avg_atom, nr_switches); else avg_atom = -1LL; From 4cf465b579c20bee868464f5d664f8d2d96cd370 Mon Sep 17 00:00:00 2001 From: Hans de Goede Date: Wed, 16 Jul 2014 13:28:34 +0200 Subject: [PATCH 250/455] ACPI / video: Add use_native_backlight quirk for HP ProBook 4540s As reported here: https://bugzilla.redhat.com/show_bug.cgi?id=1025690 This is yet another model which needs this quirk. Link: https://bugzilla.redhat.com/show_bug.cgi?id=1025690 Signed-off-by: Hans de Goede Signed-off-by: Rafael J. Wysocki --- drivers/acpi/video.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/drivers/acpi/video.c b/drivers/acpi/video.c index 29649c194886..350d52a8f781 100644 --- a/drivers/acpi/video.c +++ b/drivers/acpi/video.c @@ -581,6 +581,14 @@ static struct dmi_system_id video_dmi_table[] __initdata = { }, { .callback = video_set_use_native_backlight, + .ident = "HP ProBook 4540s", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "Hewlett-Packard"), + DMI_MATCH(DMI_PRODUCT_VERSION, "HP ProBook 4540s"), + }, + }, + { + .callback = video_set_use_native_backlight, .ident = "HP ProBook 2013 models", .matches = { DMI_MATCH(DMI_SYS_VENDOR, "Hewlett-Packard"), From 0cdd192cf40fb6dbf03ec3af1c670068de3fd26c Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Fri, 11 Jul 2014 10:27:01 -0700 Subject: [PATCH 251/455] kprobes/x86: Don't try to resolve kprobe faults from userspace This commit: commit 6f6343f53d133bae516caf3d254bce37d8774625 Author: Masami Hiramatsu Date: Thu Apr 17 17:17:33 2014 +0900 kprobes/x86: Call exception handlers directly from do_int3/do_debug appears to have inadvertently dropped a check that the int3 came from kernel mode. Trying to dereference addr when addr is user-controlled is completely bogus. Signed-off-by: Andy Lutomirski Acked-by: Masami Hiramatsu Link: http://lkml.kernel.org/r/c4e339882c121aa76254f2adde3fcbdf502faec2.1405099506.git.luto@amacapital.net Signed-off-by: Ingo Molnar --- arch/x86/kernel/kprobes/core.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c index 7596df664901..67e6d19ef1be 100644 --- a/arch/x86/kernel/kprobes/core.c +++ b/arch/x86/kernel/kprobes/core.c @@ -574,6 +574,9 @@ int kprobe_int3_handler(struct pt_regs *regs) struct kprobe *p; struct kprobe_ctlblk *kcb; + if (user_mode_vm(regs)) + return 0; + addr = (kprobe_opcode_t *)(regs->ip - sizeof(kprobe_opcode_t)); /* * We don't want to be preempted for the entire From 97d496bf1a7934c77f8bb0de9b773dd759def616 Mon Sep 17 00:00:00 2001 From: Linus Walleij Date: Sat, 12 Jul 2014 11:43:33 +0200 Subject: [PATCH 252/455] cpufreq: sa1110: set memory type for h3600 The Compaq iPAQ h3600 also has the K4S281632b-1H memory type. Verified by prying apart a broken board. Signed-off-by: Linus Walleij Acked-by: Viresh Kumar Signed-off-by: Rafael J. Wysocki --- drivers/cpufreq/sa1110-cpufreq.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/cpufreq/sa1110-cpufreq.c b/drivers/cpufreq/sa1110-cpufreq.c index 546376719d8f..b5befc211172 100644 --- a/drivers/cpufreq/sa1110-cpufreq.c +++ b/drivers/cpufreq/sa1110-cpufreq.c @@ -349,7 +349,7 @@ static int __init sa1110_clk_init(void) name = "K4S641632D"; if (machine_is_h3100()) name = "KM416S4030CT"; - if (machine_is_jornada720()) + if (machine_is_jornada720() || machine_is_h3600()) name = "K4S281632B-1H"; if (machine_is_nanoengine()) name = "MT48LC8M16A2TG-75"; From 7e02168711e70a93a4795870e028a260702bdcb5 Mon Sep 17 00:00:00 2001 From: Nicolas Del Piano Date: Sun, 13 Jul 2014 18:59:00 -0300 Subject: [PATCH 253/455] cpufreq: imx6q: Select PM_OPP PM_OPP is a library used by several of the existing cpufreq drivers. ARM IMX6Q cpufreq driver uses this library for its functionality. Thus, it should be selected in Kconfig. Reported-by: Ezequiel Garcia Signed-off-by: Nicolas Del Piano Acked-by: Viresh Kumar Signed-off-by: Rafael J. Wysocki --- drivers/cpufreq/Kconfig.arm | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/cpufreq/Kconfig.arm b/drivers/cpufreq/Kconfig.arm index ebac67115009..8c5cf4bdeab8 100644 --- a/drivers/cpufreq/Kconfig.arm +++ b/drivers/cpufreq/Kconfig.arm @@ -104,6 +104,7 @@ config ARM_IMX6Q_CPUFREQ tristate "Freescale i.MX6 cpufreq support" depends on ARCH_MXC depends on REGULATOR_ANATOP + select PM_OPP help This adds cpufreq driver support for Freescale i.MX6 series SoCs. From 13b9a962a2594ee880c5d50d7f70964da1d4fe5a Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Wed, 16 Jul 2014 14:54:55 +0200 Subject: [PATCH 254/455] locking/rwsem: Rename 'activity' to 'count' There are two definitions of struct rw_semaphore, one in linux/rwsem.h and one in linux/rwsem-spinlock.h. For some reason they have different names for the initial field. This makes it impossible to use C99 named initialization for __RWSEM_INITIALIZER() -- or we have to duplicate that entire thing along with the structure definitions. The simpler patch is renaming the rwsem-spinlock variant to match the regular rwsem. This allows us to switch to C99 named initialization. Signed-off-by: Peter Zijlstra Cc: Linus Torvalds Link: http://lkml.kernel.org/n/tip-bmrZolsbGmautmzrerog27io@git.kernel.org Signed-off-by: Ingo Molnar --- include/linux/rwsem-spinlock.h | 8 ++++---- kernel/locking/rwsem-spinlock.c | 28 ++++++++++++++-------------- 2 files changed, 18 insertions(+), 18 deletions(-) diff --git a/include/linux/rwsem-spinlock.h b/include/linux/rwsem-spinlock.h index d5b13bc07a0b..561e8615528d 100644 --- a/include/linux/rwsem-spinlock.h +++ b/include/linux/rwsem-spinlock.h @@ -15,13 +15,13 @@ #ifdef __KERNEL__ /* * the rw-semaphore definition - * - if activity is 0 then there are no active readers or writers - * - if activity is +ve then that is the number of active readers - * - if activity is -1 then there is one active writer + * - if count is 0 then there are no active readers or writers + * - if count is +ve then that is the number of active readers + * - if count is -1 then there is one active writer * - if wait_list is not empty, then there are processes waiting for the semaphore */ struct rw_semaphore { - __s32 activity; + __s32 count; raw_spinlock_t wait_lock; struct list_head wait_list; #ifdef CONFIG_DEBUG_LOCK_ALLOC diff --git a/kernel/locking/rwsem-spinlock.c b/kernel/locking/rwsem-spinlock.c index 9be8a9144978..2c93571162cb 100644 --- a/kernel/locking/rwsem-spinlock.c +++ b/kernel/locking/rwsem-spinlock.c @@ -26,7 +26,7 @@ int rwsem_is_locked(struct rw_semaphore *sem) unsigned long flags; if (raw_spin_trylock_irqsave(&sem->wait_lock, flags)) { - ret = (sem->activity != 0); + ret = (sem->count != 0); raw_spin_unlock_irqrestore(&sem->wait_lock, flags); } return ret; @@ -46,7 +46,7 @@ void __init_rwsem(struct rw_semaphore *sem, const char *name, debug_check_no_locks_freed((void *)sem, sizeof(*sem)); lockdep_init_map(&sem->dep_map, name, key, 0); #endif - sem->activity = 0; + sem->count = 0; raw_spin_lock_init(&sem->wait_lock); INIT_LIST_HEAD(&sem->wait_list); } @@ -95,7 +95,7 @@ __rwsem_do_wake(struct rw_semaphore *sem, int wakewrite) waiter = list_entry(next, struct rwsem_waiter, list); } while (waiter->type != RWSEM_WAITING_FOR_WRITE); - sem->activity += woken; + sem->count += woken; out: return sem; @@ -126,9 +126,9 @@ void __sched __down_read(struct rw_semaphore *sem) raw_spin_lock_irqsave(&sem->wait_lock, flags); - if (sem->activity >= 0 && list_empty(&sem->wait_list)) { + if (sem->count >= 0 && list_empty(&sem->wait_list)) { /* granted */ - sem->activity++; + sem->count++; raw_spin_unlock_irqrestore(&sem->wait_lock, flags); goto out; } @@ -170,9 +170,9 @@ int __down_read_trylock(struct rw_semaphore *sem) raw_spin_lock_irqsave(&sem->wait_lock, flags); - if (sem->activity >= 0 && list_empty(&sem->wait_list)) { + if (sem->count >= 0 && list_empty(&sem->wait_list)) { /* granted */ - sem->activity++; + sem->count++; ret = 1; } @@ -206,7 +206,7 @@ void __sched __down_write_nested(struct rw_semaphore *sem, int subclass) * itself into sleep and waiting for system woke it or someone * else in the head of the wait list up. */ - if (sem->activity == 0) + if (sem->count == 0) break; set_task_state(tsk, TASK_UNINTERRUPTIBLE); raw_spin_unlock_irqrestore(&sem->wait_lock, flags); @@ -214,7 +214,7 @@ void __sched __down_write_nested(struct rw_semaphore *sem, int subclass) raw_spin_lock_irqsave(&sem->wait_lock, flags); } /* got the lock */ - sem->activity = -1; + sem->count = -1; list_del(&waiter.list); raw_spin_unlock_irqrestore(&sem->wait_lock, flags); @@ -235,9 +235,9 @@ int __down_write_trylock(struct rw_semaphore *sem) raw_spin_lock_irqsave(&sem->wait_lock, flags); - if (sem->activity == 0) { + if (sem->count == 0) { /* got the lock */ - sem->activity = -1; + sem->count = -1; ret = 1; } @@ -255,7 +255,7 @@ void __up_read(struct rw_semaphore *sem) raw_spin_lock_irqsave(&sem->wait_lock, flags); - if (--sem->activity == 0 && !list_empty(&sem->wait_list)) + if (--sem->count == 0 && !list_empty(&sem->wait_list)) sem = __rwsem_wake_one_writer(sem); raw_spin_unlock_irqrestore(&sem->wait_lock, flags); @@ -270,7 +270,7 @@ void __up_write(struct rw_semaphore *sem) raw_spin_lock_irqsave(&sem->wait_lock, flags); - sem->activity = 0; + sem->count = 0; if (!list_empty(&sem->wait_list)) sem = __rwsem_do_wake(sem, 1); @@ -287,7 +287,7 @@ void __downgrade_write(struct rw_semaphore *sem) raw_spin_lock_irqsave(&sem->wait_lock, flags); - sem->activity = 1; + sem->count = 1; if (!list_empty(&sem->wait_list)) sem = __rwsem_do_wake(sem, 0); From ce069fc920e5734558b3d9cbef1ab06cf01ee793 Mon Sep 17 00:00:00 2001 From: Jason Low Date: Mon, 14 Jul 2014 10:27:52 -0700 Subject: [PATCH 255/455] locking/rwsem: Reduce the size of struct rw_semaphore Recent optimistic spinning additions to rwsem provide significant performance benefits on many workloads on large machines. The cost of it was increasing the size of the rwsem structure by up to 128 bits. However, now that the previous patches in this series bring the overhead of struct optimistic_spin_queue to 32 bits, this patch reorders some fields in struct rw_semaphore such that we can reduce the overhead of the rwsem structure by 64 bits (on 64 bit systems). The extra overhead required for rwsem optimistic spinning would now be up to 8 additional bytes instead of up to 16 bytes. Additionally, the size of rwsem would now be more in line with mutexes. Signed-off-by: Jason Low Signed-off-by: Peter Zijlstra Cc: Scott Norton Cc: "Paul E. McKenney" Cc: Dave Chinner Cc: Waiman Long Cc: Davidlohr Bueso Cc: Rik van Riel Cc: Andrew Morton Cc: "H. Peter Anvin" Cc: Steven Rostedt Cc: Tim Chen Cc: Konrad Rzeszutek Wilk Cc: Aswin Chandramouleeswaran Cc: Linus Torvalds Cc: Chris Mason Cc: Josef Bacik Link: http://lkml.kernel.org/r/1405358872-3732-6-git-send-email-jason.low2@hp.com Signed-off-by: Ingo Molnar --- include/linux/rwsem.h | 25 +++++++++++-------------- 1 file changed, 11 insertions(+), 14 deletions(-) diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h index 25cd9aa2f3d7..716807f0eb2d 100644 --- a/include/linux/rwsem.h +++ b/include/linux/rwsem.h @@ -24,15 +24,15 @@ struct rw_semaphore; /* All arch specific implementations share the same struct */ struct rw_semaphore { long count; - raw_spinlock_t wait_lock; struct list_head wait_list; + raw_spinlock_t wait_lock; #ifdef CONFIG_SMP + struct optimistic_spin_queue osq; /* spinner MCS lock */ /* * Write owner. Used as a speculative check to see * if the owner is running on the cpu. */ struct task_struct *owner; - struct optimistic_spin_queue osq; /* spinner MCS lock */ #endif #ifdef CONFIG_DEBUG_LOCK_ALLOC struct lockdep_map dep_map; @@ -64,21 +64,18 @@ static inline int rwsem_is_locked(struct rw_semaphore *sem) #endif #if defined(CONFIG_SMP) && !defined(CONFIG_RWSEM_GENERIC_SPINLOCK) -#define __RWSEM_INITIALIZER(name) \ - { RWSEM_UNLOCKED_VALUE, \ - __RAW_SPIN_LOCK_UNLOCKED(name.wait_lock), \ - LIST_HEAD_INIT((name).wait_list), \ - NULL, /* owner */ \ - OSQ_LOCK_UNLOCKED /* osq */ \ - __RWSEM_DEP_MAP_INIT(name) } +#define __RWSEM_OPT_INIT(lockname) , .osq = OSQ_LOCK_UNLOCKED, .owner = NULL #else -#define __RWSEM_INITIALIZER(name) \ - { RWSEM_UNLOCKED_VALUE, \ - __RAW_SPIN_LOCK_UNLOCKED(name.wait_lock), \ - LIST_HEAD_INIT((name).wait_list) \ - __RWSEM_DEP_MAP_INIT(name) } +#define __RWSEM_OPT_INIT(lockname) #endif +#define __RWSEM_INITIALIZER(name) \ + { .count = RWSEM_UNLOCKED_VALUE, \ + .wait_list = LIST_HEAD_INIT((name).wait_list), \ + .wait_lock = __RAW_SPIN_LOCK_UNLOCKED(name.wait_lock) \ + __RWSEM_OPT_INIT(name) \ + __RWSEM_DEP_MAP_INIT(name) } + #define DECLARE_RWSEM(name) \ struct rw_semaphore name = __RWSEM_INITIALIZER(name) From 4badad352a6bb202ec68afa7a574c0bb961e5ebc Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Fri, 6 Jun 2014 19:53:16 +0200 Subject: [PATCH 256/455] locking/mutex: Disable optimistic spinning on some architectures The optimistic spin code assumes regular stores and cmpxchg() play nice; this is found to not be true for at least: parisc, sparc32, tile32, metag-lock1, arc-!llsc and hexagon. There is further wreckage, but this in particular seemed easy to trigger, so blacklist this. Opt in for known good archs. Signed-off-by: Peter Zijlstra Reported-by: Mikulas Patocka Cc: David Miller Cc: Chris Metcalf Cc: James Bottomley Cc: Vineet Gupta Cc: Jason Low Cc: Waiman Long Cc: "James E.J. Bottomley" Cc: Paul McKenney Cc: John David Anglin Cc: James Hogan Cc: Linus Torvalds Cc: Davidlohr Bueso Cc: stable@vger.kernel.org Cc: Benjamin Herrenschmidt Cc: Catalin Marinas Cc: Russell King Cc: Will Deacon Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: sparclinux@vger.kernel.org Link: http://lkml.kernel.org/r/20140606175316.GV13930@laptop.programming.kicks-ass.net Signed-off-by: Ingo Molnar --- arch/arm/Kconfig | 1 + arch/arm64/Kconfig | 1 + arch/powerpc/Kconfig | 1 + arch/sparc/Kconfig | 1 + arch/x86/Kconfig | 1 + kernel/Kconfig.locks | 5 ++++- 6 files changed, 9 insertions(+), 1 deletion(-) diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 245058b3b0ef..88acf8bc1490 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -6,6 +6,7 @@ config ARM select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST select ARCH_HAVE_CUSTOM_GPIO_H select ARCH_MIGHT_HAVE_PC_PARPORT + select ARCH_SUPPORTS_ATOMIC_RMW select ARCH_USE_BUILTIN_BSWAP select ARCH_USE_CMPXCHG_LOCKREF select ARCH_WANT_IPC_PARSE_VERSION diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index a474de346be6..839f48c26ef0 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -4,6 +4,7 @@ config ARM64 select ARCH_HAS_OPP select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST select ARCH_USE_CMPXCHG_LOCKREF + select ARCH_SUPPORTS_ATOMIC_RMW select ARCH_WANT_OPTIONAL_GPIOLIB select ARCH_WANT_COMPAT_IPC_PARSE_VERSION select ARCH_WANT_FRAME_POINTERS diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index fefe7c8bf05f..80b94b0add1f 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -145,6 +145,7 @@ config PPC select HAVE_IRQ_EXIT_ON_IRQ_STACK select ARCH_USE_CMPXCHG_LOCKREF if PPC64 select HAVE_ARCH_AUDITSYSCALL + select ARCH_SUPPORTS_ATOMIC_RMW config GENERIC_CSUM def_bool CPU_LITTLE_ENDIAN diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig index 29f2e988c56a..407c87d9879a 100644 --- a/arch/sparc/Kconfig +++ b/arch/sparc/Kconfig @@ -78,6 +78,7 @@ config SPARC64 select HAVE_C_RECORDMCOUNT select NO_BOOTMEM select HAVE_ARCH_AUDITSYSCALL + select ARCH_SUPPORTS_ATOMIC_RMW config ARCH_DEFCONFIG string diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index a8f749ef0fdc..d24887b645dc 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -131,6 +131,7 @@ config X86 select HAVE_CC_STACKPROTECTOR select GENERIC_CPU_AUTOPROBE select HAVE_ARCH_AUDITSYSCALL + select ARCH_SUPPORTS_ATOMIC_RMW config INSTRUCTION_DECODER def_bool y diff --git a/kernel/Kconfig.locks b/kernel/Kconfig.locks index 35536d9c0964..81907941d921 100644 --- a/kernel/Kconfig.locks +++ b/kernel/Kconfig.locks @@ -220,9 +220,12 @@ config INLINE_WRITE_UNLOCK_IRQRESTORE endif +config ARCH_SUPPORTS_ATOMIC_RMW + bool + config MUTEX_SPIN_ON_OWNER def_bool y - depends on SMP && !DEBUG_MUTEXES + depends on SMP && !DEBUG_MUTEXES && ARCH_SUPPORTS_ATOMIC_RMW config ARCH_USE_QUEUE_RWLOCK bool From 5db6c6fefb1ca0e81e3bd6dd8998bf51c453d823 Mon Sep 17 00:00:00 2001 From: Davidlohr Bueso Date: Fri, 11 Jul 2014 14:00:06 -0700 Subject: [PATCH 257/455] locking/rwsem: Add CONFIG_RWSEM_SPIN_ON_OWNER Just like with mutexes (CONFIG_MUTEX_SPIN_ON_OWNER), encapsulate the dependencies for rwsem optimistic spinning. No logical changes here as it continues to depend on both SMP and the XADD algorithm variant. Signed-off-by: Davidlohr Bueso Acked-by: Jason Low [ Also make it depend on ARCH_SUPPORTS_ATOMIC_RMW. ] Signed-off-by: Peter Zijlstra Link: http://lkml.kernel.org/r/1405112406-13052-2-git-send-email-davidlohr@hp.com Cc: aswin@hp.com Cc: Chris Mason Cc: Davidlohr Bueso Cc: Josef Bacik Cc: Linus Torvalds Cc: Waiman Long Signed-off-by: Ingo Molnar Signed-off-by: Ingo Molnar --- include/linux/rwsem.h | 6 ++++-- kernel/Kconfig.locks | 4 ++++ kernel/locking/rwsem-xadd.c | 4 ++-- kernel/locking/rwsem.c | 2 +- 4 files changed, 11 insertions(+), 5 deletions(-) diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h index 716807f0eb2d..035d3c57fc8a 100644 --- a/include/linux/rwsem.h +++ b/include/linux/rwsem.h @@ -14,7 +14,9 @@ #include #include #include +#ifdef CONFIG_RWSEM_SPIN_ON_OWNER #include +#endif struct rw_semaphore; @@ -26,7 +28,7 @@ struct rw_semaphore { long count; struct list_head wait_list; raw_spinlock_t wait_lock; -#ifdef CONFIG_SMP +#ifdef CONFIG_RWSEM_SPIN_ON_OWNER struct optimistic_spin_queue osq; /* spinner MCS lock */ /* * Write owner. Used as a speculative check to see @@ -63,7 +65,7 @@ static inline int rwsem_is_locked(struct rw_semaphore *sem) # define __RWSEM_DEP_MAP_INIT(lockname) #endif -#if defined(CONFIG_SMP) && !defined(CONFIG_RWSEM_GENERIC_SPINLOCK) +#ifdef CONFIG_RWSEM_SPIN_ON_OWNER #define __RWSEM_OPT_INIT(lockname) , .osq = OSQ_LOCK_UNLOCKED, .owner = NULL #else #define __RWSEM_OPT_INIT(lockname) diff --git a/kernel/Kconfig.locks b/kernel/Kconfig.locks index 81907941d921..76768ee812b2 100644 --- a/kernel/Kconfig.locks +++ b/kernel/Kconfig.locks @@ -227,6 +227,10 @@ config MUTEX_SPIN_ON_OWNER def_bool y depends on SMP && !DEBUG_MUTEXES && ARCH_SUPPORTS_ATOMIC_RMW +config RWSEM_SPIN_ON_OWNER + def_bool y + depends on SMP && RWSEM_XCHGADD_ALGORITHM && ARCH_SUPPORTS_ATOMIC_RMW + config ARCH_USE_QUEUE_RWLOCK bool diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c index 7190592c2645..a2391ac135c8 100644 --- a/kernel/locking/rwsem-xadd.c +++ b/kernel/locking/rwsem-xadd.c @@ -82,7 +82,7 @@ void __init_rwsem(struct rw_semaphore *sem, const char *name, sem->count = RWSEM_UNLOCKED_VALUE; raw_spin_lock_init(&sem->wait_lock); INIT_LIST_HEAD(&sem->wait_list); -#ifdef CONFIG_SMP +#ifdef CONFIG_RWSEM_SPIN_ON_OWNER sem->owner = NULL; osq_lock_init(&sem->osq); #endif @@ -262,7 +262,7 @@ static inline bool rwsem_try_write_lock(long count, struct rw_semaphore *sem) return false; } -#ifdef CONFIG_SMP +#ifdef CONFIG_RWSEM_SPIN_ON_OWNER /* * Try to acquire write lock before the writer has been put on wait queue. */ diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c index 42f806de49d4..e2d3bc7f03b4 100644 --- a/kernel/locking/rwsem.c +++ b/kernel/locking/rwsem.c @@ -12,7 +12,7 @@ #include -#if defined(CONFIG_SMP) && defined(CONFIG_RWSEM_XCHGADD_ALGORITHM) +#ifdef CONFIG_RWSEM_SPIN_ON_OWNER static inline void rwsem_set_owner(struct rw_semaphore *sem) { sem->owner = current; From 2fa1adc07089020984d52bb55c5af0b9d86dff21 Mon Sep 17 00:00:00 2001 From: Quentin Armitage Date: Thu, 10 Jul 2014 11:15:55 +0100 Subject: [PATCH 258/455] cpufreq: kirkwood: Reinstate cpufreq driver for ARCH_KIRKWOOD Commit ff1f0018cf66080d8e6f59791e552615648a033a ("drivers: Enable building of Kirkwood drivers for mach-mvebu") added Kirkwood into mach-mvebu, adding MACH_KIRKWOOD to ARCH_KIRKWOOD in the KConfig files. The change for ARM_KIRKWOOD_CPUFREQ replaced ARCH_KIRKWOOD with MACH_KIRKWOOD, whereas all the other changes were ARCH_KIRKWOOD || MACH_KIRKWOOD. As a consequence of this change, the cpufreq driver is no longer enabled for ARCH_KIRKWOOD. This patch reinstates ARM_KIRKWOOD_CPUFREQ for ARCH_KIRKWOOD. Signed-off-by: Quentin Armitage Acked-by: Andrew Lunn Signed-off-by: Rafael J. Wysocki --- drivers/cpufreq/Kconfig.arm | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/cpufreq/Kconfig.arm b/drivers/cpufreq/Kconfig.arm index 8c5cf4bdeab8..7364a538e056 100644 --- a/drivers/cpufreq/Kconfig.arm +++ b/drivers/cpufreq/Kconfig.arm @@ -119,7 +119,7 @@ config ARM_INTEGRATOR If in doubt, say Y. config ARM_KIRKWOOD_CPUFREQ - def_bool MACH_KIRKWOOD + def_bool ARCH_KIRKWOOD || MACH_KIRKWOOD help This adds the CPUFreq driver for Marvell Kirkwood SoCs. From 1bf8cc3d017575a38d4361f56ccc0a670a16bcd9 Mon Sep 17 00:00:00 2001 From: Viresh Kumar Date: Fri, 11 Jul 2014 20:24:19 +0530 Subject: [PATCH 259/455] cpufreq: cpu0: OPPs can be populated at runtime OPPs can be populated statically, via DT, or added at run time with dev_pm_opp_add(). While this driver handles the first case correctly, it would fail to populate OPPs added at runtime. Because call to of_init_opp_table() would fail as there are no OPPs in DT and probe will return early. To fix this, remove error checking and call dev_pm_opp_init_cpufreq_table() unconditionally. Update bindings as well. Suggested-by: Stephen Boyd Tested-by: Stephen Boyd Signed-off-by: Viresh Kumar Acked-by: Santosh Shilimkar Signed-off-by: Rafael J. Wysocki --- Documentation/devicetree/bindings/cpufreq/cpufreq-cpu0.txt | 6 ++++-- drivers/cpufreq/cpufreq-cpu0.c | 7 ++----- 2 files changed, 6 insertions(+), 7 deletions(-) diff --git a/Documentation/devicetree/bindings/cpufreq/cpufreq-cpu0.txt b/Documentation/devicetree/bindings/cpufreq/cpufreq-cpu0.txt index f055515d2b62..366690cb86a3 100644 --- a/Documentation/devicetree/bindings/cpufreq/cpufreq-cpu0.txt +++ b/Documentation/devicetree/bindings/cpufreq/cpufreq-cpu0.txt @@ -8,10 +8,12 @@ Both required and optional properties listed below must be defined under node /cpus/cpu@0. Required properties: -- operating-points: Refer to Documentation/devicetree/bindings/power/opp.txt - for details +- None Optional properties: +- operating-points: Refer to Documentation/devicetree/bindings/power/opp.txt for + details. OPPs *must* be supplied either via DT, i.e. this property, or + populated at runtime. - clock-latency: Specify the possible maximum transition latency for clock, in unit of nanoseconds. - voltage-tolerance: Specify the CPU voltage tolerance in percentage. diff --git a/drivers/cpufreq/cpufreq-cpu0.c b/drivers/cpufreq/cpufreq-cpu0.c index ee1ae303a07c..86beda9f950b 100644 --- a/drivers/cpufreq/cpufreq-cpu0.c +++ b/drivers/cpufreq/cpufreq-cpu0.c @@ -152,11 +152,8 @@ static int cpu0_cpufreq_probe(struct platform_device *pdev) goto out_put_reg; } - ret = of_init_opp_table(cpu_dev); - if (ret) { - pr_err("failed to init OPP table: %d\n", ret); - goto out_put_clk; - } + /* OPPs might be populated at runtime, don't check for error here */ + of_init_opp_table(cpu_dev); ret = dev_pm_opp_init_cpufreq_table(cpu_dev, &freq_table); if (ret) { From c3caf1192f904de2f1381211f564537235d50de3 Mon Sep 17 00:00:00 2001 From: Jerry Chu Date: Mon, 14 Jul 2014 15:54:46 -0700 Subject: [PATCH 260/455] net-gre-gro: Fix a bug that breaks the forwarding path Fixed a bug that was introduced by my GRE-GRO patch (bf5a755f5e9186406bbf50f4087100af5bd68e40 net-gre-gro: Add GRE support to the GRO stack) that breaks the forwarding path because various GSO related fields were not set. The bug will cause on the egress path either the GSO code to fail, or a GRE-TSO capable (NETIF_F_GSO_GRE) NICs to choke. The following fix has been tested for both cases. Signed-off-by: H.K. Jerry Chu Signed-off-by: David S. Miller --- net/core/dev.c | 2 ++ net/ipv4/af_inet.c | 3 +++ net/ipv4/gre_offload.c | 3 +++ net/ipv4/tcp_offload.c | 2 +- net/ipv6/tcpv6_offload.c | 2 +- 5 files changed, 10 insertions(+), 2 deletions(-) diff --git a/net/core/dev.c b/net/core/dev.c index 7990984ca364..367a586d0c8a 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4096,6 +4096,8 @@ static void napi_reuse_skb(struct napi_struct *napi, struct sk_buff *skb) skb->vlan_tci = 0; skb->dev = napi->dev; skb->skb_iif = 0; + skb->encapsulation = 0; + skb_shinfo(skb)->gso_type = 0; skb->truesize = SKB_TRUESIZE(skb_end_offset(skb)); napi->skb = skb; diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index d5e6836cf772..d156b3c5f363 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -1429,6 +1429,9 @@ static int inet_gro_complete(struct sk_buff *skb, int nhoff) int proto = iph->protocol; int err = -ENOSYS; + if (skb->encapsulation) + skb_set_inner_network_header(skb, nhoff); + csum_replace2(&iph->check, iph->tot_len, newlen); iph->tot_len = newlen; diff --git a/net/ipv4/gre_offload.c b/net/ipv4/gre_offload.c index eb92deb12666..f0bdd47bbbcb 100644 --- a/net/ipv4/gre_offload.c +++ b/net/ipv4/gre_offload.c @@ -263,6 +263,9 @@ static int gre_gro_complete(struct sk_buff *skb, int nhoff) int err = -ENOENT; __be16 type; + skb->encapsulation = 1; + skb_shinfo(skb)->gso_type = SKB_GSO_GRE; + type = greh->protocol; if (greh->flags & GRE_KEY) grehlen += GRE_HEADER_SECTION; diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c index 4e86c59ec7f7..55046ecd083e 100644 --- a/net/ipv4/tcp_offload.c +++ b/net/ipv4/tcp_offload.c @@ -309,7 +309,7 @@ static int tcp4_gro_complete(struct sk_buff *skb, int thoff) th->check = ~tcp_v4_check(skb->len - thoff, iph->saddr, iph->daddr, 0); - skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4; + skb_shinfo(skb)->gso_type |= SKB_GSO_TCPV4; return tcp_gro_complete(skb); } diff --git a/net/ipv6/tcpv6_offload.c b/net/ipv6/tcpv6_offload.c index 8517d3cd1aed..01b0ff9a0c2c 100644 --- a/net/ipv6/tcpv6_offload.c +++ b/net/ipv6/tcpv6_offload.c @@ -73,7 +73,7 @@ static int tcp6_gro_complete(struct sk_buff *skb, int thoff) th->check = ~tcp_v6_check(skb->len - thoff, &iph->saddr, &iph->daddr, 0); - skb_shinfo(skb)->gso_type = SKB_GSO_TCPV6; + skb_shinfo(skb)->gso_type |= SKB_GSO_TCPV6; return tcp_gro_complete(skb); } From fbb60fe35ad579b511de8604b06a30b43846473b Mon Sep 17 00:00:00 2001 From: Jason Wang Date: Mon, 12 May 2014 16:35:39 +0800 Subject: [PATCH 261/455] drm/qxl: return IRQ_NONE if it was not our irq Return IRQ_NONE if it was not our irq. This is necessary for the case when qxl is sharing irq line with a device A in a crash kernel. If qxl is initialized before A and A's irq was raised during this gap, returning IRQ_HANDLED in this case will cause this irq to be raised again after EOI since kernel think it was handled but in fact it was not. Cc: Gerd Hoffmann Cc: stable@vger.kernel.org Signed-off-by: Jason Wang Signed-off-by: Dave Airlie --- drivers/gpu/drm/qxl/qxl_irq.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/qxl/qxl_irq.c b/drivers/gpu/drm/qxl/qxl_irq.c index 34d6a85e9023..0bf1e20c6e44 100644 --- a/drivers/gpu/drm/qxl/qxl_irq.c +++ b/drivers/gpu/drm/qxl/qxl_irq.c @@ -33,6 +33,9 @@ irqreturn_t qxl_irq_handler(int irq, void *arg) pending = xchg(&qdev->ram_header->int_pending, 0); + if (!pending) + return IRQ_NONE; + atomic_inc(&qdev->irq_received); if (pending & QXL_INTERRUPT_DISPLAY) { From de12d6f4b10b21854441f5242dcb29ea96181e58 Mon Sep 17 00:00:00 2001 From: Guenter Roeck Date: Wed, 16 Jul 2014 17:40:31 -0700 Subject: [PATCH 262/455] hwmon: (adt7470) Fix writes to temperature limit registers Temperature limit registers are signed. Limits therefore need to be clamped to (-128, 127) degrees C and not to (0, 255) degrees C. Without this fix, writing a limit of 128 degrees C sets the actual limit to -128 degrees C. Signed-off-by: Guenter Roeck Cc: stable@vger.kernel.org Reviewed-by: Axel Lin --- drivers/hwmon/adt7470.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/hwmon/adt7470.c b/drivers/hwmon/adt7470.c index 0f4dea5ccf17..9ee3913850d6 100644 --- a/drivers/hwmon/adt7470.c +++ b/drivers/hwmon/adt7470.c @@ -515,7 +515,7 @@ static ssize_t set_temp_min(struct device *dev, return -EINVAL; temp = DIV_ROUND_CLOSEST(temp, 1000); - temp = clamp_val(temp, 0, 255); + temp = clamp_val(temp, -128, 127); mutex_lock(&data->lock); data->temp_min[attr->index] = temp; @@ -549,7 +549,7 @@ static ssize_t set_temp_max(struct device *dev, return -EINVAL; temp = DIV_ROUND_CLOSEST(temp, 1000); - temp = clamp_val(temp, 0, 255); + temp = clamp_val(temp, -128, 127); mutex_lock(&data->lock); data->temp_max[attr->index] = temp; @@ -826,7 +826,7 @@ static ssize_t set_pwm_tmin(struct device *dev, return -EINVAL; temp = DIV_ROUND_CLOSEST(temp, 1000); - temp = clamp_val(temp, 0, 255); + temp = clamp_val(temp, -128, 127); mutex_lock(&data->lock); data->pwm_tmin[attr->index] = temp; From 7a9810e7bd99c922d9cedf64dbaa5ef6be412295 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michel=20D=C3=A4nzer?= Date: Thu, 17 Jul 2014 12:55:40 +0900 Subject: [PATCH 263/455] r8169: Enable RX_MULTI_EN for RTL_GIGA_MAC_VER_40 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The ethernet port on my ASUS A88X Pro mainboard stopped working several times a day, with messages like these in dmesg: AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x001e address=0x0000000000003000 flags=0x0050] Searching the web for these messages led me to similar reports about different hardware supported by r8169, and eventually to commits 3ced8c955e74d319f3e3997f7169c79d524dfd06 ('r8169: enforce RX_MULTI_EN for the 8168f.') and eb2dc35d99028b698cdedba4f5522bc43e576bd2 ('r8169: RxConfig hack for the 8168evl'). So I tried this change, and it fixes the problem for me. Signed-off-by: Michel Dänzer Signed-off-by: David S. Miller --- drivers/net/ethernet/realtek/r8169.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c index 06bdc31a828d..61623e9af574 100644 --- a/drivers/net/ethernet/realtek/r8169.c +++ b/drivers/net/ethernet/realtek/r8169.c @@ -4240,6 +4240,8 @@ static void rtl_init_rxcfg(struct rtl8169_private *tp) RTL_W32(RxConfig, RX128_INT_EN | RX_MULTI_EN | RX_DMA_BURST); break; case RTL_GIGA_MAC_VER_40: + RTL_W32(RxConfig, RX128_INT_EN | RX_MULTI_EN | RX_DMA_BURST | RX_EARLY_OFF); + break; case RTL_GIGA_MAC_VER_41: case RTL_GIGA_MAC_VER_42: case RTL_GIGA_MAC_VER_43: From a4b70a07ed12a71131cab7adce2ce91c71b37060 Mon Sep 17 00:00:00 2001 From: Sowmini Varadhan Date: Wed, 16 Jul 2014 10:02:26 -0400 Subject: [PATCH 264/455] sunvnet: clean up objects created in vnet_new() on vnet_exit() Nothing cleans up the objects created by vnet_new(), they are completely leaked. vnet_exit(), after doing the vio_unregister_driver() to clean up ports, should call a helper function that iterates over vnet_list and cleans up those objects. This includes unregister_netdevice() as well as free_netdev(). Signed-off-by: Sowmini Varadhan Acked-by: Dave Kleikamp Reviewed-by: Karl Volz Signed-off-by: David S. Miller --- drivers/net/ethernet/sun/sunvnet.c | 20 +++++++++++++++++++- 1 file changed, 19 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/sun/sunvnet.c b/drivers/net/ethernet/sun/sunvnet.c index 1c24a8f368bd..fd411d6e19a2 100644 --- a/drivers/net/ethernet/sun/sunvnet.c +++ b/drivers/net/ethernet/sun/sunvnet.c @@ -1083,6 +1083,24 @@ static struct vnet *vnet_find_or_create(const u64 *local_mac) return vp; } +static void vnet_cleanup(void) +{ + struct vnet *vp; + struct net_device *dev; + + mutex_lock(&vnet_list_mutex); + while (!list_empty(&vnet_list)) { + vp = list_first_entry(&vnet_list, struct vnet, list); + list_del(&vp->list); + dev = vp->dev; + /* vio_unregister_driver() should have cleaned up port_list */ + BUG_ON(!list_empty(&vp->port_list)); + unregister_netdev(dev); + free_netdev(dev); + } + mutex_unlock(&vnet_list_mutex); +} + static const char *local_mac_prop = "local-mac-address"; static struct vnet *vnet_find_parent(struct mdesc_handle *hp, @@ -1240,7 +1258,6 @@ static int vnet_port_remove(struct vio_dev *vdev) kfree(port); - unregister_netdev(vp->dev); } return 0; } @@ -1268,6 +1285,7 @@ static int __init vnet_init(void) static void __exit vnet_exit(void) { vio_unregister_driver(&vnet_port_driver); + vnet_cleanup(); } module_init(vnet_init); From 652c1a05171695d21b84dd3a723606b50eeb80fd Mon Sep 17 00:00:00 2001 From: Or Gerlitz Date: Wed, 25 Jun 2014 16:44:14 +0300 Subject: [PATCH 265/455] IB/mlx5: Enable "block multicast loopback" for kernel consumers In commit f360d88a2efd, we advertise blocking multicast loopback to both kernel and userspace consumers, but don't allow kernel consumers (e.g IPoIB) to use it with their UD QPs. Fix that. Fixes: f360d88a2efd ("IB/mlx5: Add block multicast loopback support") Reported-by: Haggai Eran Signed-off-by: Eli Cohen Signed-off-by: Or Gerlitz Signed-off-by: Roland Dreier --- drivers/infiniband/hw/mlx5/qp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c index d13ddf1c0033..bbbcf389272c 100644 --- a/drivers/infiniband/hw/mlx5/qp.c +++ b/drivers/infiniband/hw/mlx5/qp.c @@ -675,7 +675,7 @@ static int create_kernel_qp(struct mlx5_ib_dev *dev, int err; uuari = &dev->mdev.priv.uuari; - if (init_attr->create_flags & ~IB_QP_CREATE_SIGNATURE_EN) + if (init_attr->create_flags & ~(IB_QP_CREATE_SIGNATURE_EN | IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK)) return -EINVAL; if (init_attr->qp_type == MLX5_IB_QPT_REG_UMR) From 858e6c321065344339906672bccd0eafe9622258 Mon Sep 17 00:00:00 2001 From: Amir Vadai Date: Wed, 16 Jul 2014 13:33:50 +0300 Subject: [PATCH 266/455] net/mlx4_en: cq->irq_desc wasn't set in legacy EQ's Fix a regression introduced by commit 35f6f45 ("net/mlx4_en: Don't use irq_affinity_notifier to track changes in IRQ affinity map"). When core is started in legacy EQ's (number of IRQ's < rx rings), cq->irq_desc was NULL. This caused a kernel crash under heavy traffic - when having more than rx NAPI budget completions. Fixed to have it set for both EQ modes. Signed-off-by: Amir Vadai Signed-off-by: David S. Miller --- drivers/net/ethernet/mellanox/mlx4/en_cq.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx4/en_cq.c b/drivers/net/ethernet/mellanox/mlx4/en_cq.c index 14c00048bbec..82322b1c8411 100644 --- a/drivers/net/ethernet/mellanox/mlx4/en_cq.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_cq.c @@ -129,14 +129,15 @@ int mlx4_en_activate_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq, name); } - cq->irq_desc = - irq_to_desc(mlx4_eq_get_irq(mdev->dev, - cq->vector)); } } else { cq->vector = (cq->ring + 1 + priv->port) % mdev->dev->caps.num_comp_vectors; } + + cq->irq_desc = + irq_to_desc(mlx4_eq_get_irq(mdev->dev, + cq->vector)); } else { /* For TX we use the same irq per ring we assigned for the RX */ From cc25eaae238ddd693aa5eaa73e565d8ff4915f6e Mon Sep 17 00:00:00 2001 From: Christoph Schulz Date: Wed, 16 Jul 2014 22:10:29 +0200 Subject: [PATCH 267/455] net: ppp: fix creating PPP pass and active filters Commit 568f194e8bd16c353ad50f9ab95d98b20578a39d ("net: ppp: use sk_unattached_filter api") inadvertently changed the logic when setting PPP pass and active filters. This applies to both the generic PPP subsystem implemented by drivers/net/ppp/ppp_generic.c and the ISDN PPP subsystem implemented by drivers/isdn/i4l/isdn_ppp.c. The original code in ppp_ioctl() (or isdn_ppp_ioctl(), resp.) handling PPPIOCSPASS and PPPIOCSACTIVE allowed to remove a pass/active filter previously set by using a filter of length zero. However, with the new code this is not possible anymore as this case is not explicitly checked for, which leads to passing NULL as a filter to sk_unattached_filter_create(). This results in returning EINVAL to the caller. Additionally, the variables ppp->pass_filter and ppp->active_filter (or is->pass_filter and is->active_filter, resp.) are not reset to NULL, although the filters they point to may have been destroyed by sk_unattached_filter_destroy(), so in this EINVAL case dangling pointers are left behind (provided the pointers were previously non-NULL). This patch corrects both problems by checking whether the filter passed is empty or non-empty, and prevents sk_unattached_filter_create() from being called in the first case. Moreover, the pointers are always reset to NULL as soon as sk_unattached_filter_destroy() returns. Signed-off-by: Christoph Schulz Signed-off-by: David S. Miller --- drivers/isdn/i4l/isdn_ppp.c | 20 ++++++++++++++++---- drivers/net/ppp/ppp_generic.c | 22 ++++++++++++++++------ 2 files changed, 32 insertions(+), 10 deletions(-) diff --git a/drivers/isdn/i4l/isdn_ppp.c b/drivers/isdn/i4l/isdn_ppp.c index a333b7f798d1..62f0688d45a5 100644 --- a/drivers/isdn/i4l/isdn_ppp.c +++ b/drivers/isdn/i4l/isdn_ppp.c @@ -638,9 +638,15 @@ isdn_ppp_ioctl(int min, struct file *file, unsigned int cmd, unsigned long arg) fprog.len = len; fprog.filter = code; - if (is->pass_filter) + if (is->pass_filter) { sk_unattached_filter_destroy(is->pass_filter); - err = sk_unattached_filter_create(&is->pass_filter, &fprog); + is->pass_filter = NULL; + } + if (fprog.filter != NULL) + err = sk_unattached_filter_create(&is->pass_filter, + &fprog); + else + err = 0; kfree(code); return err; @@ -657,9 +663,15 @@ isdn_ppp_ioctl(int min, struct file *file, unsigned int cmd, unsigned long arg) fprog.len = len; fprog.filter = code; - if (is->active_filter) + if (is->active_filter) { sk_unattached_filter_destroy(is->active_filter); - err = sk_unattached_filter_create(&is->active_filter, &fprog); + is->active_filter = NULL; + } + if (fprog.filter != NULL) + err = sk_unattached_filter_create(&is->active_filter, + &fprog); + else + err = 0; kfree(code); return err; diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c index e2f20f807de8..d5b77ef3a210 100644 --- a/drivers/net/ppp/ppp_generic.c +++ b/drivers/net/ppp/ppp_generic.c @@ -757,10 +757,15 @@ static long ppp_ioctl(struct file *file, unsigned int cmd, unsigned long arg) }; ppp_lock(ppp); - if (ppp->pass_filter) + if (ppp->pass_filter) { sk_unattached_filter_destroy(ppp->pass_filter); - err = sk_unattached_filter_create(&ppp->pass_filter, - &fprog); + ppp->pass_filter = NULL; + } + if (fprog.filter != NULL) + err = sk_unattached_filter_create(&ppp->pass_filter, + &fprog); + else + err = 0; kfree(code); ppp_unlock(ppp); } @@ -778,10 +783,15 @@ static long ppp_ioctl(struct file *file, unsigned int cmd, unsigned long arg) }; ppp_lock(ppp); - if (ppp->active_filter) + if (ppp->active_filter) { sk_unattached_filter_destroy(ppp->active_filter); - err = sk_unattached_filter_create(&ppp->active_filter, - &fprog); + ppp->active_filter = NULL; + } + if (fprog.filter != NULL) + err = sk_unattached_filter_create(&ppp->active_filter, + &fprog); + else + err = 0; kfree(code); ppp_unlock(ppp); } From 92c14bd9477a20a83144f08c0ca25b0308bf0730 Mon Sep 17 00:00:00 2001 From: Viresh Kumar Date: Thu, 17 Jul 2014 10:48:25 +0530 Subject: [PATCH 268/455] cpufreq: move policy kobj to policy->cpu at resume This is only relevant to implementations with multiple clusters, where clusters have separate clock lines but all CPUs within a cluster share it. Consider a dual cluster platform with 2 cores per cluster. During suspend we start hot unplugging CPUs in order 1 to 3. When CPU2 is removed, policy->kobj would be moved to CPU3 and when CPU3 goes down we wouldn't free policy or its kobj as we want to retain permissions/values/etc. Now on resume, we will get CPU2 before CPU3 and will call __cpufreq_add_dev(). We will recover the old policy and update policy->cpu from 3 to 2 from update_policy_cpu(). But the kobj is still tied to CPU3 and isn't moved to CPU2. We wouldn't create a link for CPU2, but would try that for CPU3 while bringing it online. Which will report errors as CPU3 already has kobj assigned to it. This bug got introduced with commit 42f921a, which overlooked this scenario. To fix this, lets move kobj to the new policy->cpu while bringing first CPU of a cluster back. Also do a WARN_ON() if kobject_move failed, as we would reach here only for the first CPU of a non-boot cluster. And we can't recover from this situation, if kobject_move() fails. Fixes: 42f921a6f10c (cpufreq: remove sysfs files for CPUs which failed to come back after resume) Cc: 3.13+ # 3.13+ Reported-and-tested-by: Bu Yitian Reported-by: Saravana Kannan Reviewed-by: Srivatsa S. Bhat Signed-off-by: Viresh Kumar Signed-off-by: Rafael J. Wysocki --- drivers/cpufreq/cpufreq.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 62259d27f03e..6f024852c6fb 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -1153,10 +1153,12 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif) * the creation of a brand new one. So we need to perform this update * by invoking update_policy_cpu(). */ - if (recover_policy && cpu != policy->cpu) + if (recover_policy && cpu != policy->cpu) { update_policy_cpu(policy, cpu); - else + WARN_ON(kobject_move(&policy->kobj, &dev->kobj)); + } else { policy->cpu = cpu; + } cpumask_copy(policy->cpus, cpumask_of(cpu)); From a2a9e02b5b67a7a32a14ab6c4c331a1a0c23a1db Mon Sep 17 00:00:00 2001 From: Andrey Utkin Date: Thu, 17 Jul 2014 15:13:23 +0300 Subject: [PATCH 269/455] drivers/ata/pata_ep93xx.c: use signed int type for result of platform_get_irq() [linux-3.16-rc5/drivers/ata/pata_ep93xx.c:929]: (style) Checking if unsigned variable 'irq' is less than zero. Source code is irq = platform_get_irq(pdev, 0); if (irq < 0) { but unsigned int irq; $ fgrep platform_get_irq `find . -name \*.h -print` ./include/linux/platform_device.h:extern int platform_get_irq(struct platform_device *, unsigned int); Now using "int" type instead of "unsigned int" for "irq" variable. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=80401 Reported-by: David Binderman Signed-off-by: Andrey Utkin Signed-off-by: Tejun Heo --- drivers/ata/pata_ep93xx.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/ata/pata_ep93xx.c b/drivers/ata/pata_ep93xx.c index 6ad5c072ce34..4d37c5415fc7 100644 --- a/drivers/ata/pata_ep93xx.c +++ b/drivers/ata/pata_ep93xx.c @@ -915,7 +915,7 @@ static int ep93xx_pata_probe(struct platform_device *pdev) struct ep93xx_pata_data *drv_data; struct ata_host *host; struct ata_port *ap; - unsigned int irq; + int irq; struct resource *mem_res; void __iomem *ide_base; int err; From 144cb08864ed44be52d8634ac69cd98e5efcf527 Mon Sep 17 00:00:00 2001 From: Suravee Suthikulpanit Date: Tue, 15 Jul 2014 00:03:03 +0200 Subject: [PATCH 270/455] irqchip: gic: Add binding probe for ARM GIC400 Commit 3ab72f9156bb "dt-bindings: add GIC-400 binding" added the "arm,gic-400" compatible string, but the corresponding IRQCHIP_DECLARE was never added to the gic driver. Therefore add the missing irqchip declaration for it. Signed-off-by: Suravee Suthikulpanit Removed additional empty line and adapted commit message to mark it as fixing an issue. Signed-off-by: Heiko Stuebner Acked-by: Will Deacon Fixes: 3ab72f9156bb ("dt-bindings: add GIC-400 binding") Cc: # v3.14+ Link: https://lkml.kernel.org/r/2621565.f5eISveXXJ@diego Signed-off-by: Jason Cooper --- drivers/irqchip/irq-gic.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c index aadd6f5723cb..66ed8922dab2 100644 --- a/drivers/irqchip/irq-gic.c +++ b/drivers/irqchip/irq-gic.c @@ -1071,6 +1071,7 @@ gic_of_init(struct device_node *node, struct device_node *parent) gic_cnt++; return 0; } +IRQCHIP_DECLARE(gic_400, "arm,gic-400", gic_of_init); IRQCHIP_DECLARE(cortex_a15_gic, "arm,cortex-a15-gic", gic_of_init); IRQCHIP_DECLARE(cortex_a9_gic, "arm,cortex-a9-gic", gic_of_init); IRQCHIP_DECLARE(cortex_a7_gic, "arm,cortex-a7-gic", gic_of_init); From 0ac66effe7fcdee55bda6d5d10d3372c95a41920 Mon Sep 17 00:00:00 2001 From: Alex Deucher Date: Mon, 14 Jul 2014 17:57:19 -0400 Subject: [PATCH 271/455] drm/radeon: avoid leaking edid data In some cases we fetch the edid in the detect() callback in order to determine what sort of monitor is connected. If that happens, don't fetch the edid again in the get_modes() callback or we will leak the edid. Signed-off-by: Alex Deucher Cc: stable@vger.kernel.org --- drivers/gpu/drm/radeon/radeon_display.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/gpu/drm/radeon/radeon_display.c b/drivers/gpu/drm/radeon/radeon_display.c index 13896edcf0b6..65501af453be 100644 --- a/drivers/gpu/drm/radeon/radeon_display.c +++ b/drivers/gpu/drm/radeon/radeon_display.c @@ -830,6 +830,10 @@ int radeon_ddc_get_modes(struct radeon_connector *radeon_connector) struct radeon_device *rdev = dev->dev_private; int ret = 0; + /* don't leak the edid if we already fetched it in detect() */ + if (radeon_connector->edid) + goto got_edid; + /* on hw with routers, select right port */ if (radeon_connector->router.ddc_valid) radeon_router_select_ddc_port(radeon_connector); @@ -868,6 +872,7 @@ int radeon_ddc_get_modes(struct radeon_connector *radeon_connector) radeon_connector->edid = radeon_bios_get_hardcoded_edid(rdev); } if (radeon_connector->edid) { +got_edid: drm_mode_connector_update_edid_property(&radeon_connector->base, radeon_connector->edid); ret = drm_add_edid_modes(&radeon_connector->base, radeon_connector->edid); drm_edid_to_eld(&radeon_connector->base, radeon_connector->edid); From 201bb62402e0227375c655446ea04fcd0acf7287 Mon Sep 17 00:00:00 2001 From: Alex Deucher Date: Tue, 15 Jul 2014 09:48:53 -0400 Subject: [PATCH 272/455] drm/radeon: set default bl level to something reasonable If the value in the scratch register is 0, set it to the max level. This fixes an issue where the console fb blanking code calls back into the backlight driver on unblank and then sets the backlight level to 0 after the driver has already set the mode and enabled the backlight. bugs: https://bugs.freedesktop.org/show_bug.cgi?id=81382 https://bugs.freedesktop.org/show_bug.cgi?id=70207 Signed-off-by: Alex Deucher Tested-by: David Heidelberger Cc: stable@vger.kernel.org --- drivers/gpu/drm/radeon/atombios_encoders.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/radeon/atombios_encoders.c b/drivers/gpu/drm/radeon/atombios_encoders.c index 2b2908440644..7d68203a3737 100644 --- a/drivers/gpu/drm/radeon/atombios_encoders.c +++ b/drivers/gpu/drm/radeon/atombios_encoders.c @@ -183,7 +183,6 @@ void radeon_atom_backlight_init(struct radeon_encoder *radeon_encoder, struct backlight_properties props; struct radeon_backlight_privdata *pdata; struct radeon_encoder_atom_dig *dig; - u8 backlight_level; char bl_name[16]; /* Mac laptops with multiple GPUs use the gmux driver for backlight @@ -222,12 +221,17 @@ void radeon_atom_backlight_init(struct radeon_encoder *radeon_encoder, pdata->encoder = radeon_encoder; - backlight_level = radeon_atom_get_backlight_level_from_reg(rdev); - dig = radeon_encoder->enc_priv; dig->bl_dev = bd; bd->props.brightness = radeon_atom_backlight_get_brightness(bd); + /* Set a reasonable default here if the level is 0 otherwise + * fbdev will attempt to turn the backlight on after console + * unblanking and it will try and restore 0 which turns the backlight + * off again. + */ + if (bd->props.brightness == 0) + bd->props.brightness = RADEON_MAX_BL_LEVEL; bd->props.power = FB_BLANK_UNBLANK; backlight_update_status(bd); From f53f81b2576a9bd3af947e2b1c3a46dfab51c5ef Mon Sep 17 00:00:00 2001 From: Mario Kleiner Date: Thu, 3 Jul 2014 03:45:02 +0200 Subject: [PATCH 273/455] drm/radeon: Prevent too early kms-pageflips triggered by vblank. Since 3.16-rc1 we have this new failure: When the userspace XOrg ddx schedules vblank events to trigger deferred kms-pageflips, e.g., via the OML_sync_control extension call glXSwapBuffersMscOML(), or if a glXSwapBuffers() is called immediately after completion of a previous swapbuffers call, e.g., in a tight rendering loop with minimal rendering, it happens frequently that the pageflip ioctl() is executed within the same vblank in which a previous kms-pageflip completed, or - for deferred swaps - always one vblank earlier than requested by the client app. This causes premature pageflips and detection of failure by the ddx, e.g., XOrg log warnings like... "(WW) RADEON(1): radeon_dri2_flip_event_handler: Pageflip completion event has impossible msc 201025 < target_msc 201026" ... and error/invalid return values of glXWaitForSbcOML() and Intel_swap_events extension. Reason is the new way in which kms-pageflips are programmed since 3.16. This commit changes the time window in which the hw can execute pending programmed pageflips. Before, a pending flip would get executed anywhere within the vblank interval. Now a pending flip only gets executed at the leading edge of vblank (start of front porch), making sure that a invocation of the pageflip ioctl() within a given vblank interval will only lead to pageflip completion in the following vblank. Tested to death on a DCE-4 card. Signed-off-by: Mario Kleiner Signed-off-by: Alex Deucher --- drivers/gpu/drm/radeon/atombios_crtc.c | 8 ++++---- drivers/gpu/drm/radeon/evergreen.c | 5 +++-- drivers/gpu/drm/radeon/evergreen_reg.h | 1 - drivers/gpu/drm/radeon/rv515.c | 5 +++-- 4 files changed, 10 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/radeon/atombios_crtc.c b/drivers/gpu/drm/radeon/atombios_crtc.c index a03c73411a56..30d242b25078 100644 --- a/drivers/gpu/drm/radeon/atombios_crtc.c +++ b/drivers/gpu/drm/radeon/atombios_crtc.c @@ -1414,8 +1414,8 @@ static int dce4_crtc_do_set_base(struct drm_crtc *crtc, tmp &= ~EVERGREEN_GRPH_SURFACE_UPDATE_H_RETRACE_EN; WREG32(EVERGREEN_GRPH_FLIP_CONTROL + radeon_crtc->crtc_offset, tmp); - /* set pageflip to happen anywhere in vblank interval */ - WREG32(EVERGREEN_MASTER_UPDATE_MODE + radeon_crtc->crtc_offset, 0); + /* set pageflip to happen only at start of vblank interval (front porch) */ + WREG32(EVERGREEN_MASTER_UPDATE_MODE + radeon_crtc->crtc_offset, 3); if (!atomic && fb && fb != crtc->primary->fb) { radeon_fb = to_radeon_framebuffer(fb); @@ -1614,8 +1614,8 @@ static int avivo_crtc_do_set_base(struct drm_crtc *crtc, tmp &= ~AVIVO_D1GRPH_SURFACE_UPDATE_H_RETRACE_EN; WREG32(AVIVO_D1GRPH_FLIP_CONTROL + radeon_crtc->crtc_offset, tmp); - /* set pageflip to happen anywhere in vblank interval */ - WREG32(AVIVO_D1MODE_MASTER_UPDATE_MODE + radeon_crtc->crtc_offset, 0); + /* set pageflip to happen only at start of vblank interval (front porch) */ + WREG32(AVIVO_D1MODE_MASTER_UPDATE_MODE + radeon_crtc->crtc_offset, 3); if (!atomic && fb && fb != crtc->primary->fb) { radeon_fb = to_radeon_framebuffer(fb); diff --git a/drivers/gpu/drm/radeon/evergreen.c b/drivers/gpu/drm/radeon/evergreen.c index f7ece0ff431b..250bac3935a4 100644 --- a/drivers/gpu/drm/radeon/evergreen.c +++ b/drivers/gpu/drm/radeon/evergreen.c @@ -2642,8 +2642,9 @@ void evergreen_mc_resume(struct radeon_device *rdev, struct evergreen_mc_save *s for (i = 0; i < rdev->num_crtc; i++) { if (save->crtc_enabled[i]) { tmp = RREG32(EVERGREEN_MASTER_UPDATE_MODE + crtc_offsets[i]); - if ((tmp & 0x3) != 0) { - tmp &= ~0x3; + if ((tmp & 0x7) != 3) { + tmp &= ~0x7; + tmp |= 0x3; WREG32(EVERGREEN_MASTER_UPDATE_MODE + crtc_offsets[i], tmp); } tmp = RREG32(EVERGREEN_GRPH_UPDATE + crtc_offsets[i]); diff --git a/drivers/gpu/drm/radeon/evergreen_reg.h b/drivers/gpu/drm/radeon/evergreen_reg.h index 333d143fca2c..23bff590fb6e 100644 --- a/drivers/gpu/drm/radeon/evergreen_reg.h +++ b/drivers/gpu/drm/radeon/evergreen_reg.h @@ -239,7 +239,6 @@ # define EVERGREEN_CRTC_V_BLANK (1 << 0) #define EVERGREEN_CRTC_STATUS_POSITION 0x6e90 #define EVERGREEN_CRTC_STATUS_HV_COUNT 0x6ea0 -#define EVERGREEN_MASTER_UPDATE_MODE 0x6ef8 #define EVERGREEN_CRTC_UPDATE_LOCK 0x6ed4 #define EVERGREEN_MASTER_UPDATE_LOCK 0x6ef4 #define EVERGREEN_MASTER_UPDATE_MODE 0x6ef8 diff --git a/drivers/gpu/drm/radeon/rv515.c b/drivers/gpu/drm/radeon/rv515.c index 237dd29d9f1c..3e21e869015f 100644 --- a/drivers/gpu/drm/radeon/rv515.c +++ b/drivers/gpu/drm/radeon/rv515.c @@ -406,8 +406,9 @@ void rv515_mc_resume(struct radeon_device *rdev, struct rv515_mc_save *save) for (i = 0; i < rdev->num_crtc; i++) { if (save->crtc_enabled[i]) { tmp = RREG32(AVIVO_D1MODE_MASTER_UPDATE_MODE + crtc_offsets[i]); - if ((tmp & 0x3) != 0) { - tmp &= ~0x3; + if ((tmp & 0x7) != 3) { + tmp &= ~0x7; + tmp |= 0x3; WREG32(AVIVO_D1MODE_MASTER_UPDATE_MODE + crtc_offsets[i], tmp); } tmp = RREG32(AVIVO_D1GRPH_UPDATE + crtc_offsets[i]); From c60381bd82a54233bb46f93be00a4154bd0cf95d Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michel=20D=C3=A4nzer?= Date: Mon, 14 Jul 2014 15:48:42 +0900 Subject: [PATCH 274/455] drm/radeon: Move pinning the BO back to radeon_crtc_page_flip() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit As well as enabling the vblank interrupt. These shouldn't take any significant amount of time, but at least pinning the BO has actually been seen to fail in practice before, in which case we need to let userspace know about it. Reviewed-by: Christian König Signed-off-by: Michel Dänzer Signed-off-by: Alex Deucher --- drivers/gpu/drm/radeon/radeon.h | 3 +- drivers/gpu/drm/radeon/radeon_display.c | 189 ++++++++++++------------ 2 files changed, 97 insertions(+), 95 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 29d9cc04c04e..b7204500a9a6 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -684,10 +684,9 @@ struct radeon_flip_work { struct work_struct unpin_work; struct radeon_device *rdev; int crtc_id; - struct drm_framebuffer *fb; + uint64_t base; struct drm_pending_vblank_event *event; struct radeon_bo *old_rbo; - struct radeon_bo *new_rbo; struct radeon_fence *fence; }; diff --git a/drivers/gpu/drm/radeon/radeon_display.c b/drivers/gpu/drm/radeon/radeon_display.c index 65501af453be..8de579473645 100644 --- a/drivers/gpu/drm/radeon/radeon_display.c +++ b/drivers/gpu/drm/radeon/radeon_display.c @@ -386,11 +386,6 @@ static void radeon_flip_work_func(struct work_struct *__work) struct radeon_crtc *radeon_crtc = rdev->mode_info.crtcs[work->crtc_id]; struct drm_crtc *crtc = &radeon_crtc->base; - struct drm_framebuffer *fb = work->fb; - - uint32_t tiling_flags, pitch_pixels; - uint64_t base; - unsigned long flags; int r; @@ -411,26 +406,94 @@ static void radeon_flip_work_func(struct work_struct *__work) radeon_fence_unref(&work->fence); } - /* pin the new buffer */ - DRM_DEBUG_DRIVER("flip-ioctl() cur_fbo = %p, cur_bbo = %p\n", - work->old_rbo, work->new_rbo); + /* do the flip (mmio) */ + radeon_page_flip(rdev, radeon_crtc->crtc_id, work->base); - r = radeon_bo_reserve(work->new_rbo, false); + /* We borrow the event spin lock for protecting flip_status */ + spin_lock_irqsave(&crtc->dev->event_lock, flags); + + /* set the proper interrupt */ + radeon_irq_kms_pflip_irq_get(rdev, radeon_crtc->crtc_id); + + radeon_crtc->flip_status = RADEON_FLIP_SUBMITTED; + spin_unlock_irqrestore(&crtc->dev->event_lock, flags); + up_read(&rdev->exclusive_lock); + + return; + +cleanup: + drm_gem_object_unreference_unlocked(&work->old_rbo->gem_base); + radeon_fence_unref(&work->fence); + kfree(work); + up_read(&rdev->exclusive_lock); +} + +static int radeon_crtc_page_flip(struct drm_crtc *crtc, + struct drm_framebuffer *fb, + struct drm_pending_vblank_event *event, + uint32_t page_flip_flags) +{ + struct drm_device *dev = crtc->dev; + struct radeon_device *rdev = dev->dev_private; + struct radeon_crtc *radeon_crtc = to_radeon_crtc(crtc); + struct radeon_framebuffer *old_radeon_fb; + struct radeon_framebuffer *new_radeon_fb; + struct drm_gem_object *obj; + struct radeon_flip_work *work; + struct radeon_bo *new_rbo; + uint32_t tiling_flags, pitch_pixels; + uint64_t base; + unsigned long flags; + int r; + + work = kzalloc(sizeof *work, GFP_KERNEL); + if (work == NULL) + return -ENOMEM; + + INIT_WORK(&work->flip_work, radeon_flip_work_func); + INIT_WORK(&work->unpin_work, radeon_unpin_work_func); + + work->rdev = rdev; + work->crtc_id = radeon_crtc->crtc_id; + work->event = event; + + /* schedule unpin of the old buffer */ + old_radeon_fb = to_radeon_framebuffer(crtc->primary->fb); + obj = old_radeon_fb->obj; + + /* take a reference to the old object */ + drm_gem_object_reference(obj); + work->old_rbo = gem_to_radeon_bo(obj); + + new_radeon_fb = to_radeon_framebuffer(fb); + obj = new_radeon_fb->obj; + new_rbo = gem_to_radeon_bo(obj); + + spin_lock(&new_rbo->tbo.bdev->fence_lock); + if (new_rbo->tbo.sync_obj) + work->fence = radeon_fence_ref(new_rbo->tbo.sync_obj); + spin_unlock(&new_rbo->tbo.bdev->fence_lock); + + /* pin the new buffer */ + DRM_DEBUG_DRIVER("flip-ioctl() cur_rbo = %p, new_rbo = %p\n", + work->old_rbo, new_rbo); + + r = radeon_bo_reserve(new_rbo, false); if (unlikely(r != 0)) { DRM_ERROR("failed to reserve new rbo buffer before flip\n"); goto cleanup; } /* Only 27 bit offset for legacy CRTC */ - r = radeon_bo_pin_restricted(work->new_rbo, RADEON_GEM_DOMAIN_VRAM, + r = radeon_bo_pin_restricted(new_rbo, RADEON_GEM_DOMAIN_VRAM, ASIC_IS_AVIVO(rdev) ? 0 : 1 << 27, &base); if (unlikely(r != 0)) { - radeon_bo_unreserve(work->new_rbo); + radeon_bo_unreserve(new_rbo); r = -EINVAL; DRM_ERROR("failed to pin new rbo buffer before flip\n"); goto cleanup; } - radeon_bo_get_tiling_flags(work->new_rbo, &tiling_flags, NULL); - radeon_bo_unreserve(work->new_rbo); + radeon_bo_get_tiling_flags(new_rbo, &tiling_flags, NULL); + radeon_bo_unreserve(new_rbo); if (!ASIC_IS_AVIVO(rdev)) { /* crtc offset is from display base addr not FB location */ @@ -467,6 +530,7 @@ static void radeon_flip_work_func(struct work_struct *__work) } base &= ~7; } + work->base = base; r = drm_vblank_get(crtc->dev, radeon_crtc->crtc_id); if (r) { @@ -477,88 +541,11 @@ static void radeon_flip_work_func(struct work_struct *__work) /* We borrow the event spin lock for protecting flip_work */ spin_lock_irqsave(&crtc->dev->event_lock, flags); - /* set the proper interrupt */ - radeon_irq_kms_pflip_irq_get(rdev, radeon_crtc->crtc_id); - - /* do the flip (mmio) */ - radeon_page_flip(rdev, radeon_crtc->crtc_id, base); - - radeon_crtc->flip_status = RADEON_FLIP_SUBMITTED; - spin_unlock_irqrestore(&crtc->dev->event_lock, flags); - up_read(&rdev->exclusive_lock); - - return; - -pflip_cleanup: - if (unlikely(radeon_bo_reserve(work->new_rbo, false) != 0)) { - DRM_ERROR("failed to reserve new rbo in error path\n"); - goto cleanup; - } - if (unlikely(radeon_bo_unpin(work->new_rbo) != 0)) { - DRM_ERROR("failed to unpin new rbo in error path\n"); - } - radeon_bo_unreserve(work->new_rbo); - -cleanup: - drm_gem_object_unreference_unlocked(&work->old_rbo->gem_base); - radeon_fence_unref(&work->fence); - kfree(work); - up_read(&rdev->exclusive_lock); -} - -static int radeon_crtc_page_flip(struct drm_crtc *crtc, - struct drm_framebuffer *fb, - struct drm_pending_vblank_event *event, - uint32_t page_flip_flags) -{ - struct drm_device *dev = crtc->dev; - struct radeon_device *rdev = dev->dev_private; - struct radeon_crtc *radeon_crtc = to_radeon_crtc(crtc); - struct radeon_framebuffer *old_radeon_fb; - struct radeon_framebuffer *new_radeon_fb; - struct drm_gem_object *obj; - struct radeon_flip_work *work; - unsigned long flags; - - work = kzalloc(sizeof *work, GFP_KERNEL); - if (work == NULL) - return -ENOMEM; - - INIT_WORK(&work->flip_work, radeon_flip_work_func); - INIT_WORK(&work->unpin_work, radeon_unpin_work_func); - - work->rdev = rdev; - work->crtc_id = radeon_crtc->crtc_id; - work->fb = fb; - work->event = event; - - /* schedule unpin of the old buffer */ - old_radeon_fb = to_radeon_framebuffer(crtc->primary->fb); - obj = old_radeon_fb->obj; - - /* take a reference to the old object */ - drm_gem_object_reference(obj); - work->old_rbo = gem_to_radeon_bo(obj); - - new_radeon_fb = to_radeon_framebuffer(fb); - obj = new_radeon_fb->obj; - work->new_rbo = gem_to_radeon_bo(obj); - - spin_lock(&work->new_rbo->tbo.bdev->fence_lock); - if (work->new_rbo->tbo.sync_obj) - work->fence = radeon_fence_ref(work->new_rbo->tbo.sync_obj); - spin_unlock(&work->new_rbo->tbo.bdev->fence_lock); - - /* We borrow the event spin lock for protecting flip_work */ - spin_lock_irqsave(&crtc->dev->event_lock, flags); - if (radeon_crtc->flip_status != RADEON_FLIP_NONE) { DRM_DEBUG_DRIVER("flip queue: crtc already busy\n"); spin_unlock_irqrestore(&crtc->dev->event_lock, flags); - drm_gem_object_unreference_unlocked(&work->old_rbo->gem_base); - radeon_fence_unref(&work->fence); - kfree(work); - return -EBUSY; + r = -EBUSY; + goto pflip_cleanup; } radeon_crtc->flip_status = RADEON_FLIP_PENDING; radeon_crtc->flip_work = work; @@ -569,8 +556,24 @@ static int radeon_crtc_page_flip(struct drm_crtc *crtc, spin_unlock_irqrestore(&crtc->dev->event_lock, flags); queue_work(radeon_crtc->flip_queue, &work->flip_work); - return 0; + +pflip_cleanup: + if (unlikely(radeon_bo_reserve(new_rbo, false) != 0)) { + DRM_ERROR("failed to reserve new rbo in error path\n"); + goto cleanup; + } + if (unlikely(radeon_bo_unpin(new_rbo) != 0)) { + DRM_ERROR("failed to unpin new rbo in error path\n"); + } + radeon_bo_unreserve(new_rbo); + +cleanup: + drm_gem_object_unreference_unlocked(&work->old_rbo->gem_base); + radeon_fence_unref(&work->fence); + kfree(work); + + return r; } static int From 306f98d9a1079cc78567b1a6a5e94c272c163e47 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michel=20D=C3=A4nzer?= Date: Mon, 14 Jul 2014 15:58:03 +0900 Subject: [PATCH 275/455] drm/radeon: Complete page flip even if waiting on the BO fence fails MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Otherwise the DRM core and userspace will be confused about which BO the CRTC is scanning out. Reviewed-by: Christian König Signed-off-by: Michel Dänzer Signed-off-by: Alex Deucher --- drivers/gpu/drm/radeon/radeon_display.c | 24 +++++++++--------------- 1 file changed, 9 insertions(+), 15 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_display.c b/drivers/gpu/drm/radeon/radeon_display.c index 8de579473645..97ea46597bea 100644 --- a/drivers/gpu/drm/radeon/radeon_display.c +++ b/drivers/gpu/drm/radeon/radeon_display.c @@ -390,20 +390,22 @@ static void radeon_flip_work_func(struct work_struct *__work) int r; down_read(&rdev->exclusive_lock); - while (work->fence) { + if (work->fence) { r = radeon_fence_wait(work->fence, false); if (r == -EDEADLK) { up_read(&rdev->exclusive_lock); r = radeon_gpu_reset(rdev); down_read(&rdev->exclusive_lock); } + if (r) + DRM_ERROR("failed to wait on page flip fence (%d)!\n", r); - if (r) { - DRM_ERROR("failed to wait on page flip fence (%d)!\n", - r); - goto cleanup; - } else - radeon_fence_unref(&work->fence); + /* We continue with the page flip even if we failed to wait on + * the fence, otherwise the DRM core and userspace will be + * confused about which BO the CRTC is scanning out + */ + + radeon_fence_unref(&work->fence); } /* do the flip (mmio) */ @@ -418,14 +420,6 @@ static void radeon_flip_work_func(struct work_struct *__work) radeon_crtc->flip_status = RADEON_FLIP_SUBMITTED; spin_unlock_irqrestore(&crtc->dev->event_lock, flags); up_read(&rdev->exclusive_lock); - - return; - -cleanup: - drm_gem_object_unreference_unlocked(&work->old_rbo->gem_base); - radeon_fence_unref(&work->fence); - kfree(work); - up_read(&rdev->exclusive_lock); } static int radeon_crtc_page_flip(struct drm_crtc *crtc, From c89e5be621e84b7b83650d87a956c52c5654c35b Mon Sep 17 00:00:00 2001 From: Mario Kleiner Date: Thu, 17 Jul 2014 01:27:25 +0200 Subject: [PATCH 276/455] drm/radeon: Remove redundant fence unref in pageflip path. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Not needed anymore, as it is already unreffed within radeon_flip_work_func() after its only use. Signed-off-by: Mario Kleiner Reviewed-by: Michel Dänzer Signed-off-by: Alex Deucher --- drivers/gpu/drm/radeon/radeon_display.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/gpu/drm/radeon/radeon_display.c b/drivers/gpu/drm/radeon/radeon_display.c index 97ea46597bea..bfa3a6cc170f 100644 --- a/drivers/gpu/drm/radeon/radeon_display.c +++ b/drivers/gpu/drm/radeon/radeon_display.c @@ -366,7 +366,6 @@ void radeon_crtc_handle_flip(struct radeon_device *rdev, int crtc_id) spin_unlock_irqrestore(&rdev->ddev->event_lock, flags); drm_vblank_put(rdev->ddev, radeon_crtc->crtc_id); - radeon_fence_unref(&work->fence); radeon_irq_kms_pflip_irq_put(rdev, work->crtc_id); queue_work(radeon_crtc->flip_queue, &work->unpin_work); } From 826484977c29b42c8cb8c42bd41acaa6e152a4bb Mon Sep 17 00:00:00 2001 From: Mario Kleiner Date: Thu, 17 Jul 2014 01:37:53 +0200 Subject: [PATCH 277/455] drm/radeon: Add missing vblank_put in pageflip ioctl error path. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Mario Kleiner Reviewed-by: Michel Dänzer Signed-off-by: Alex Deucher --- drivers/gpu/drm/radeon/radeon_display.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/radeon/radeon_display.c b/drivers/gpu/drm/radeon/radeon_display.c index bfa3a6cc170f..0a22a1bb688c 100644 --- a/drivers/gpu/drm/radeon/radeon_display.c +++ b/drivers/gpu/drm/radeon/radeon_display.c @@ -538,7 +538,7 @@ static int radeon_crtc_page_flip(struct drm_crtc *crtc, DRM_DEBUG_DRIVER("flip queue: crtc already busy\n"); spin_unlock_irqrestore(&crtc->dev->event_lock, flags); r = -EBUSY; - goto pflip_cleanup; + goto vblank_cleanup; } radeon_crtc->flip_status = RADEON_FLIP_PENDING; radeon_crtc->flip_work = work; @@ -551,6 +551,9 @@ static int radeon_crtc_page_flip(struct drm_crtc *crtc, queue_work(radeon_crtc->flip_queue, &work->flip_work); return 0; +vblank_cleanup: + drm_vblank_put(crtc->dev, radeon_crtc->crtc_id); + pflip_cleanup: if (unlikely(radeon_bo_reserve(new_rbo, false) != 0)) { DRM_ERROR("failed to reserve new rbo in error path\n"); From 5f87e090a7368adc2290ae17ffd82a070caadd20 Mon Sep 17 00:00:00 2001 From: Mario Kleiner Date: Thu, 17 Jul 2014 02:24:45 +0200 Subject: [PATCH 278/455] drm/radeon: Make classic pageflip completion path less racy. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Need to protect mmio flip programming by event lock as well. Need to also first enable pflip irq, then mmio program, otherwise a flip completion may get unnoticed in the vblank of actual completion if the flip is programmed, but radeon_flip_work_func gets preempted immediately after mmio programming and before vblank. In that case the vblank irq handler wouldn't run radeon_crtc_handle_vblank() with the completion check routine, miss the completed flip, and only notice one vblank after actual completion, causing a false/delayed report of flip completion. Signed-off-by: Mario Kleiner Reviewed-by: Michel Dänzer Signed-off-by: Alex Deucher --- drivers/gpu/drm/radeon/radeon_display.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_display.c b/drivers/gpu/drm/radeon/radeon_display.c index 0a22a1bb688c..bf25061c8ac4 100644 --- a/drivers/gpu/drm/radeon/radeon_display.c +++ b/drivers/gpu/drm/radeon/radeon_display.c @@ -407,15 +407,15 @@ static void radeon_flip_work_func(struct work_struct *__work) radeon_fence_unref(&work->fence); } - /* do the flip (mmio) */ - radeon_page_flip(rdev, radeon_crtc->crtc_id, work->base); - /* We borrow the event spin lock for protecting flip_status */ spin_lock_irqsave(&crtc->dev->event_lock, flags); /* set the proper interrupt */ radeon_irq_kms_pflip_irq_get(rdev, radeon_crtc->crtc_id); + /* do the flip (mmio) */ + radeon_page_flip(rdev, radeon_crtc->crtc_id, work->base); + radeon_crtc->flip_status = RADEON_FLIP_SUBMITTED; spin_unlock_irqrestore(&crtc->dev->event_lock, flags); up_read(&rdev->exclusive_lock); From 6b076991dca9817e75c37e2f0db6d52611ea42fa Mon Sep 17 00:00:00 2001 From: Russell King Date: Thu, 17 Jul 2014 12:17:45 +0100 Subject: [PATCH 279/455] ARM: DMA: ensure that old section mappings are flushed from the TLB When setting up the CMA region, we must ensure that the old section mappings are flushed from the TLB before replacing them with page tables, otherwise we can suffer from mismatched aliases if the CPU speculatively prefetches from these mappings at an inopportune time. A mismatched alias can occur when the TLB contains a section mapping, but a subsequent prefetch causes it to load a page table mapping, resulting in the possibility of the TLB containing two matching mappings for the same virtual address region. Acked-by: Will Deacon Signed-off-by: Russell King --- arch/arm/mm/dma-mapping.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c index 4c88935654ca..1f88db06b133 100644 --- a/arch/arm/mm/dma-mapping.c +++ b/arch/arm/mm/dma-mapping.c @@ -461,12 +461,21 @@ void __init dma_contiguous_remap(void) map.type = MT_MEMORY_DMA_READY; /* - * Clear previous low-memory mapping + * Clear previous low-memory mapping to ensure that the + * TLB does not see any conflicting entries, then flush + * the TLB of the old entries before creating new mappings. + * + * This ensures that any speculatively loaded TLB entries + * (even though they may be rare) can not cause any problems, + * and ensures that this code is architecturally compliant. */ for (addr = __phys_to_virt(start); addr < __phys_to_virt(end); addr += PMD_SIZE) pmd_clear(pmd_off_k(addr)); + flush_tlb_kernel_range(__phys_to_virt(start), + __phys_to_virt(end)); + iotable_init(&map, 1); } } From 89fb4cd1f717a871ef79fa7debbe840e3225cd54 Mon Sep 17 00:00:00 2001 From: James Bottomley Date: Thu, 3 Jul 2014 19:17:34 +0200 Subject: [PATCH 280/455] scsi: handle flush errors properly Flush commands don't transfer data and thus need to be special cased in the I/O completion handler so that we can propagate errors to the block layer and filesystem. Signed-off-by: James Bottomley Reported-by: Steven Haber Tested-by: Steven Haber Reviewed-by: Martin K. Petersen Cc: stable@vger.kernel.org Signed-off-by: Christoph Hellwig --- drivers/scsi/scsi_lib.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index f7e316368c99..3f50dfcb3227 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -733,6 +733,14 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes) scsi_next_command(cmd); return; } + } else if (blk_rq_bytes(req) == 0 && result && !sense_deferred) { + /* + * Certain non BLOCK_PC requests are commands that don't + * actually transfer anything (FLUSH), so cannot use + * good_bytes != blk_rq_bytes(req) as the signal for an error. + * This sets the error explicitly for the problem case. + */ + error = __scsi_error_from_host_byte(cmd, result); } /* no bidi support for !REQ_TYPE_BLOCK_PC yet */ From a28d0e873d2899bd750ae495f84fe9c1a2f53809 Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Thu, 17 Jul 2014 13:50:45 +0300 Subject: [PATCH 281/455] wan/x25_asy: integer overflow in x25_asy_change_mtu() If "newmtu * 2 + 4" is too large then it can cause an integer overflow leading to memory corruption. Eric Dumazet suggests that 65534 is a reasonable upper limit. Btw, "newmtu" is not allowed to be a negative number because of the check in dev_set_mtu(), so that's ok. Signed-off-by: Dan Carpenter Signed-off-by: David S. Miller --- drivers/net/wan/x25_asy.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/net/wan/x25_asy.c b/drivers/net/wan/x25_asy.c index 5895f1978691..fa9fdfa128c1 100644 --- a/drivers/net/wan/x25_asy.c +++ b/drivers/net/wan/x25_asy.c @@ -122,8 +122,12 @@ static int x25_asy_change_mtu(struct net_device *dev, int newmtu) { struct x25_asy *sl = netdev_priv(dev); unsigned char *xbuff, *rbuff; - int len = 2 * newmtu; + int len; + if (newmtu > 65534) + return -EINVAL; + + len = 2 * newmtu; xbuff = kmalloc(len + 4, GFP_ATOMIC); rbuff = kmalloc(len + 4, GFP_ATOMIC); From 5343330010a892b76a97fd93ad3c455a4a32a7fb Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Bj=C3=B8rn=20Mork?= Date: Thu, 17 Jul 2014 13:33:51 +0200 Subject: [PATCH 282/455] net: qmi_wwan: add two Sierra Wireless/Netgear devices MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add two device IDs found in an out-of-tree driver downloadable from Netgear. Signed-off-by: Bjørn Mork Signed-off-by: David S. Miller --- drivers/net/usb/qmi_wwan.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c index c4638c67f6b9..22756db53dca 100644 --- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -667,6 +667,7 @@ static const struct usb_device_id products[] = { {QMI_FIXED_INTF(0x05c6, 0x9084, 4)}, {QMI_FIXED_INTF(0x05c6, 0x920d, 0)}, {QMI_FIXED_INTF(0x05c6, 0x920d, 5)}, + {QMI_FIXED_INTF(0x0846, 0x68a2, 8)}, {QMI_FIXED_INTF(0x12d1, 0x140c, 1)}, /* Huawei E173 */ {QMI_FIXED_INTF(0x12d1, 0x14ac, 1)}, /* Huawei E1820 */ {QMI_FIXED_INTF(0x16d8, 0x6003, 0)}, /* CMOTech 6003 */ @@ -757,6 +758,7 @@ static const struct usb_device_id products[] = { {QMI_FIXED_INTF(0x1199, 0x9054, 8)}, /* Sierra Wireless Modem */ {QMI_FIXED_INTF(0x1199, 0x9055, 8)}, /* Netgear AirCard 341U */ {QMI_FIXED_INTF(0x1199, 0x9056, 8)}, /* Sierra Wireless Modem */ + {QMI_FIXED_INTF(0x1199, 0x9057, 8)}, {QMI_FIXED_INTF(0x1199, 0x9061, 8)}, /* Sierra Wireless Modem */ {QMI_FIXED_INTF(0x1bbb, 0x011e, 4)}, /* Telekom Speedstick LTE II (Alcatel One Touch L100V LTE) */ {QMI_FIXED_INTF(0x1bbb, 0x0203, 2)}, /* Alcatel L800MA */ From c2a6c7813f1ffae636e369b5d7011c9f518d3cd9 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Bj=C3=B8rn=20Mork?= Date: Thu, 17 Jul 2014 13:34:09 +0200 Subject: [PATCH 283/455] net: huawei_cdc_ncm: add "subclass 3" devices MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Huawei's usage of the subclass and protocol fields is not 100% clear to us, but there appears to be a very strict system. A device with the "shared" device ID 12d1:1506 and this NCM function was recently reported (showing only default altsetting): Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 1 bAlternateSetting 0 bNumEndpoints 1 bInterfaceClass 255 Vendor Specific Class bInterfaceSubClass 3 bInterfaceProtocol 22 iInterface 8 CDC Network Control Model (NCM) ** UNRECOGNIZED: 05 24 00 10 01 ** UNRECOGNIZED: 06 24 1a 00 01 1f ** UNRECOGNIZED: 0c 24 1b 00 01 00 04 10 14 dc 05 20 ** UNRECOGNIZED: 0d 24 0f 0a 0f 00 00 00 ea 05 03 00 01 ** UNRECOGNIZED: 05 24 06 01 01 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x85 EP 5 IN bmAttributes 3 Transfer Type Interrupt Synch Type None Usage Type Data wMaxPacketSize 0x0010 1x 16 bytes bInterval 9 Cc: Enrico Mioso Signed-off-by: Bjørn Mork Signed-off-by: David S. Miller --- drivers/net/usb/huawei_cdc_ncm.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/net/usb/huawei_cdc_ncm.c b/drivers/net/usb/huawei_cdc_ncm.c index 5d95a13dbe2a..735f7dadb9a0 100644 --- a/drivers/net/usb/huawei_cdc_ncm.c +++ b/drivers/net/usb/huawei_cdc_ncm.c @@ -194,6 +194,9 @@ static const struct usb_device_id huawei_cdc_ncm_devs[] = { { USB_VENDOR_AND_INTERFACE_INFO(0x12d1, 0xff, 0x02, 0x76), .driver_info = (unsigned long)&huawei_cdc_ncm_info, }, + { USB_VENDOR_AND_INTERFACE_INFO(0x12d1, 0xff, 0x03, 0x16), + .driver_info = (unsigned long)&huawei_cdc_ncm_info, + }, /* Terminating entry */ { From 953c66469735aed8d2ada639a72b150f01dae605 Mon Sep 17 00:00:00 2001 From: Abbas Raza Date: Thu, 17 Jul 2014 19:34:31 +0800 Subject: [PATCH 284/455] usb: chipidea: udc: Disable auto ZLP generation on ep0 There are 2 methods for ZLP (zero-length packet) generation: 1) In software 2) Automatic generation by device controller 1) is implemented in UDC driver and it attaches ZLP to IN packet if descriptor->size < wLength 2) can be enabled/disabled by setting ZLT bit in the QH When gadget ffs is connected to ubuntu host, the host sends get descriptor request and wLength in setup packet is 255 while the size of descriptor which will be sent by gadget in IN packet is 64 byte. So the composite driver sets req->zero = 1. In UDC driver following code will be executed then if (hwreq->req.zero && hwreq->req.length && (hwreq->req.length % hwep->ep.maxpacket == 0)) add_td_to_list(hwep, hwreq, 0); Case-A: So in case of ubuntu host, UDC driver will attach a ZLP to the IN packet. ubuntu host will request 255 byte in IN request, gadget will send 64 byte with ZLP and host will come to know that there is no more data. But hold on, by default ZLT=0 for endpoint 0 so hardware also tries to automatically generate the ZLP which blocks enumeration for ~6 seconds due to endpoint 0 STALL, NAKs are sent to host for any requests (OUT/PING) Case-B: In case when gadget ffs is connected to Apple device, Apple device sends setup packet with wLength=64. So descriptor->size = 64 and wLength=64 therefore req->zero = 0 and UDC driver will not attach any ZLP to the IN packet. Apple device requests 64 bytes, gets 64 bytes and doesn't further request for IN data. But ZLT=0 by default for endpoint 0 so hardware tries to automatically generate the ZLP which blocks enumeration for ~6 seconds due to endpoint 0 STALL, NAKs are sent to host for any requests (OUT/PING) According to USB2.0 specs: 8.5.3.2 Variable-length Data Stage A control pipe may have a variable-length data phase in which the host requests more data than is contained in the specified data structure. When all of the data structure is returned to the host, the function should indicate that the Data stage is ended by returning a packet that is shorter than the MaxPacketSize for the pipe. If the data structure is an exact multiple of wMaxPacketSize for the pipe, the function will return a zero-length packet to indicate the end of the Data stage. In Case-A mentioned above: If we disable software ZLP generation & ZLT=0 for endpoint 0 OR if software ZLP generation is not disabled but we set ZLT=1 for endpoint 0 then enumeration doesn't block for 6 seconds. In Case-B mentioned above: If we disable software ZLP generation & ZLT=0 for endpoint then enumeration still blocks due to ZLP automatically generated by hardware and host not needing it. But if we keep software ZLP generation enabled but we set ZLT=1 for endpoint 0 then enumeration doesn't block for 6 seconds. So the proper solution for this issue seems to disable automatic ZLP generation by hardware (i.e by setting ZLT=1 for endpoint 0) and let software (UDC driver) handle the ZLP generation based on req->zero field. Cc: stable@vger.kernel.org Signed-off-by: Abbas Raza Signed-off-by: Peter Chen Signed-off-by: Greg Kroah-Hartman --- drivers/usb/chipidea/udc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/usb/chipidea/udc.c b/drivers/usb/chipidea/udc.c index 9d2b673f90e3..b8125aa64ad8 100644 --- a/drivers/usb/chipidea/udc.c +++ b/drivers/usb/chipidea/udc.c @@ -1169,8 +1169,8 @@ static int ep_enable(struct usb_ep *ep, if (hwep->type == USB_ENDPOINT_XFER_CONTROL) cap |= QH_IOS; - if (hwep->num) - cap |= QH_ZLT; + + cap |= QH_ZLT; cap |= (hwep->ep.maxpacket << __ffs(QH_MAX_PKT)) & QH_MAX_PKT; /* * For ISO-TX, we set mult at QH as the largest value, and use From bb86cf569bbd7ad4dce581a37c7fbd748057e9dc Mon Sep 17 00:00:00 2001 From: Gavin Guo Date: Fri, 18 Jul 2014 01:12:13 +0800 Subject: [PATCH 285/455] usb: Check if port status is equal to RxDetect When using USB 3.0 pen drive with the [AMD] FCH USB XHCI Controller [1022:7814], the second hotplugging will experience the USB 3.0 pen drive is recognized as high-speed device. After bisecting the kernel, I found the commit number 41e7e056cdc662f704fa9262e5c6e213b4ab45dd (USB: Allow USB 3.0 ports to be disabled.) causes the bug. After doing some experiments, the bug can be fixed by avoiding executing the function hub_usb3_port_disable(). Because the port status with [AMD] FCH USB XHCI Controlleris [1022:7814] is already in RxDetect (I tried printing out the port status before setting to Disabled state), it's reasonable to check the port status before really executing hub_usb3_port_disable(). Fixes: 41e7e056cdc6 (USB: Allow USB 3.0 ports to be disabled.) Signed-off-by: Gavin Guo Acked-by: Alan Stern Cc: Signed-off-by: Greg Kroah-Hartman --- drivers/usb/core/hub.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c index 21b99b4b4082..0e950ad8cb25 100644 --- a/drivers/usb/core/hub.c +++ b/drivers/usb/core/hub.c @@ -889,6 +889,25 @@ static int hub_usb3_port_disable(struct usb_hub *hub, int port1) if (!hub_is_superspeed(hub->hdev)) return -EINVAL; + ret = hub_port_status(hub, port1, &portstatus, &portchange); + if (ret < 0) + return ret; + + /* + * USB controller Advanced Micro Devices, Inc. [AMD] FCH USB XHCI + * Controller [1022:7814] will have spurious result making the following + * usb 3.0 device hotplugging route to the 2.0 root hub and recognized + * as high-speed device if we set the usb 3.0 port link state to + * Disabled. Since it's already in USB_SS_PORT_LS_RX_DETECT state, we + * check the state here to avoid the bug. + */ + if ((portstatus & USB_PORT_STAT_LINK_STATE) == + USB_SS_PORT_LS_RX_DETECT) { + dev_dbg(&hub->ports[port1 - 1]->dev, + "Not disabling port; link state is RxDetect\n"); + return ret; + } + ret = hub_set_port_link_state(hub, port1, USB_SS_PORT_LS_SS_DISABLED); if (ret) return ret; From 2b1987a9f1e6250c962ca13820d3c69817879266 Mon Sep 17 00:00:00 2001 From: Brian W Hart Date: Fri, 27 Jun 2014 16:09:39 -0500 Subject: [PATCH 286/455] cpufreq: make table sentinel macros unsigned to match use MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Commit 5eeaf1f18973 (cpufreq: Fix build error on some platforms that use cpufreq_for_each_*) moved function cpufreq_next_valid() to a public header. Warnings are now generated when objects including that header are built with -Wsign-compare (as an out-of-tree module might be): .../include/linux/cpufreq.h: In function ‘cpufreq_next_valid’: .../include/linux/cpufreq.h:519:27: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] while ((*pos)->frequency != CPUFREQ_TABLE_END) ^ .../include/linux/cpufreq.h:520:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] if ((*pos)->frequency != CPUFREQ_ENTRY_INVALID) ^ Constants CPUFREQ_ENTRY_INVALID and CPUFREQ_TABLE_END are signed, but are used with unsigned member 'frequency' of cpufreq_frequency_table. Update the macro definitions to be explicitly unsigned to match their use. This also corrects potentially wrong behavior of clk_rate_table_iter() if unsigned long is wider than usigned int. Fixes: 5eeaf1f18973 (cpufreq: Fix build error on some platforms that use cpufreq_for_each_*) Signed-off-by: Brian W Hart Reviewed-by: Simon Horman Acked-by: Viresh Kumar Signed-off-by: Rafael J. Wysocki --- include/linux/cpufreq.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h index ec4112d257bc..8f8ae95c6e27 100644 --- a/include/linux/cpufreq.h +++ b/include/linux/cpufreq.h @@ -482,8 +482,8 @@ extern struct cpufreq_governor cpufreq_gov_conservative; *********************************************************************/ /* Special Values of .frequency field */ -#define CPUFREQ_ENTRY_INVALID ~0 -#define CPUFREQ_TABLE_END ~1 +#define CPUFREQ_ENTRY_INVALID ~0u +#define CPUFREQ_TABLE_END ~1u /* Special Values of .flags field */ #define CPUFREQ_BOOST_FREQ (1 << 0) From 2ef82d24f445e82f80e235f44eb9d1bc933e3670 Mon Sep 17 00:00:00 2001 From: Dexuan Cui Date: Wed, 16 Jul 2014 00:00:45 -0700 Subject: [PATCH 287/455] Drivers: hv: hv_fcopy: fix a race condition for SMP guest We should schedule the 5s "timer work" before starting the data transfer, otherwise, the data transfer code may finish so fast on another virtual cpu that when the code(fcopy_write()) trying to cancel the 5s "timer work" can occasionally fail because the "timer work" may haven't been scheduled yet and as a result the fcopy process will be aborted wrongly by fcopy_work_func() in 5s. Thank Liz Zhang for the initial investigation on the bug. This addresses https://bugzilla.redhat.com/show_bug.cgi?id=1118123 Tested-by: Liz Zhang Cc: Haiyang Zhang Cc: stable@vger.kernel.org Signed-off-by: Dexuan Cui Signed-off-by: K. Y. Srinivasan Signed-off-by: Greg Kroah-Hartman --- drivers/hv/hv_fcopy.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/hv/hv_fcopy.c b/drivers/hv/hv_fcopy.c index eaaa3d843b80..23b2ce294c4c 100644 --- a/drivers/hv/hv_fcopy.c +++ b/drivers/hv/hv_fcopy.c @@ -246,8 +246,8 @@ void hv_fcopy_onchannelcallback(void *context) /* * Send the information to the user-level daemon. */ - fcopy_send_data(); schedule_delayed_work(&fcopy_work, 5*HZ); + fcopy_send_data(); return; } icmsghdr->icflags = ICMSGHDRFLAG_TRANSACTION | ICMSGHDRFLAG_RESPONSE; From d81b4253b0f0f1e7b7e03b0cd0f80cab18bc4d7b Mon Sep 17 00:00:00 2001 From: Masami Hiramatsu Date: Thu, 17 Jul 2014 11:44:11 +0000 Subject: [PATCH 288/455] kprobes: Fix "Failed to find blacklist" probing errors on ia64 and ppc64 On ia64 and ppc64, function pointers do not point to the entry address of the function, but to the address of a function descriptor (which contains the entry address and misc data). Since the kprobes code passes the function pointer stored by NOKPROBE_SYMBOL() to kallsyms_lookup_size_offset() for initalizing its blacklist, it fails and reports many errors, such as: Failed to find blacklist 0001013168300000 Failed to find blacklist 0001013000f0a000 [...] To fix this bug, use arch_deref_entry_point() to get the function entry address for kallsyms_lookup_size_offset() instead of the raw function pointer. Suzuki also pointed out that blacklist entries should also be updated as well. Reported-by: Tony Luck Fixed-by: Suzuki K. Poulose Tested-by: Tony Luck Tested-by: Michael Ellerman Signed-off-by: Masami Hiramatsu Acked-by: Michael Ellerman (for powerpc) Acked-by: Benjamin Herrenschmidt Cc: Jeremy Fitzhardinge Cc: sparse@chrisli.org Cc: Paul Mackerras Cc: akataria@vmware.com Cc: anil.s.keshavamurthy@intel.com Cc: Fenghua Yu Cc: Arnd Bergmann Cc: Rusty Russell Cc: Chris Wright Cc: yrl.pp-manager.tt@hitachi.com Cc: Kevin Hao Cc: Ananth N Mavinakayanahalli Cc: rdunlap@infradead.org Cc: dl9pf@gmx.de Cc: Linus Torvalds Cc: David S. Miller Cc: linux-ia64@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/20140717114411.13401.2632.stgit@kbuild-fedora.novalocal Signed-off-by: Ingo Molnar --- kernel/kprobes.c | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/kernel/kprobes.c b/kernel/kprobes.c index 3214289df5a7..734e9a7d280b 100644 --- a/kernel/kprobes.c +++ b/kernel/kprobes.c @@ -2037,19 +2037,23 @@ static int __init populate_kprobe_blacklist(unsigned long *start, { unsigned long *iter; struct kprobe_blacklist_entry *ent; - unsigned long offset = 0, size = 0; + unsigned long entry, offset = 0, size = 0; for (iter = start; iter < end; iter++) { - if (!kallsyms_lookup_size_offset(*iter, &size, &offset)) { - pr_err("Failed to find blacklist %p\n", (void *)*iter); + entry = arch_deref_entry_point((void *)*iter); + + if (!kernel_text_address(entry) || + !kallsyms_lookup_size_offset(entry, &size, &offset)) { + pr_err("Failed to find blacklist at %p\n", + (void *)entry); continue; } ent = kmalloc(sizeof(*ent), GFP_KERNEL); if (!ent) return -ENOMEM; - ent->start_addr = *iter; - ent->end_addr = *iter + size; + ent->start_addr = entry; + ent->end_addr = entry + size; INIT_LIST_HEAD(&ent->list); list_add_tail(&ent->list, &kprobe_blacklist); } From 8c26d458394be44e135d1c6bd4557e1c4e1a0535 Mon Sep 17 00:00:00 2001 From: Eliad Peller Date: Thu, 17 Jul 2014 15:00:56 +0300 Subject: [PATCH 289/455] cfg80211: fix mic_failure tracing tsc can be NULL (mac80211 currently always passes NULL), resulting in NULL-dereference. check before copying it. Cc: stable@vger.kernel.org Signed-off-by: Eliad Peller Signed-off-by: Emmanuel Grumbach Signed-off-by: Johannes Berg --- net/wireless/trace.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/wireless/trace.h b/net/wireless/trace.h index 560ed77084e9..7cc887f9da11 100644 --- a/net/wireless/trace.h +++ b/net/wireless/trace.h @@ -2094,7 +2094,8 @@ TRACE_EVENT(cfg80211_michael_mic_failure, MAC_ASSIGN(addr, addr); __entry->key_type = key_type; __entry->key_id = key_id; - memcpy(__entry->tsc, tsc, 6); + if (tsc) + memcpy(__entry->tsc, tsc, 6); ), TP_printk(NETDEV_PR_FMT ", " MAC_PR_FMT ", key type: %d, key id: %d, tsc: %pm", NETDEV_PR_ARG, MAC_PR_ARG(addr), __entry->key_type, From 03e97220b99b8b691ea5b130b7b4c135c9662792 Mon Sep 17 00:00:00 2001 From: Lucas Stach Date: Thu, 17 Jul 2014 12:20:14 +0200 Subject: [PATCH 290/455] ARM: clk-imx6q: parent lvds_sel input from upstream clock gates The i.MX6 reference manual doesn't make a clear distinction between the fixed clock divider and the enable gate for the pcie and sata reference clocks. This lead to the lvds mux inputs in the imx6q clk driver to be parented from the ref clock (which is the divider) instead of the actual gate, which in turn prevents the upstream clock to actually be enabled when lvds clk out is active. This fixes a hard machine hang regression in kernel 3.16 for boards where only pcie is active but no sata, as with this kernel version the imx6-pcie driver is no longer enabling the upstream clock directly but only lvds clk out. Reported-by: Arne Ruhnau Signed-off-by: Lucas Stach Tested-by: Arne Ruhnau Signed-off-by: Shawn Guo --- arch/arm/mach-imx/clk-imx6q.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/arm/mach-imx/clk-imx6q.c b/arch/arm/mach-imx/clk-imx6q.c index 8e795dea02ec..8556c787e59c 100644 --- a/arch/arm/mach-imx/clk-imx6q.c +++ b/arch/arm/mach-imx/clk-imx6q.c @@ -70,7 +70,7 @@ static const char *cko_sels[] = { "cko1", "cko2", }; static const char *lvds_sels[] = { "dummy", "dummy", "dummy", "dummy", "dummy", "dummy", "pll4_audio", "pll5_video", "pll8_mlb", "enet_ref", - "pcie_ref", "sata_ref", + "pcie_ref_125m", "sata_ref_100m", }; enum mx6q_clks { @@ -491,7 +491,7 @@ static void __init imx6q_clocks_init(struct device_node *ccm_node) /* All existing boards with PCIe use LVDS1 */ if (IS_ENABLED(CONFIG_PCI_IMX6)) - clk_set_parent(clk[lvds1_sel], clk[sata_ref]); + clk_set_parent(clk[lvds1_sel], clk[sata_ref_100m]); /* Set initial power mode */ imx6q_set_lpm(WAIT_CLOCKED); From 79272b3562bb44ce7dc720cd13136f5a4a53c618 Mon Sep 17 00:00:00 2001 From: Bob Peterson Date: Fri, 20 Jun 2014 09:36:41 -0400 Subject: [PATCH 291/455] GFS2: Only wait for demote when last holder is dequeued Function gfs2_glock_dq_wait is supposed to dequeue a glock and then wait for the lock to be demoted. The problem is, if this is a shared lock, its demote will depend on the other holders, which means you might end up waiting forever because the other process is blocked. This problem is especially apparent when dealing with nested flocks. Signed-off-by: Bob Peterson Signed-off-by: Steven Whitehouse --- fs/gfs2/glock.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c index c355f7320e44..278fae5b6982 100644 --- a/fs/gfs2/glock.c +++ b/fs/gfs2/glock.c @@ -1128,7 +1128,9 @@ void gfs2_glock_dq_wait(struct gfs2_holder *gh) struct gfs2_glock *gl = gh->gh_gl; gfs2_glock_dq(gh); might_sleep(); - wait_on_bit(&gl->gl_flags, GLF_DEMOTE, gfs2_glock_demote_wait, TASK_UNINTERRUPTIBLE); + if (!find_first_holder(gl)) + wait_on_bit(&gl->gl_flags, GLF_DEMOTE, gfs2_glock_demote_wait, + TASK_UNINTERRUPTIBLE); } /** From 94a09a3999ee978e097b5aad74034ed43bae56db Mon Sep 17 00:00:00 2001 From: Steven Whitehouse Date: Mon, 23 Jun 2014 14:43:32 +0100 Subject: [PATCH 292/455] GFS2: Fix race in glock lru glock disposal We must not leave items on the LRU list with GLF_LOCK set, since they can be removed if the glock is brought back into use, which may then potentially result in a hang, waiting for GLF_LOCK to clear. It doesn't happen very often, since it requires a glock that has not been used for a long time to be brought back into use at the same moment that the shrinker is part way through disposing of glocks. The fix is to set GLF_LOCK at a later time, when we already know that the other locks can be obtained. Also, we now only release the lru_lock in case a resched is needed, rather than on every iteration. Signed-off-by: Steven Whitehouse --- fs/gfs2/glock.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c index 278fae5b6982..c1e5b126d2ca 100644 --- a/fs/gfs2/glock.c +++ b/fs/gfs2/glock.c @@ -1406,12 +1406,16 @@ __acquires(&lru_lock) gl = list_entry(list->next, struct gfs2_glock, gl_lru); list_del_init(&gl->gl_lru); if (!spin_trylock(&gl->gl_spin)) { +add_back_to_lru: list_add(&gl->gl_lru, &lru_list); atomic_inc(&lru_count); continue; } + if (test_and_set_bit(GLF_LOCK, &gl->gl_flags)) { + spin_unlock(&gl->gl_spin); + goto add_back_to_lru; + } clear_bit(GLF_LRU, &gl->gl_flags); - spin_unlock(&lru_lock); gl->gl_lockref.count++; if (demote_ok(gl)) handle_callback(gl, LM_ST_UNLOCKED, 0, false); @@ -1419,7 +1423,7 @@ __acquires(&lru_lock) if (queue_delayed_work(glock_workqueue, &gl->gl_work, 0) == 0) gl->gl_lockref.count--; spin_unlock(&gl->gl_spin); - spin_lock(&lru_lock); + cond_resched_lock(&lru_lock); } } @@ -1444,7 +1448,7 @@ static long gfs2_scan_glock_lru(int nr) gl = list_entry(lru_list.next, struct gfs2_glock, gl_lru); /* Test for being demotable */ - if (!test_and_set_bit(GLF_LOCK, &gl->gl_flags)) { + if (!test_bit(GLF_LOCK, &gl->gl_flags)) { list_move(&gl->gl_lru, &dispose); atomic_dec(&lru_count); freed++; From fe0bbd2986996b9efe3a78bf5a591b0496c7afea Mon Sep 17 00:00:00 2001 From: Steven Whitehouse Date: Mon, 23 Jun 2014 14:50:20 +0100 Subject: [PATCH 293/455] GFS2: Use GFP_NOFS when allocating glocks Normally GFP_KERNEL is ok here, but there is now a rarely used code path relating to deallocation of unlinked inodes (in certain corner cases) which if hit at times of memory shortage can cause recursion while trying to free memory. One solution would be to try and move the gfs2_glock_get() call so that it is no longer called while another glock is held, but that doesn't look at all easy, so GFP_NOFS is the best solution for the time being. Signed-off-by: Steven Whitehouse --- fs/gfs2/glock.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c index c1e5b126d2ca..b703dcc91588 100644 --- a/fs/gfs2/glock.c +++ b/fs/gfs2/glock.c @@ -731,14 +731,14 @@ int gfs2_glock_get(struct gfs2_sbd *sdp, u64 number, cachep = gfs2_glock_aspace_cachep; else cachep = gfs2_glock_cachep; - gl = kmem_cache_alloc(cachep, GFP_KERNEL); + gl = kmem_cache_alloc(cachep, GFP_NOFS); if (!gl) return -ENOMEM; memset(&gl->gl_lksb, 0, sizeof(struct dlm_lksb)); if (glops->go_flags & GLOF_LVB) { - gl->gl_lksb.sb_lvbptr = kzalloc(GFS2_MIN_LVB_SIZE, GFP_KERNEL); + gl->gl_lksb.sb_lvbptr = kzalloc(GFS2_MIN_LVB_SIZE, GFP_NOFS); if (!gl->gl_lksb.sb_lvbptr) { kmem_cache_free(cachep, gl); return -ENOMEM; From 6ec43b1838bd71633ac3f853c63ddf1f5940b1ed Mon Sep 17 00:00:00 2001 From: Fabian Frederick Date: Wed, 25 Jun 2014 20:40:45 +0200 Subject: [PATCH 294/455] GFS2: replace count*size kzalloc by kcalloc kcalloc manages count*sizeof overflow. Cc: cluster-devel@redhat.com Signed-off-by: Fabian Frederick Signed-off-by: Steven Whitehouse --- fs/gfs2/lock_dlm.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/gfs2/lock_dlm.c b/fs/gfs2/lock_dlm.c index 91f274de1246..4fafea1c9ecf 100644 --- a/fs/gfs2/lock_dlm.c +++ b/fs/gfs2/lock_dlm.c @@ -1036,8 +1036,8 @@ static int set_recover_size(struct gfs2_sbd *sdp, struct dlm_slot *slots, new_size = old_size + RECOVER_SIZE_INC; - submit = kzalloc(new_size * sizeof(uint32_t), GFP_NOFS); - result = kzalloc(new_size * sizeof(uint32_t), GFP_NOFS); + submit = kcalloc(new_size, sizeof(uint32_t), GFP_NOFS); + result = kcalloc(new_size, sizeof(uint32_t), GFP_NOFS); if (!submit || !result) { kfree(submit); kfree(result); From 5bef3e7cf18c56cc733777c61b6b61a0b8a62b35 Mon Sep 17 00:00:00 2001 From: Bob Peterson Date: Thu, 26 Jun 2014 10:46:25 -0400 Subject: [PATCH 295/455] GFS2: Allow flocks to use normal glock dq rather than dq_wait This patch allows flock glocks to use a non-blocking dequeue rather than dq_wait. It also reverts the previous patch I had posted regarding dq_wait. The reverted patch isn't necessarily a bad idea, but I decided this might avoid unforeseen side effects, and was therefore safer. Signed-off-by: Bob Peterson Signed-off-by: Steven Whitehouse --- fs/gfs2/file.c | 2 +- fs/gfs2/glock.c | 4 +--- 2 files changed, 2 insertions(+), 4 deletions(-) diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c index 4fc3a3046174..491e8e023598 100644 --- a/fs/gfs2/file.c +++ b/fs/gfs2/file.c @@ -991,7 +991,7 @@ static int do_flock(struct file *file, int cmd, struct file_lock *fl) goto out; flock_lock_file_wait(file, &(struct file_lock){.fl_type = F_UNLCK}); - gfs2_glock_dq_wait(fl_gh); + gfs2_glock_dq(fl_gh); gfs2_holder_reinit(state, flags, fl_gh); } else { error = gfs2_glock_get(GFS2_SB(&ip->i_inode), ip->i_no_addr, diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c index b703dcc91588..ee4e04fe60fc 100644 --- a/fs/gfs2/glock.c +++ b/fs/gfs2/glock.c @@ -1128,9 +1128,7 @@ void gfs2_glock_dq_wait(struct gfs2_holder *gh) struct gfs2_glock *gl = gh->gh_gl; gfs2_glock_dq(gh); might_sleep(); - if (!find_first_holder(gl)) - wait_on_bit(&gl->gl_flags, GLF_DEMOTE, gfs2_glock_demote_wait, - TASK_UNINTERRUPTIBLE); + wait_on_bit(&gl->gl_flags, GLF_DEMOTE, gfs2_glock_demote_wait, TASK_UNINTERRUPTIBLE); } /** From 97a4f1d7653684fff0d50e9328917506f06e9d79 Mon Sep 17 00:00:00 2001 From: Bob Peterson Date: Thu, 26 Jun 2014 10:47:48 -0400 Subject: [PATCH 296/455] GFS2: Allow caching of glocks for flock This patch removes the GLF_NOCACHE flag from the glocks associated with flocks. There should be no good reason not to cache glocks for flocks: they only force the glock to be demoted before they can be reacquired, which can slow down performance and even cause glock hangs, especially in cases where the flocks are held in Shared (SH) mode. Signed-off-by: Bob Peterson Signed-off-by: Steven Whitehouse --- fs/gfs2/file.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c index 491e8e023598..26b3f952e6b1 100644 --- a/fs/gfs2/file.c +++ b/fs/gfs2/file.c @@ -981,7 +981,7 @@ static int do_flock(struct file *file, int cmd, struct file_lock *fl) int error = 0; state = (fl->fl_type == F_WRLCK) ? LM_ST_EXCLUSIVE : LM_ST_SHARED; - flags = (IS_SETLKW(cmd) ? 0 : LM_FLAG_TRY) | GL_EXACT | GL_NOCACHE; + flags = (IS_SETLKW(cmd) ? 0 : LM_FLAG_TRY) | GL_EXACT; mutex_lock(&fp->f_fl_mutex); From 6b49d1d9c3c1088758c6a2758aaa5d236ef609e2 Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Sun, 29 Jun 2014 12:21:39 +0200 Subject: [PATCH 297/455] GFS2: memcontrol: Spelling s/invlidate/invalidate/ Signed-off-by: Geert Uytterhoeven Cc: cluster-devel@redhat.com Signed-off-by: Steven Whitehouse --- fs/gfs2/glops.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c index fc1100781bbc..2ffc67dce87f 100644 --- a/fs/gfs2/glops.c +++ b/fs/gfs2/glops.c @@ -234,8 +234,8 @@ static void inode_go_sync(struct gfs2_glock *gl) * inode_go_inval - prepare a inode glock to be released * @gl: the glock * @flags: - * - * Normally we invlidate everything, but if we are moving into + * + * Normally we invalidate everything, but if we are moving into * LM_ST_DEFERRED from LM_ST_SHARED or LM_ST_EXCLUSIVE then we * can keep hold of the metadata, since it won't have changed. * From 27ff6a0f7f5bf500e9d2a8760c062789b52c551f Mon Sep 17 00:00:00 2001 From: Fabian Frederick Date: Wed, 2 Jul 2014 22:05:27 +0200 Subject: [PATCH 298/455] GFS2: fs/gfs2/rgrp.c: kernel-doc warning fixes Cc: cluster-devel@redhat.com Signed-off-by: Fabian Frederick Signed-off-by: Steven Whitehouse --- fs/gfs2/rgrp.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c index db629d1bd1bd..f4cb9c0d6bbd 100644 --- a/fs/gfs2/rgrp.c +++ b/fs/gfs2/rgrp.c @@ -337,7 +337,7 @@ static bool gfs2_unaligned_extlen(struct gfs2_rbm *rbm, u32 n_unaligned, u32 *le /** * gfs2_free_extlen - Return extent length of free blocks - * @rbm: Starting position + * @rrbm: Starting position * @len: Max length to check * * Starting at the block specified by the rbm, see how many free blocks @@ -2522,7 +2522,7 @@ void gfs2_rlist_alloc(struct gfs2_rgrp_list *rlist, unsigned int state) /** * gfs2_rlist_free - free a resource group list - * @list: the list of resource groups + * @rlist: the list of resource groups * */ From 8cf2389bc4456057a4d9663d22429fb677f1b13e Mon Sep 17 00:00:00 2001 From: Sebastian Hesselbarth Date: Mon, 14 Jul 2014 16:23:29 +0100 Subject: [PATCH 299/455] ARM: 8100/1: Fix preemption disable in iwmmxt_task_enable() commit 431a84b1a4f7d1a0085d5b91330c5053cc8e8b12 ("ARM: 8034/1: Disable preemption in iwmmxt_task_enable()") introduced macros {inc,dec}_preempt_count to iwmmxt_task_enable to make it run with preemption disabled. Unfortunately, other functions in iwmmxt.S also use concan_{save,dump,load} sections located in iwmmxt_task_enable() to deal with iWMMXt coprocessor. This causes an unbalanced preempt_count due to excessive dec_preempt_count and destroyed return addresses in callers of concan_ labels due to a register collision: Linux version 3.16.0-rc3-00062-gd92a333-dirty (jef@armhf) (gcc version 4.8.3 (Debian 4.8.3-4) ) #5 PREEMPT Thu Jul 3 19:46:39 CEST 2014 CPU: ARMv7 Processor [560f5815] revision 5 (ARMv7), cr=10c5387d CPU: PIPT / VIPT nonaliasing data cache, PIPT instruction cache Machine model: SolidRun CuBox ... PJ4 iWMMXt v2 coprocessor enabled. ... Unable to handle kernel paging request at virtual address fffffffe pgd = bb25c000 [fffffffe] *pgd=3bfde821, *pte=00000000, *ppte=00000000 Internal error: Oops: 80000007 [#1] PREEMPT ARM Modules linked in: CPU: 0 PID: 62 Comm: startpar Not tainted 3.16.0-rc3-00062-gd92a333-dirty #5 task: bb230b80 ti: bb256000 task.ti: bb256000 PC is at 0xfffffffe LR is at iwmmxt_task_copy+0x44/0x4c pc : [] lr : [<800130ac>] psr: 40000033 sp : bb257de8 ip : 00000013 fp : bb257ea4 r10: bb256000 r9 : fffffdfe r8 : 76e898e6 r7 : bb257ec8 r6 : bb256000 r5 : 7ea12760 r4 : 000000a0 r3 : ffffffff r2 : 00000003 r1 : bb257df8 r0 : 00000000 Flags: nZcv IRQs on FIQs on Mode SVC_32 ISA Thumb Segment user Control: 10c5387d Table: 3b25c019 DAC: 00000015 Process startpar (pid: 62, stack limit = 0xbb256248) This patch fixes the issue by moving concan_{save,dump,load} into separate code sections and make iwmmxt_task_enable() call them in the same way the other functions use concan_ symbols. The test for valid ownership is moved to concan_save and is safe for the other user of it, iwmmxt_task_disable(). The register collision is also resolved by moving concan_ symbols as {inc,dec}_preempt_count are now local to iwmmxt_task_enable(). Fixes: 431a84b1a4f7 ("ARM: 8034/1: Disable preemption in iwmmxt_task_enable()") Signed-off-by: Sebastian Hesselbarth Acked-by: Catalin Marinas Reported-by: Jean-Francois Moine Signed-off-by: Russell King --- arch/arm/kernel/iwmmxt.S | 23 ++++++++++++----------- 1 file changed, 12 insertions(+), 11 deletions(-) diff --git a/arch/arm/kernel/iwmmxt.S b/arch/arm/kernel/iwmmxt.S index a5599cfc43cb..2b32978ae905 100644 --- a/arch/arm/kernel/iwmmxt.S +++ b/arch/arm/kernel/iwmmxt.S @@ -94,13 +94,19 @@ ENTRY(iwmmxt_task_enable) mrc p15, 0, r2, c2, c0, 0 mov r2, r2 @ cpwait + bl concan_save - teq r1, #0 @ test for last ownership - mov lr, r9 @ normal exit from exception - beq concan_load @ no owner, skip save +#ifdef CONFIG_PREEMPT_COUNT + get_thread_info r10 +#endif +4: dec_preempt_count r10, r3 + mov pc, r9 @ normal exit from exception concan_save: + teq r1, #0 @ test for last ownership + beq concan_load @ no owner, skip save + tmrc r2, wCon @ CUP? wCx @@ -138,7 +144,7 @@ concan_dump: wstrd wR15, [r1, #MMX_WR15] 2: teq r0, #0 @ anything to load? - beq 3f + moveq pc, lr @ if not, return concan_load: @@ -171,14 +177,9 @@ concan_load: @ clear CUP/MUP (only if r1 != 0) teq r1, #0 mov r2, #0 - beq 3f - tmcr wCon, r2 + moveq pc, lr -3: -#ifdef CONFIG_PREEMPT_COUNT - get_thread_info r10 -#endif -4: dec_preempt_count r10, r3 + tmcr wCon, r2 mov pc, lr /* From 29e697b11853d3f83b1864ae385abdad4aa2c361 Mon Sep 17 00:00:00 2001 From: Tomasz Figa Date: Thu, 17 Jul 2014 17:23:44 +0200 Subject: [PATCH 300/455] irqchip: gic: Fix core ID calculation when topology is read from DT Certain GIC implementation, namely those found on earlier, single cluster, Exynos SoCs, have registers mapped without per-CPU banking, which means that the driver needs to use different offset for each CPU. Currently the driver calculates the offset by multiplying value returned by cpu_logical_map() by CPU offset parsed from DT. This is correct when CPU topology is not specified in DT and aforementioned function returns core ID alone. However when DT contains CPU topology, the function changes to return cluster ID as well, which is non-zero on mentioned SoCs and so breaks the calculation in GIC driver. This patch fixes this by masking out cluster ID in CPU offset calculation so that only core ID is considered. Multi-cluster Exynos SoCs already have banked GIC implementations, so this simple fix should be enough. Reported-by: Lorenzo Pieralisi Reported-by: Bartlomiej Zolnierkiewicz Signed-off-by: Tomasz Figa Fixes: db0d4db22a78d ("ARM: gic: allow GIC to support non-banked setups") Cc: # v3.3+ Link: https://lkml.kernel.org/r/1405610624-18722-1-git-send-email-t.figa@samsung.com Signed-off-by: Jason Cooper --- drivers/irqchip/irq-gic.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c index 66ed8922dab2..7c131cf7cc13 100644 --- a/drivers/irqchip/irq-gic.c +++ b/drivers/irqchip/irq-gic.c @@ -42,6 +42,7 @@ #include #include +#include #include #include #include @@ -954,7 +955,9 @@ void __init gic_init_bases(unsigned int gic_nr, int irq_start, } for_each_possible_cpu(cpu) { - unsigned long offset = percpu_offset * cpu_logical_map(cpu); + u32 mpidr = cpu_logical_map(cpu); + u32 core_id = MPIDR_AFFINITY_LEVEL(mpidr, 0); + unsigned long offset = percpu_offset * core_id; *per_cpu_ptr(gic->dist_base.percpu_base, cpu) = dist_base + offset; *per_cpu_ptr(gic->cpu_base.percpu_base, cpu) = cpu_base + offset; } From dba1fd0bff38966f16bbe194fb451f73ddaafb58 Mon Sep 17 00:00:00 2001 From: Bo Shen Date: Mon, 14 Jul 2014 11:08:14 +0800 Subject: [PATCH 301/455] ARM: at91: at91sam9x5: correct typo error for ohci clock Correct the typo error for the second "uhphs_clk". Signed-off-by: Bo Shen Acked-by: Boris Brezillon Signed-off-by: Nicolas Ferre --- arch/arm/boot/dts/at91sam9x5.dtsi | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/arch/arm/boot/dts/at91sam9x5.dtsi b/arch/arm/boot/dts/at91sam9x5.dtsi index d6133f497207..ae34c9ca4bba 100644 --- a/arch/arm/boot/dts/at91sam9x5.dtsi +++ b/arch/arm/boot/dts/at91sam9x5.dtsi @@ -1153,8 +1153,7 @@ compatible = "atmel,at91rm9200-ohci", "usb-ohci"; reg = <0x00600000 0x100000>; interrupts = <22 IRQ_TYPE_LEVEL_HIGH 2>; - clocks = <&usb>, <&uhphs_clk>, <&udphs_clk>, - <&uhpck>; + clocks = <&usb>, <&uhphs_clk>, <&uhphs_clk>, <&uhpck>; clock-names = "usb_clk", "ohci_clk", "hclk", "uhpck"; status = "disabled"; }; From 043dfc1b624caf67a52412412a7ccce2d7d2b7f5 Mon Sep 17 00:00:00 2001 From: Boris BREZILLON Date: Mon, 14 Jul 2014 08:39:27 +0200 Subject: [PATCH 302/455] ARM: at91/dt: fix usb0 clocks definition in sam9n12 dtsi udphs_clk (USB Device Controller clock) is referenced instead of uhphs_clk (USB Host Controller clock). Signed-off-by: Boris BREZILLON Acked-by: Alexandre Belloni Signed-off-by: Nicolas Ferre --- arch/arm/boot/dts/at91sam9n12.dtsi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm/boot/dts/at91sam9n12.dtsi b/arch/arm/boot/dts/at91sam9n12.dtsi index 287795985e32..b84bac5bada4 100644 --- a/arch/arm/boot/dts/at91sam9n12.dtsi +++ b/arch/arm/boot/dts/at91sam9n12.dtsi @@ -925,7 +925,7 @@ compatible = "atmel,at91rm9200-ohci", "usb-ohci"; reg = <0x00500000 0x00100000>; interrupts = <22 IRQ_TYPE_LEVEL_HIGH 2>; - clocks = <&usb>, <&uhphs_clk>, <&udphs_clk>, + clocks = <&usb>, <&uhphs_clk>, <&uhphs_clk>, <&uhpck>; clock-names = "usb_clk", "ohci_clk", "hclk", "uhpck"; status = "disabled"; From e0d69e119fc6bf7cc3c9f791478108c1b925bb2e Mon Sep 17 00:00:00 2001 From: Boris BREZILLON Date: Thu, 17 Jul 2014 21:03:58 +0200 Subject: [PATCH 303/455] ARM: at91/dt: add missing clocks property to pwm node in sam9x5.dtsi The pwm driver requires a clocks property referencing the pwm peripheral clk. Signed-off-by: Boris BREZILLON Signed-off-by: Nicolas Ferre --- arch/arm/boot/dts/at91sam9x5.dtsi | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/arm/boot/dts/at91sam9x5.dtsi b/arch/arm/boot/dts/at91sam9x5.dtsi index ae34c9ca4bba..5ef716ddd5fe 100644 --- a/arch/arm/boot/dts/at91sam9x5.dtsi +++ b/arch/arm/boot/dts/at91sam9x5.dtsi @@ -1122,6 +1122,7 @@ compatible = "atmel,at91sam9rl-pwm"; reg = <0xf8034000 0x300>; interrupts = <18 IRQ_TYPE_LEVEL_HIGH 4>; + clocks = <&pwm_clk>; #pwm-cells = <3>; status = "disabled"; }; From 2e58cdcc22148d89ccea8f900280736e5f585c07 Mon Sep 17 00:00:00 2001 From: Tobias Klauser Date: Tue, 8 Jul 2014 15:18:07 -0700 Subject: [PATCH 304/455] Input: st-keyscan - fix 'defined but not used' compiler warnings Add #ifdef CONFIG_PM_SLEEP around keyscan_supend() and keyscan_resume() to fix the following compiler warnings occuring if CONFIG_PM_SLEEP is unset: + /scratch/kisskb/src/drivers/input/keyboard/st-keyscan.c: warning: 'keyscan_resume' defined but not used [-Wunused-function]: => 235:12 + /scratch/kisskb/src/drivers/input/keyboard/st-keyscan.c: warning: 'keyscan_suspend' defined but not used [-Wunused-function]: => 218:12 Reported-by: Geert Uytterhoeven Link: https://lkml.org/lkml/2014/7/8/109 Signed-off-by: Tobias Klauser Signed-off-by: Dmitry Torokhov --- drivers/input/keyboard/st-keyscan.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/input/keyboard/st-keyscan.c b/drivers/input/keyboard/st-keyscan.c index 758b48731415..de7be4f03d91 100644 --- a/drivers/input/keyboard/st-keyscan.c +++ b/drivers/input/keyboard/st-keyscan.c @@ -215,6 +215,7 @@ static int keyscan_probe(struct platform_device *pdev) return 0; } +#ifdef CONFIG_PM_SLEEP static int keyscan_suspend(struct device *dev) { struct platform_device *pdev = to_platform_device(dev); @@ -249,6 +250,7 @@ static int keyscan_resume(struct device *dev) mutex_unlock(&input->mutex); return retval; } +#endif static SIMPLE_DEV_PM_OPS(keyscan_dev_pm_ops, keyscan_suspend, keyscan_resume); From 67f4aef20055afec73e37e7752bc6cc74fa01dea Mon Sep 17 00:00:00 2001 From: Bjorn Helgaas Date: Fri, 18 Jul 2014 10:05:38 -0700 Subject: [PATCH 305/455] Input: sirfsoc-onkey - fix GPL v2 license string typo Per license_is_gpl_compatible(), the MODULE_LICENSE() string for GPL v2 is "GPL v2", not "GPLv2". Use "GPL v2" so this module doesn't taint the kernel. Signed-off-by: Bjorn Helgaas Signed-off-by: Dmitry Torokhov --- drivers/input/misc/sirfsoc-onkey.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/input/misc/sirfsoc-onkey.c b/drivers/input/misc/sirfsoc-onkey.c index e4104f9b2e6d..fed5102e1802 100644 --- a/drivers/input/misc/sirfsoc-onkey.c +++ b/drivers/input/misc/sirfsoc-onkey.c @@ -213,7 +213,7 @@ static struct platform_driver sirfsoc_pwrc_driver = { module_platform_driver(sirfsoc_pwrc_driver); -MODULE_LICENSE("GPLv2"); +MODULE_LICENSE("GPL v2"); MODULE_AUTHOR("Binghua Duan , Xianglong Du "); MODULE_DESCRIPTION("CSR Prima2 PWRC Driver"); MODULE_ALIAS("platform:sirfsoc-pwrc"); From b32bfc06aefab61acc872dec3222624e6cd867ed Mon Sep 17 00:00:00 2001 From: Romain Degez Date: Fri, 11 Jul 2014 18:08:13 +0200 Subject: [PATCH 306/455] ahci: add support for the Promise FastTrak TX8660 SATA HBA (ahci mode) Add support of the Promise FastTrak TX8660 SATA HBA in ahci mode by registering the board in the ahci_pci_tbl[]. Note: this HBA also provide a hardware RAID mode when activated in BIOS but specific drivers from the manufacturer are required in this case. Signed-off-by: Romain Degez Tested-by: Romain Degez Signed-off-by: Tejun Heo Cc: stable@vger.kernel.org --- drivers/ata/ahci.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c index dae5607e1115..4cd52a4541a9 100644 --- a/drivers/ata/ahci.c +++ b/drivers/ata/ahci.c @@ -456,6 +456,7 @@ static const struct pci_device_id ahci_pci_tbl[] = { /* Promise */ { PCI_VDEVICE(PROMISE, 0x3f20), board_ahci }, /* PDC42819 */ + { PCI_VDEVICE(PROMISE, 0x3781), board_ahci }, /* FastTrak TX8660 ahci-mode */ /* Asmedia */ { PCI_VDEVICE(ASMEDIA, 0x0601), board_ahci }, /* ASM1060 */ From 9637f30e6b7bc394c08fa9d27d63622f141142e9 Mon Sep 17 00:00:00 2001 From: Tomasz Figa Date: Wed, 16 Jul 2014 02:59:18 +0900 Subject: [PATCH 307/455] ARM: EXYNOS: Fix core ID used by platsmp and hotplug code When CPU topology is specified in device tree, cpu_logical_map() does not return core ID anymore, but rather full MPIDR value. This breaks existing calculation of PMU register offsets on Exynos SoCs. This patch fixes the problem by adjusting the code to use only core ID bits of the value returned by cpu_logical_map() to allow CPU topology to be specified in device tree on Exynos SoCs. Signed-off-by: Tomasz Figa Signed-off-by: Kukjin Kim Signed-off-by: Olof Johansson --- arch/arm/mach-exynos/hotplug.c | 10 ++++++---- arch/arm/mach-exynos/platsmp.c | 34 +++++++++++++++++++--------------- 2 files changed, 25 insertions(+), 19 deletions(-) diff --git a/arch/arm/mach-exynos/hotplug.c b/arch/arm/mach-exynos/hotplug.c index 8a134d019cb3..920a4baa53cd 100644 --- a/arch/arm/mach-exynos/hotplug.c +++ b/arch/arm/mach-exynos/hotplug.c @@ -40,15 +40,17 @@ static inline void cpu_leave_lowpower(void) static inline void platform_do_lowpower(unsigned int cpu, int *spurious) { + u32 mpidr = cpu_logical_map(cpu); + u32 core_id = MPIDR_AFFINITY_LEVEL(mpidr, 0); + for (;;) { - /* make cpu1 to be turned off at next WFI command */ - if (cpu == 1) - exynos_cpu_power_down(cpu); + /* Turn the CPU off on next WFI instruction. */ + exynos_cpu_power_down(core_id); wfi(); - if (pen_release == cpu_logical_map(cpu)) { + if (pen_release == core_id) { /* * OK, proper wakeup, we're done */ diff --git a/arch/arm/mach-exynos/platsmp.c b/arch/arm/mach-exynos/platsmp.c index 1c8d31e39520..50b9aad5e27b 100644 --- a/arch/arm/mach-exynos/platsmp.c +++ b/arch/arm/mach-exynos/platsmp.c @@ -90,7 +90,8 @@ static void exynos_secondary_init(unsigned int cpu) static int exynos_boot_secondary(unsigned int cpu, struct task_struct *idle) { unsigned long timeout; - unsigned long phys_cpu = cpu_logical_map(cpu); + u32 mpidr = cpu_logical_map(cpu); + u32 core_id = MPIDR_AFFINITY_LEVEL(mpidr, 0); int ret = -ENOSYS; /* @@ -104,17 +105,18 @@ static int exynos_boot_secondary(unsigned int cpu, struct task_struct *idle) * the holding pen - release it, then wait for it to flag * that it has been released by resetting pen_release. * - * Note that "pen_release" is the hardware CPU ID, whereas + * Note that "pen_release" is the hardware CPU core ID, whereas * "cpu" is Linux's internal ID. */ - write_pen_release(phys_cpu); + write_pen_release(core_id); - if (!exynos_cpu_power_state(cpu)) { - exynos_cpu_power_up(cpu); + if (!exynos_cpu_power_state(core_id)) { + exynos_cpu_power_up(core_id); timeout = 10; /* wait max 10 ms until cpu1 is on */ - while (exynos_cpu_power_state(cpu) != S5P_CORE_LOCAL_PWR_EN) { + while (exynos_cpu_power_state(core_id) + != S5P_CORE_LOCAL_PWR_EN) { if (timeout-- == 0) break; @@ -145,20 +147,20 @@ static int exynos_boot_secondary(unsigned int cpu, struct task_struct *idle) * Try to set boot address using firmware first * and fall back to boot register if it fails. */ - ret = call_firmware_op(set_cpu_boot_addr, phys_cpu, boot_addr); + ret = call_firmware_op(set_cpu_boot_addr, core_id, boot_addr); if (ret && ret != -ENOSYS) goto fail; if (ret == -ENOSYS) { - void __iomem *boot_reg = cpu_boot_reg(phys_cpu); + void __iomem *boot_reg = cpu_boot_reg(core_id); if (IS_ERR(boot_reg)) { ret = PTR_ERR(boot_reg); goto fail; } - __raw_writel(boot_addr, cpu_boot_reg(phys_cpu)); + __raw_writel(boot_addr, cpu_boot_reg(core_id)); } - call_firmware_op(cpu_boot, phys_cpu); + call_firmware_op(cpu_boot, core_id); arch_send_wakeup_ipi_mask(cpumask_of(cpu)); @@ -227,22 +229,24 @@ static void __init exynos_smp_prepare_cpus(unsigned int max_cpus) * boot register if it fails. */ for (i = 1; i < max_cpus; ++i) { - unsigned long phys_cpu; unsigned long boot_addr; + u32 mpidr; + u32 core_id; int ret; - phys_cpu = cpu_logical_map(i); + mpidr = cpu_logical_map(i); + core_id = MPIDR_AFFINITY_LEVEL(mpidr, 0); boot_addr = virt_to_phys(exynos4_secondary_startup); - ret = call_firmware_op(set_cpu_boot_addr, phys_cpu, boot_addr); + ret = call_firmware_op(set_cpu_boot_addr, core_id, boot_addr); if (ret && ret != -ENOSYS) break; if (ret == -ENOSYS) { - void __iomem *boot_reg = cpu_boot_reg(phys_cpu); + void __iomem *boot_reg = cpu_boot_reg(core_id); if (IS_ERR(boot_reg)) break; - __raw_writel(boot_addr, cpu_boot_reg(phys_cpu)); + __raw_writel(boot_addr, cpu_boot_reg(core_id)); } } } From 79a8468747c5f95ed3d5ce8376a3e82e0c5857fc Mon Sep 17 00:00:00 2001 From: Hannes Frederic Sowa Date: Fri, 18 Jul 2014 17:26:41 -0400 Subject: [PATCH 308/455] random: check for increase of entropy_count because of signed conversion The expression entropy_count -= ibytes << (ENTROPY_SHIFT + 3) could actually increase entropy_count if during assignment of the unsigned expression on the RHS (mind the -=) we reduce the value modulo 2^width(int) and assign it to entropy_count. Trinity found this. [ Commit modified by tytso to add an additional safety check for a negative entropy_count -- which should never happen, and to also add an additional paranoia check to prevent overly large count values to be passed into urandom_read(). ] Reported-by: Dave Jones Signed-off-by: Hannes Frederic Sowa Signed-off-by: Theodore Ts'o Cc: stable@vger.kernel.org --- drivers/char/random.c | 17 ++++++++++++++--- 1 file changed, 14 insertions(+), 3 deletions(-) diff --git a/drivers/char/random.c b/drivers/char/random.c index 0a7ac0a7b252..71529e196b84 100644 --- a/drivers/char/random.c +++ b/drivers/char/random.c @@ -641,7 +641,7 @@ retry: } while (unlikely(entropy_count < pool_size-2 && pnfrac)); } - if (entropy_count < 0) { + if (unlikely(entropy_count < 0)) { pr_warn("random: negative entropy/overflow: pool %s count %d\n", r->name, entropy_count); WARN_ON(1); @@ -981,7 +981,7 @@ static size_t account(struct entropy_store *r, size_t nbytes, int min, int reserved) { int entropy_count, orig; - size_t ibytes; + size_t ibytes, nfrac; BUG_ON(r->entropy_count > r->poolinfo->poolfracbits); @@ -999,7 +999,17 @@ retry: } if (ibytes < min) ibytes = 0; - if ((entropy_count -= ibytes << (ENTROPY_SHIFT + 3)) < 0) + + if (unlikely(entropy_count < 0)) { + pr_warn("random: negative entropy count: pool %s count %d\n", + r->name, entropy_count); + WARN_ON(1); + entropy_count = 0; + } + nfrac = ibytes << (ENTROPY_SHIFT + 3); + if ((size_t) entropy_count > nfrac) + entropy_count -= nfrac; + else entropy_count = 0; if (cmpxchg(&r->entropy_count, orig, entropy_count) != orig) @@ -1376,6 +1386,7 @@ urandom_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos) "with %d bits of entropy available\n", current->comm, nonblocking_pool.entropy_total); + nbytes = min_t(size_t, nbytes, INT_MAX >> (ENTROPY_SHIFT + 3)); ret = extract_entropy_user(&nonblocking_pool, buf, nbytes); trace_urandom_read(8 * nbytes, ENTROPY_BITS(&nonblocking_pool), From 98ce2deda23a303682a4253f3016a1436f4b2735 Mon Sep 17 00:00:00 2001 From: Liu Bo Date: Thu, 17 Jul 2014 16:08:36 +0800 Subject: [PATCH 309/455] Btrfs: fix abnormal long waiting in fsync xfstests generic/127 detected this problem. With commit 7fc34a62ca4434a79c68e23e70ed26111b7a4cf8, now fsync will only flush data within the passed range. This is the cause of the above problem, -- btrfs's fsync has a stage called 'sync log' which will wait for all the ordered extents it've recorded to finish. In xfstests/generic/127, with mixed operations such as truncate, fallocate, punch hole, and mapwrite, we get some pre-allocated extents, and mapwrite will mmap, and then msync. And I find that msync will wait for quite a long time (about 20s in my case), thanks to ftrace, it turns out that the previous fallocate calls 'btrfs_wait_ordered_range()' to flush dirty pages, but as the range of dirty pages may be larger than 'btrfs_wait_ordered_range()' wants, there can be some ordered extents created but not getting corresponding pages flushed, then they're left in memory until we fsync which runs into the stage 'sync log', and fsync will just wait for the system writeback thread to flush those pages and get ordered extents finished, so the latency is inevitable. This adds a flush similar to btrfs_start_ordered_extent() in btrfs_wait_logged_extents() to fix that. Reviewed-by: Miao Xie Signed-off-by: Liu Bo Signed-off-by: Chris Mason --- fs/btrfs/ordered-data.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c index e12441c7cf1d..7187b14faa6c 100644 --- a/fs/btrfs/ordered-data.c +++ b/fs/btrfs/ordered-data.c @@ -484,8 +484,19 @@ void btrfs_wait_logged_extents(struct btrfs_root *log, u64 transid) log_list); list_del_init(&ordered->log_list); spin_unlock_irq(&log->log_extents_lock[index]); + + if (!test_bit(BTRFS_ORDERED_IO_DONE, &ordered->flags) && + !test_bit(BTRFS_ORDERED_DIRECT, &ordered->flags)) { + struct inode *inode = ordered->inode; + u64 start = ordered->file_offset; + u64 end = ordered->file_offset + ordered->len - 1; + + WARN_ON(!inode); + filemap_fdatawrite_range(inode->i_mapping, start, end); + } wait_event(ordered->wait, test_bit(BTRFS_ORDERED_IO_DONE, &ordered->flags)); + btrfs_put_ordered_extent(ordered); spin_lock_irq(&log->log_extents_lock[index]); } From 0bfaa9c5cb479cebc24979b384374fe47500b4c9 Mon Sep 17 00:00:00 2001 From: Eric Sandeen Date: Mon, 7 Jul 2014 12:34:49 -0500 Subject: [PATCH 310/455] btrfs: test for valid bdev before kobj removal in btrfs_rm_device commit 99994cd btrfs: dev delete should remove sysfs entry added a btrfs_kobj_rm_device, which dereferences device->bdev... right after we check whether device->bdev might be NULL. I don't honestly know if it's possible to have a NULL device->bdev here, but assuming that it is (given the test), we need to move the kobject removal to be under that test. (Coverity spotted this) Signed-off-by: Eric Sandeen Signed-off-by: Chris Mason --- fs/btrfs/volumes.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 6104676857f5..6cb82f62cb7c 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1680,11 +1680,11 @@ int btrfs_rm_device(struct btrfs_root *root, char *device_path) if (device->bdev == root->fs_info->fs_devices->latest_bdev) root->fs_info->fs_devices->latest_bdev = next_device->bdev; - if (device->bdev) + if (device->bdev) { device->fs_devices->open_devices--; - - /* remove sysfs entry */ - btrfs_kobj_rm_device(root->fs_info, device); + /* remove sysfs entry */ + btrfs_kobj_rm_device(root->fs_info, device); + } call_rcu(&device->rcu, free_device); From ae5db6d12341684913a78b6537c0b9c22c999b5c Mon Sep 17 00:00:00 2001 From: Richard Weinberger Date: Sun, 20 Jul 2014 12:56:34 +0200 Subject: [PATCH 311/455] Revert "um: Fix wait_stub_done() error handling" This reverts commit 0974a9cadc7886f7baaa458bb0c89f5c5f9d458e. The real for for that issue is to release current->mm->mmap_sem in fix_range_common(). Signed-off-by: Richard Weinberger --- arch/um/os-Linux/skas/process.c | 9 ++------- 1 file changed, 2 insertions(+), 7 deletions(-) diff --git a/arch/um/os-Linux/skas/process.c b/arch/um/os-Linux/skas/process.c index d531879a4617..908579f2b0ab 100644 --- a/arch/um/os-Linux/skas/process.c +++ b/arch/um/os-Linux/skas/process.c @@ -54,7 +54,7 @@ static int ptrace_dump_regs(int pid) void wait_stub_done(int pid) { - int n, status, err, bad_stop = 0; + int n, status, err; while (1) { CATCH_EINTR(n = waitpid(pid, &status, WUNTRACED | __WALL)); @@ -74,8 +74,6 @@ void wait_stub_done(int pid) if (((1 << WSTOPSIG(status)) & STUB_DONE_MASK) != 0) return; - else - bad_stop = 1; bad_wait: err = ptrace_dump_regs(pid); @@ -85,10 +83,7 @@ bad_wait: printk(UM_KERN_ERR "wait_stub_done : failed to wait for SIGTRAP, " "pid = %d, n = %d, errno = %d, status = 0x%x\n", pid, n, errno, status); - if (bad_stop) - kill(pid, SIGKILL); - else - fatal_sigsegv(); + fatal_sigsegv(); } extern unsigned long current_stub_stack(void); From 284e6d39516cc7f9fbceebb259849fcb41559a7b Mon Sep 17 00:00:00 2001 From: Richard Weinberger Date: Sun, 20 Jul 2014 13:09:15 +0200 Subject: [PATCH 312/455] um: Ensure that a stub page cannot get unmapped MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Trinity discovered an execution path such that a task can unmap his stub page. Reported-by: Toralf Förster Signed-off-by: Richard Weinberger --- arch/um/kernel/tlb.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/um/kernel/tlb.c b/arch/um/kernel/tlb.c index 9472079471bb..1fc619e5dfe9 100644 --- a/arch/um/kernel/tlb.c +++ b/arch/um/kernel/tlb.c @@ -124,6 +124,9 @@ static int add_munmap(unsigned long addr, unsigned long len, struct host_vm_op *last; int ret = 0; + if ((addr >= STUB_START) && (addr < STUB_END)) + return -EINVAL; + if (hvc->index != 0) { last = &hvc->ops[hvc->index - 1]; if ((last->type == MUNMAP) && From 468f65976a8d065ee1f27782337f4ee85a9151c5 Mon Sep 17 00:00:00 2001 From: Richard Weinberger Date: Sun, 20 Jul 2014 13:16:20 +0200 Subject: [PATCH 313/455] um: Fix hung task in fix_range_common() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit If do_ops() fails we have to release current->mm->mmap_sem otherwise the failing task will never terminate. Reported-by: Toralf Förster Signed-off-by: Richard Weinberger --- arch/um/kernel/tlb.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/arch/um/kernel/tlb.c b/arch/um/kernel/tlb.c index 1fc619e5dfe9..f1b3eb14b855 100644 --- a/arch/um/kernel/tlb.c +++ b/arch/um/kernel/tlb.c @@ -12,6 +12,7 @@ #include #include #include +#include struct host_vm_change { struct host_vm_op { @@ -286,8 +287,11 @@ void fix_range_common(struct mm_struct *mm, unsigned long start_addr, /* This is not an else because ret is modified above */ if (ret) { printk(KERN_ERR "fix_range_common: failed, killing current " - "process\n"); + "process: %d\n", task_tgid_vnr(current)); + /* We are under mmap_sem, release it such that current can terminate */ + up_write(¤t->mm->mmap_sem); force_sig(SIGKILL, current); + do_signal(); } } From bb6a1b2e189f797c0e4a116aec7ce77c344f11e0 Mon Sep 17 00:00:00 2001 From: Richard Weinberger Date: Sun, 20 Jul 2014 13:39:27 +0200 Subject: [PATCH 314/455] um: segv: Save regs only in case of a kernel mode fault ...otherwise me lose user mode regs and the resulting stack trace is useless. Signed-off-by: Richard Weinberger --- arch/um/kernel/trap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/um/kernel/trap.c b/arch/um/kernel/trap.c index 974b87474a99..5678c3571e7c 100644 --- a/arch/um/kernel/trap.c +++ b/arch/um/kernel/trap.c @@ -206,7 +206,7 @@ unsigned long segv(struct faultinfo fi, unsigned long ip, int is_user, int is_write = FAULT_WRITE(fi); unsigned long address = FAULT_ADDRESS(fi); - if (regs) + if (!is_user && regs) current->thread.segv_regs = container_of(regs, struct pt_regs, regs); if (!is_user && (address >= start_vm) && (address < end_vm)) { From 61bd55ce1667809f022be88da77db17add90ea4e Mon Sep 17 00:00:00 2001 From: Lars-Peter Clausen Date: Thu, 17 Jul 2014 16:59:00 +0100 Subject: [PATCH 315/455] iio: buffer: Fix demux table creation When creating the demux table we need to iterate over the selected scan mask for the buffer to get the samples which should be copied to destination buffer. Right now the code uses the mask which contains all active channels, which means the demux table contains entries which causes it to copy all the samples from source to destination buffer one by one without doing any demuxing. Signed-off-by: Lars-Peter Clausen Signed-off-by: Jonathan Cameron Cc: Stable@vger.kernel.org --- drivers/iio/industrialio-buffer.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/iio/industrialio-buffer.c b/drivers/iio/industrialio-buffer.c index 36b1ae92e239..9f1a14009901 100644 --- a/drivers/iio/industrialio-buffer.c +++ b/drivers/iio/industrialio-buffer.c @@ -966,7 +966,7 @@ static int iio_buffer_update_demux(struct iio_dev *indio_dev, /* Now we have the two masks, work from least sig and build up sizes */ for_each_set_bit(out_ind, - indio_dev->active_scan_mask, + buffer->scan_mask, indio_dev->masklength) { in_ind = find_next_bit(indio_dev->active_scan_mask, indio_dev->masklength, From 381676d5e86596b11e22a62f196e192df6091373 Mon Sep 17 00:00:00 2001 From: Peter Meerwald Date: Wed, 16 Jul 2014 19:32:00 +0100 Subject: [PATCH 316/455] iio:bma180: Fix scale factors to report correct acceleration units The userspace interface for acceleration sensors is documented as using m/s^2 units [Documentation/ABI/testing/sysfs-bus-iio] The fullscale raw values for the BMA80 corresponds to -/+ 1, 1.5, 2, etc G depending on the selected mode. The scale table was converting to G rather than m/s^2. Change the scaling table to match the documented interface. See commit 71702e6e, iio: mma8452: Use correct acceleration units, for a related fix. Signed-off-by: Peter Meerwald Cc: Oleksandr Kravchenko Signed-off-by: Jonathan Cameron Cc: Stable@vger.kernel.org --- drivers/iio/accel/bma180.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/iio/accel/bma180.c b/drivers/iio/accel/bma180.c index a7e68c81f89d..28388bb5d583 100644 --- a/drivers/iio/accel/bma180.c +++ b/drivers/iio/accel/bma180.c @@ -68,13 +68,13 @@ /* Defaults values */ #define BMA180_DEF_PMODE 0 #define BMA180_DEF_BW 20 -#define BMA180_DEF_SCALE 250 +#define BMA180_DEF_SCALE 2452 /* Available values for sysfs */ #define BMA180_FLP_FREQ_AVAILABLE \ "10 20 40 75 150 300" #define BMA180_SCALE_AVAILABLE \ - "0.000130 0.000190 0.000250 0.000380 0.000500 0.000990 0.001980" + "0.001275 0.001863 0.002452 0.003727 0.004903 0.009709 0.019417" struct bma180_data { struct i2c_client *client; @@ -94,7 +94,7 @@ enum bma180_axis { }; static int bw_table[] = { 10, 20, 40, 75, 150, 300 }; /* Hz */ -static int scale_table[] = { 130, 190, 250, 380, 500, 990, 1980 }; +static int scale_table[] = { 1275, 1863, 2452, 3727, 4903, 9709, 19417 }; static int bma180_get_acc_reg(struct bma180_data *data, enum bma180_axis axis) { From 9b2a4d35a6ceaf217be61ed8eb3c16986244f640 Mon Sep 17 00:00:00 2001 From: Peter Meerwald Date: Wed, 16 Jul 2014 19:32:00 +0100 Subject: [PATCH 317/455] iio:bma180: Missing check for frequency fractional part val2 should be zero This will make no difference for correct inputs but will reject incorrect ones with a decimal part in the value written to the sysfs interface. Signed-off-by: Peter Meerwald Cc: Oleksandr Kravchenko Signed-off-by: Jonathan Cameron Cc: Stable@vger.kernel.org --- drivers/iio/accel/bma180.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/iio/accel/bma180.c b/drivers/iio/accel/bma180.c index 28388bb5d583..a077cc86421b 100644 --- a/drivers/iio/accel/bma180.c +++ b/drivers/iio/accel/bma180.c @@ -376,6 +376,8 @@ static int bma180_write_raw(struct iio_dev *indio_dev, mutex_unlock(&data->mutex); return ret; case IIO_CHAN_INFO_LOW_PASS_FILTER_3DB_FREQUENCY: + if (val2) + return -EINVAL; mutex_lock(&data->mutex); ret = bma180_set_bw(data, val); mutex_unlock(&data->mutex); From 50c5d36dab930b1f1b1e3348b8608aa8b9ee7610 Mon Sep 17 00:00:00 2001 From: Dmitry Torokhov Date: Sat, 19 Jul 2014 16:30:31 -0700 Subject: [PATCH 318/455] Input: fix defuzzing logic We attempt to remove noise from coordinates reported by devices in input_handle_abs_event(), unfortunately, unless we were dropping the event altogether, we were ignoring the adjusted value and were passing on the original value instead. Cc: stable@vger.kernel.org Reviewed-by: Andrew de los Reyes Reviewed-by: Benson Leung Reviewed-by: David Herrmann Reviewed-by: Henrik Rydberg Signed-off-by: Dmitry Torokhov --- drivers/input/input.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/input/input.c b/drivers/input/input.c index 1c4c0db05550..29ca0bb4f561 100644 --- a/drivers/input/input.c +++ b/drivers/input/input.c @@ -257,9 +257,10 @@ static int input_handle_abs_event(struct input_dev *dev, } static int input_get_disposition(struct input_dev *dev, - unsigned int type, unsigned int code, int value) + unsigned int type, unsigned int code, int *pval) { int disposition = INPUT_IGNORE_EVENT; + int value = *pval; switch (type) { @@ -357,6 +358,7 @@ static int input_get_disposition(struct input_dev *dev, break; } + *pval = value; return disposition; } @@ -365,7 +367,7 @@ static void input_handle_event(struct input_dev *dev, { int disposition; - disposition = input_get_disposition(dev, type, code, value); + disposition = input_get_disposition(dev, type, code, &value); if ((disposition & INPUT_PASS_TO_DEVICE) && dev->event) dev->event(dev, type, code, value); From 7801db8aec957fa6610efe0ee26a6c8bc0f1d73b Mon Sep 17 00:00:00 2001 From: Cong Wang Date: Thu, 17 Jul 2014 17:34:53 -0700 Subject: [PATCH 319/455] net_sched: avoid generating same handle for u32 filters When kernel generates a handle for a u32 filter, it tries to start from the max in the bucket. So when we have a filter with the max (fff) handle, it will cause kernel always generates the same handle for new filters. This can be shown by the following command: tc qdisc add dev eth0 ingress tc filter add dev eth0 parent ffff: protocol ip pref 770 handle 800::fff u32 match ip protocol 1 0xff tc filter add dev eth0 parent ffff: protocol ip pref 770 u32 match ip protocol 1 0xff ... we will get some u32 filters with same handle: # tc filter show dev eth0 parent ffff: filter protocol ip pref 770 u32 filter protocol ip pref 770 u32 fh 800: ht divisor 1 filter protocol ip pref 770 u32 fh 800::fff order 4095 key ht 800 bkt 0 match 00010000/00ff0000 at 8 filter protocol ip pref 770 u32 fh 800::fff order 4095 key ht 800 bkt 0 match 00010000/00ff0000 at 8 filter protocol ip pref 770 u32 fh 800::fff order 4095 key ht 800 bkt 0 match 00010000/00ff0000 at 8 filter protocol ip pref 770 u32 fh 800::fff order 4095 key ht 800 bkt 0 match 00010000/00ff0000 at 8 handles should be unique. This patch fixes it by looking up a bitmap, so that can guarantee the handle is as unique as possible. For compatibility, we still start from 0x800. Cc: "David S. Miller" Signed-off-by: Cong Wang Signed-off-by: Cong Wang Signed-off-by: Jamal Hadi Salim Signed-off-by: David S. Miller --- net/sched/cls_u32.c | 19 ++++++++++++++----- 1 file changed, 14 insertions(+), 5 deletions(-) diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c index c39b583ace32..70c0be8d0121 100644 --- a/net/sched/cls_u32.c +++ b/net/sched/cls_u32.c @@ -38,6 +38,7 @@ #include #include #include +#include #include #include #include @@ -460,17 +461,25 @@ static int u32_delete(struct tcf_proto *tp, unsigned long arg) return 0; } +#define NR_U32_NODE (1<<12) static u32 gen_new_kid(struct tc_u_hnode *ht, u32 handle) { struct tc_u_knode *n; - unsigned int i = 0x7FF; + unsigned long i; + unsigned long *bitmap = kzalloc(BITS_TO_LONGS(NR_U32_NODE) * sizeof(unsigned long), + GFP_KERNEL); + if (!bitmap) + return handle | 0xFFF; for (n = ht->ht[TC_U32_HASH(handle)]; n; n = n->next) - if (i < TC_U32_NODE(n->handle)) - i = TC_U32_NODE(n->handle); - i++; + set_bit(TC_U32_NODE(n->handle), bitmap); - return handle | (i > 0xFFF ? 0xFFF : i); + i = find_next_zero_bit(bitmap, NR_U32_NODE, 0x800); + if (i >= NR_U32_NODE) + i = find_next_zero_bit(bitmap, NR_U32_NODE, 1); + + kfree(bitmap); + return handle | (i >= NR_U32_NODE ? 0xFFF : i); } static const struct nla_policy u32_policy[TCA_U32_MAX + 1] = { From 1a998d3e6bc1e44f4c0bc7509bdedef8ed3845ec Mon Sep 17 00:00:00 2001 From: Zoltan Kiss Date: Fri, 18 Jul 2014 19:08:02 +0100 Subject: [PATCH 320/455] xen-netback: Fix handling frag_list on grant op error path The error handling for skb's with frag_list was completely wrong, it caused double unmap attempts to happen if the error was on the first skb. Move it to the right place in the loop. Signed-off-by: Zoltan Kiss Reported-by: Armin Zentai Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: xen-devel@lists.xenproject.org Signed-off-by: David S. Miller --- drivers/net/xen-netback/netback.c | 37 +++++++++++++++++-------------- 1 file changed, 20 insertions(+), 17 deletions(-) diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c index 1844a47636b6..a773f2016bad 100644 --- a/drivers/net/xen-netback/netback.c +++ b/drivers/net/xen-netback/netback.c @@ -1030,10 +1030,16 @@ static int xenvif_tx_check_gop(struct xenvif_queue *queue, { struct gnttab_map_grant_ref *gop_map = *gopp_map; u16 pending_idx = XENVIF_TX_CB(skb)->pending_idx; + /* This always points to the shinfo of the skb being checked, which + * could be either the first or the one on the frag_list + */ struct skb_shared_info *shinfo = skb_shinfo(skb); + /* If this is non-NULL, we are currently checking the frag_list skb, and + * this points to the shinfo of the first one + */ + struct skb_shared_info *first_shinfo = NULL; int nr_frags = shinfo->nr_frags; int i, err; - struct sk_buff *first_skb = NULL; /* Check status of header. */ err = (*gopp_copy)->status; @@ -1086,31 +1092,28 @@ check_frags: xenvif_idx_unmap(queue, pending_idx); } + /* And if we found the error while checking the frag_list, unmap + * the first skb's frags + */ + if (first_shinfo) { + for (j = 0; j < first_shinfo->nr_frags; j++) { + pending_idx = frag_get_pending_idx(&first_shinfo->frags[j]); + xenvif_idx_unmap(queue, pending_idx); + } + } + /* Remember the error: invalidate all subsequent fragments. */ err = newerr; } - if (skb_has_frag_list(skb)) { - first_skb = skb; - skb = shinfo->frag_list; - shinfo = skb_shinfo(skb); + if (skb_has_frag_list(skb) && !first_shinfo) { + first_shinfo = skb_shinfo(skb); + shinfo = skb_shinfo(skb_shinfo(skb)->frag_list); nr_frags = shinfo->nr_frags; goto check_frags; } - /* There was a mapping error in the frag_list skb. We have to unmap - * the first skb's frags - */ - if (first_skb && err) { - int j; - shinfo = skb_shinfo(first_skb); - for (j = 0; j < shinfo->nr_frags; j++) { - pending_idx = frag_get_pending_idx(&shinfo->frags[j]); - xenvif_idx_unmap(queue, pending_idx); - } - } - *gopp_map = gop_map; return err; } From b42cc6e421e7bf74e545483aa34b99d2a2ca6d3a Mon Sep 17 00:00:00 2001 From: Zoltan Kiss Date: Fri, 18 Jul 2014 19:08:03 +0100 Subject: [PATCH 321/455] xen-netback: Fix releasing frag_list skbs in error path When the grant operations failed, the skb is freed up eventually, and it tries to release the frags, if there is any. For the main skb nr_frags is set to 0 to avoid this, but on the frag_list it iterates through the frags array, and tries to call put_page on the page pointer which contains garbage at that time. Signed-off-by: Zoltan Kiss Reported-by: Armin Zentai Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: xen-devel@lists.xenproject.org Signed-off-by: David S. Miller --- drivers/net/xen-netback/netback.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c index a773f2016bad..8cbf60d4689e 100644 --- a/drivers/net/xen-netback/netback.c +++ b/drivers/net/xen-netback/netback.c @@ -1521,7 +1521,16 @@ static int xenvif_tx_submit(struct xenvif_queue *queue) /* Check the remap error code. */ if (unlikely(xenvif_tx_check_gop(queue, skb, &gop_map, &gop_copy))) { + /* If there was an error, xenvif_tx_check_gop is + * expected to release all the frags which were mapped, + * so kfree_skb shouldn't do it again + */ skb_shinfo(skb)->nr_frags = 0; + if (skb_has_frag_list(skb)) { + struct sk_buff *nskb = + skb_shinfo(skb)->frag_list; + skb_shinfo(nskb)->nr_frags = 0; + } kfree_skb(skb); continue; } From 1b860da0404a76af8533099ffe0a965490939369 Mon Sep 17 00:00:00 2001 From: Zoltan Kiss Date: Fri, 18 Jul 2014 19:08:04 +0100 Subject: [PATCH 322/455] xen-netback: Fix releasing header slot on error path This patch makes this function aware that the first frag and the header might share the same ring slot. That could happen if the first slot is bigger than PKT_PROT_LEN. Due to this the error path might release that slot twice or never, depending on the error scenario. xenvif_idx_release is also removed from xenvif_idx_unmap, and called separately. Signed-off-by: Zoltan Kiss Reported-by: Armin Zentai Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: xen-devel@lists.xenproject.org Signed-off-by: David S. Miller --- drivers/net/xen-netback/netback.c | 38 +++++++++++++++++++++++++++---- 1 file changed, 33 insertions(+), 5 deletions(-) diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c index 8cbf60d4689e..6fff911dc134 100644 --- a/drivers/net/xen-netback/netback.c +++ b/drivers/net/xen-netback/netback.c @@ -1039,6 +1039,8 @@ static int xenvif_tx_check_gop(struct xenvif_queue *queue, */ struct skb_shared_info *first_shinfo = NULL; int nr_frags = shinfo->nr_frags; + const bool sharedslot = nr_frags && + frag_get_pending_idx(&shinfo->frags[0]) == pending_idx; int i, err; /* Check status of header. */ @@ -1051,7 +1053,10 @@ static int xenvif_tx_check_gop(struct xenvif_queue *queue, (*gopp_copy)->status, pending_idx, (*gopp_copy)->source.u.ref); - xenvif_idx_release(queue, pending_idx, XEN_NETIF_RSP_ERROR); + /* The first frag might still have this slot mapped */ + if (!sharedslot) + xenvif_idx_release(queue, pending_idx, + XEN_NETIF_RSP_ERROR); } check_frags: @@ -1068,8 +1073,19 @@ check_frags: pending_idx, gop_map->handle); /* Had a previous error? Invalidate this fragment. */ - if (unlikely(err)) + if (unlikely(err)) { xenvif_idx_unmap(queue, pending_idx); + /* If the mapping of the first frag was OK, but + * the header's copy failed, and they are + * sharing a slot, send an error + */ + if (i == 0 && sharedslot) + xenvif_idx_release(queue, pending_idx, + XEN_NETIF_RSP_ERROR); + else + xenvif_idx_release(queue, pending_idx, + XEN_NETIF_RSP_OKAY); + } continue; } @@ -1081,15 +1097,27 @@ check_frags: gop_map->status, pending_idx, gop_map->ref); + xenvif_idx_release(queue, pending_idx, XEN_NETIF_RSP_ERROR); /* Not the first error? Preceding frags already invalidated. */ if (err) continue; - /* First error: invalidate preceding fragments. */ + + /* First error: if the header haven't shared a slot with the + * first frag, release it as well. + */ + if (!sharedslot) + xenvif_idx_release(queue, + XENVIF_TX_CB(skb)->pending_idx, + XEN_NETIF_RSP_OKAY); + + /* Invalidate preceding fragments of this skb. */ for (j = 0; j < i; j++) { pending_idx = frag_get_pending_idx(&shinfo->frags[j]); xenvif_idx_unmap(queue, pending_idx); + xenvif_idx_release(queue, pending_idx, + XEN_NETIF_RSP_OKAY); } /* And if we found the error while checking the frag_list, unmap @@ -1099,6 +1127,8 @@ check_frags: for (j = 0; j < first_shinfo->nr_frags; j++) { pending_idx = frag_get_pending_idx(&first_shinfo->frags[j]); xenvif_idx_unmap(queue, pending_idx); + xenvif_idx_release(queue, pending_idx, + XEN_NETIF_RSP_OKAY); } } @@ -1834,8 +1864,6 @@ void xenvif_idx_unmap(struct xenvif_queue *queue, u16 pending_idx) tx_unmap_op.status); BUG(); } - - xenvif_idx_release(queue, pending_idx, XEN_NETIF_RSP_OKAY); } static inline int rx_work_todo(struct xenvif_queue *queue) From d8cfbfc4660054150ca1b7c501a8edc0771022f9 Mon Sep 17 00:00:00 2001 From: Zoltan Kiss Date: Fri, 18 Jul 2014 19:08:05 +0100 Subject: [PATCH 323/455] xen-netback: Fix pointer incrementation to avoid incorrect logging Due to this pointer is increased prematurely, the error log contains rubbish. Signed-off-by: Zoltan Kiss Reported-by: Armin Zentai Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: xen-devel@lists.xenproject.org Signed-off-by: David S. Miller --- drivers/net/xen-netback/netback.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c index 6fff911dc134..c65b636bcab9 100644 --- a/drivers/net/xen-netback/netback.c +++ b/drivers/net/xen-netback/netback.c @@ -1045,7 +1045,6 @@ static int xenvif_tx_check_gop(struct xenvif_queue *queue, /* Check status of header. */ err = (*gopp_copy)->status; - (*gopp_copy)++; if (unlikely(err)) { if (net_ratelimit()) netdev_dbg(queue->vif->dev, @@ -1058,6 +1057,7 @@ static int xenvif_tx_check_gop(struct xenvif_queue *queue, xenvif_idx_release(queue, pending_idx, XEN_NETIF_RSP_ERROR); } + (*gopp_copy)++; check_frags: for (i = 0; i < nr_frags; i++, gop_map++) { From 9a3c4145af32125c5ee39c0272662b47307a8323 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Sun, 20 Jul 2014 21:04:16 -0700 Subject: [PATCH 324/455] Linux 3.16-rc6 --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index f3c543df4697..6b2774145d66 100644 --- a/Makefile +++ b/Makefile @@ -1,7 +1,7 @@ VERSION = 3 PATCHLEVEL = 16 SUBLEVEL = 0 -EXTRAVERSION = -rc5 +EXTRAVERSION = -rc6 NAME = Shuffling Zombie Juror # *DOCUMENTATION* From 640d7efe4c08f06c4ae5d31b79bd8740e7f6790a Mon Sep 17 00:00:00 2001 From: Ben Hutchings Date: Mon, 21 Jul 2014 00:06:48 +0100 Subject: [PATCH 325/455] dns_resolver: Null-terminate the right string *_result[len] is parsed as *(_result[len]) which is not at all what we want to touch here. Signed-off-by: Ben Hutchings Fixes: 84a7c0b1db1c ("dns_resolver: assure that dns_query() result is null-terminated") Signed-off-by: David S. Miller --- net/dns_resolver/dns_query.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/dns_resolver/dns_query.c b/net/dns_resolver/dns_query.c index 9acec61f5433..dd8696a3dbec 100644 --- a/net/dns_resolver/dns_query.c +++ b/net/dns_resolver/dns_query.c @@ -150,7 +150,7 @@ int dns_query(const char *type, const char *name, size_t namelen, goto put; memcpy(*_result, upayload->data, len); - *_result[len] = '\0'; + (*_result)[len] = '\0'; if (_expiry) *_expiry = rkey->expiry; From d46b6bfa7628030a93e05f7087b7c638a85b4a35 Mon Sep 17 00:00:00 2001 From: Simon Wunderlich Date: Mon, 23 Jun 2014 15:55:36 +0200 Subject: [PATCH 326/455] batman-adv: drop QinQ claim frames in bridge loop avoidance Since bridge loop avoidance only supports untagged or simple 802.1q tagged VLAN claim frames, claim frames with stacked VLAN headers (QinQ) should be detected and dropped. Transporting the over the mesh may cause problems on the receivers, or create bogus entries in the local tt tables. Reported-by: Antonio Quartulli Signed-off-by: Simon Wunderlich Signed-off-by: Marek Lindner Signed-off-by: Antonio Quartulli --- net/batman-adv/bridge_loop_avoidance.c | 44 ++++++++++++++++++++------ 1 file changed, 34 insertions(+), 10 deletions(-) diff --git a/net/batman-adv/bridge_loop_avoidance.c b/net/batman-adv/bridge_loop_avoidance.c index 6f0d9ec37950..a957c8140721 100644 --- a/net/batman-adv/bridge_loop_avoidance.c +++ b/net/batman-adv/bridge_loop_avoidance.c @@ -800,11 +800,6 @@ static int batadv_check_claim_group(struct batadv_priv *bat_priv, bla_dst = (struct batadv_bla_claim_dst *)hw_dst; bla_dst_own = &bat_priv->bla.claim_dest; - /* check if it is a claim packet in general */ - if (memcmp(bla_dst->magic, bla_dst_own->magic, - sizeof(bla_dst->magic)) != 0) - return 0; - /* if announcement packet, use the source, * otherwise assume it is in the hw_src */ @@ -866,12 +861,13 @@ static int batadv_bla_process_claim(struct batadv_priv *bat_priv, struct batadv_hard_iface *primary_if, struct sk_buff *skb) { - struct batadv_bla_claim_dst *bla_dst; + struct batadv_bla_claim_dst *bla_dst, *bla_dst_own; uint8_t *hw_src, *hw_dst; - struct vlan_ethhdr *vhdr; + struct vlan_hdr *vhdr, vhdr_buf; struct ethhdr *ethhdr; struct arphdr *arphdr; unsigned short vid; + int vlan_depth = 0; __be16 proto; int headlen; int ret; @@ -882,9 +878,24 @@ static int batadv_bla_process_claim(struct batadv_priv *bat_priv, proto = ethhdr->h_proto; headlen = ETH_HLEN; if (vid & BATADV_VLAN_HAS_TAG) { - vhdr = vlan_eth_hdr(skb); - proto = vhdr->h_vlan_encapsulated_proto; - headlen += VLAN_HLEN; + /* Traverse the VLAN/Ethertypes. + * + * At this point it is known that the first protocol is a VLAN + * header, so start checking at the encapsulated protocol. + * + * The depth of the VLAN headers is recorded to drop BLA claim + * frames encapsulated into multiple VLAN headers (QinQ). + */ + do { + vhdr = skb_header_pointer(skb, headlen, VLAN_HLEN, + &vhdr_buf); + if (!vhdr) + return 0; + + proto = vhdr->h_vlan_encapsulated_proto; + headlen += VLAN_HLEN; + vlan_depth++; + } while (proto == htons(ETH_P_8021Q)); } if (proto != htons(ETH_P_ARP)) @@ -914,6 +925,19 @@ static int batadv_bla_process_claim(struct batadv_priv *bat_priv, hw_src = (uint8_t *)arphdr + sizeof(struct arphdr); hw_dst = hw_src + ETH_ALEN + 4; bla_dst = (struct batadv_bla_claim_dst *)hw_dst; + bla_dst_own = &bat_priv->bla.claim_dest; + + /* check if it is a claim frame in general */ + if (memcmp(bla_dst->magic, bla_dst_own->magic, + sizeof(bla_dst->magic)) != 0) + return 0; + + /* check if there is a claim frame encapsulated deeper in (QinQ) and + * drop that, as this is not supported by BLA but should also not be + * sent via the mesh. + */ + if (vlan_depth > 1) + return 1; /* check if it is a claim frame. */ ret = batadv_check_claim_group(bat_priv, primary_if, hw_src, hw_dst, From 4601879419f94a89fcbf427b4d3bfbf4ce294174 Mon Sep 17 00:00:00 2001 From: Eliad Peller Date: Wed, 16 Jul 2014 13:24:36 +0300 Subject: [PATCH 327/455] iwlwifi: mvm: pass beacons from foreign APs In AP mode, configure the fw to pass beacons from foreign APs, in order to be able to set the ht protection IE properly. Add the same filters in case of GO (which didn't have any configured filter_flags, probably by mistake) Signed-off-by: Eliad Peller Signed-off-by: Emmanuel Grumbach --- drivers/net/wireless/iwlwifi/mvm/mac-ctxt.c | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/drivers/net/wireless/iwlwifi/mvm/mac-ctxt.c b/drivers/net/wireless/iwlwifi/mvm/mac-ctxt.c index 725ba49576bf..8b79081d4885 100644 --- a/drivers/net/wireless/iwlwifi/mvm/mac-ctxt.c +++ b/drivers/net/wireless/iwlwifi/mvm/mac-ctxt.c @@ -1072,8 +1072,12 @@ static int iwl_mvm_mac_ctxt_cmd_ap(struct iwl_mvm *mvm, /* Fill the common data for all mac context types */ iwl_mvm_mac_ctxt_cmd_common(mvm, vif, &cmd, action); - /* Also enable probe requests to pass */ - cmd.filter_flags |= cpu_to_le32(MAC_FILTER_IN_PROBE_REQUEST); + /* + * pass probe requests and beacons from other APs (needed + * for ht protection) + */ + cmd.filter_flags |= cpu_to_le32(MAC_FILTER_IN_PROBE_REQUEST | + MAC_FILTER_IN_BEACON); /* Fill the data specific for ap mode */ iwl_mvm_mac_ctxt_cmd_fill_ap(mvm, vif, &cmd.ap, @@ -1094,6 +1098,13 @@ static int iwl_mvm_mac_ctxt_cmd_go(struct iwl_mvm *mvm, /* Fill the common data for all mac context types */ iwl_mvm_mac_ctxt_cmd_common(mvm, vif, &cmd, action); + /* + * pass probe requests and beacons from other APs (needed + * for ht protection) + */ + cmd.filter_flags |= cpu_to_le32(MAC_FILTER_IN_PROBE_REQUEST | + MAC_FILTER_IN_BEACON); + /* Fill the data specific for GO mode */ iwl_mvm_mac_ctxt_cmd_fill_ap(mvm, vif, &cmd.go.ap, action == FW_CTXT_ACTION_ADD); From 35df3b298fc8779f7edf4b0228c683f7e98edcd5 Mon Sep 17 00:00:00 2001 From: Antonio Quartulli Date: Thu, 8 May 2014 17:13:15 +0200 Subject: [PATCH 328/455] batman-adv: fix TT VLAN inconsistency on VLAN re-add When a VLAN interface (on top of batX) is removed and re-added within a short timeframe TT does not have enough time to properly cleanup. This creates an internal TT state mismatch as the newly created softif_vlan will be initialized from scratch with a TT client count of zero (even if TT entries for this VLAN still exist). The resulting TT messages are bogus due to the counter / tt client listing mismatch, thus creating inconsistencies on every node in the network To fix this issue destroy_vlan() has to not free the VLAN object immediately but it has to be kept alive until all the TT entries for this VLAN have been removed. destroy_vlan() still removes the sysfs folder so that the user has the feeling that everything went fine. If the same VLAN is re-added before the old object is free'd, then the latter is resurrected and re-used. Implement such behaviour by increasing the reference counter of a softif_vlan object every time a new local TT entry for such VLAN is created and remove the object from the list only when all the TT entries have been destroyed. Signed-off-by: Antonio Quartulli Signed-off-by: Marek Lindner --- net/batman-adv/soft-interface.c | 60 +++++++++++++++++++++++------- net/batman-adv/translation-table.c | 26 +++++++++++++ net/batman-adv/types.h | 2 + 3 files changed, 74 insertions(+), 14 deletions(-) diff --git a/net/batman-adv/soft-interface.c b/net/batman-adv/soft-interface.c index e7ee65dc20bf..cbd677f48c00 100644 --- a/net/batman-adv/soft-interface.c +++ b/net/batman-adv/soft-interface.c @@ -448,10 +448,15 @@ out: * possibly free it * @softif_vlan: the vlan object to release */ -void batadv_softif_vlan_free_ref(struct batadv_softif_vlan *softif_vlan) +void batadv_softif_vlan_free_ref(struct batadv_softif_vlan *vlan) { - if (atomic_dec_and_test(&softif_vlan->refcount)) - kfree_rcu(softif_vlan, rcu); + if (atomic_dec_and_test(&vlan->refcount)) { + spin_lock_bh(&vlan->bat_priv->softif_vlan_list_lock); + hlist_del_rcu(&vlan->list); + spin_unlock_bh(&vlan->bat_priv->softif_vlan_list_lock); + + kfree_rcu(vlan, rcu); + } } /** @@ -505,6 +510,7 @@ int batadv_softif_create_vlan(struct batadv_priv *bat_priv, unsigned short vid) if (!vlan) return -ENOMEM; + vlan->bat_priv = bat_priv; vlan->vid = vid; atomic_set(&vlan->refcount, 1); @@ -516,6 +522,10 @@ int batadv_softif_create_vlan(struct batadv_priv *bat_priv, unsigned short vid) return err; } + spin_lock_bh(&bat_priv->softif_vlan_list_lock); + hlist_add_head_rcu(&vlan->list, &bat_priv->softif_vlan_list); + spin_unlock_bh(&bat_priv->softif_vlan_list_lock); + /* add a new TT local entry. This one will be marked with the NOPURGE * flag */ @@ -523,10 +533,6 @@ int batadv_softif_create_vlan(struct batadv_priv *bat_priv, unsigned short vid) bat_priv->soft_iface->dev_addr, vid, BATADV_NULL_IFINDEX, BATADV_NO_MARK); - spin_lock_bh(&bat_priv->softif_vlan_list_lock); - hlist_add_head_rcu(&vlan->list, &bat_priv->softif_vlan_list); - spin_unlock_bh(&bat_priv->softif_vlan_list_lock); - return 0; } @@ -538,18 +544,13 @@ int batadv_softif_create_vlan(struct batadv_priv *bat_priv, unsigned short vid) static void batadv_softif_destroy_vlan(struct batadv_priv *bat_priv, struct batadv_softif_vlan *vlan) { - spin_lock_bh(&bat_priv->softif_vlan_list_lock); - hlist_del_rcu(&vlan->list); - spin_unlock_bh(&bat_priv->softif_vlan_list_lock); - - batadv_sysfs_del_vlan(bat_priv, vlan); - /* explicitly remove the associated TT local entry because it is marked * with the NOPURGE flag */ batadv_tt_local_remove(bat_priv, bat_priv->soft_iface->dev_addr, vlan->vid, "vlan interface destroyed", false); + batadv_sysfs_del_vlan(bat_priv, vlan); batadv_softif_vlan_free_ref(vlan); } @@ -567,6 +568,8 @@ static int batadv_interface_add_vid(struct net_device *dev, __be16 proto, unsigned short vid) { struct batadv_priv *bat_priv = netdev_priv(dev); + struct batadv_softif_vlan *vlan; + int ret; /* only 802.1Q vlans are supported. * batman-adv does not know how to handle other types @@ -576,7 +579,36 @@ static int batadv_interface_add_vid(struct net_device *dev, __be16 proto, vid |= BATADV_VLAN_HAS_TAG; - return batadv_softif_create_vlan(bat_priv, vid); + /* if a new vlan is getting created and it already exists, it means that + * it was not deleted yet. batadv_softif_vlan_get() increases the + * refcount in order to revive the object. + * + * if it does not exist then create it. + */ + vlan = batadv_softif_vlan_get(bat_priv, vid); + if (!vlan) + return batadv_softif_create_vlan(bat_priv, vid); + + /* recreate the sysfs object if it was already destroyed (and it should + * be since we received a kill_vid() for this vlan + */ + if (!vlan->kobj) { + ret = batadv_sysfs_add_vlan(bat_priv->soft_iface, vlan); + if (ret) { + batadv_softif_vlan_free_ref(vlan); + return ret; + } + } + + /* add a new TT local entry. This one will be marked with the NOPURGE + * flag. This must be added again, even if the vlan object already + * exists, because the entry was deleted by kill_vid() + */ + batadv_tt_local_add(bat_priv->soft_iface, + bat_priv->soft_iface->dev_addr, vid, + BATADV_NULL_IFINDEX, BATADV_NO_MARK); + + return 0; } /** diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c index d636bde72c9a..5f59e7f899a0 100644 --- a/net/batman-adv/translation-table.c +++ b/net/batman-adv/translation-table.c @@ -511,6 +511,7 @@ bool batadv_tt_local_add(struct net_device *soft_iface, const uint8_t *addr, struct batadv_priv *bat_priv = netdev_priv(soft_iface); struct batadv_tt_local_entry *tt_local; struct batadv_tt_global_entry *tt_global = NULL; + struct batadv_softif_vlan *vlan; struct net_device *in_dev = NULL; struct hlist_head *head; struct batadv_tt_orig_list_entry *orig_entry; @@ -572,6 +573,9 @@ bool batadv_tt_local_add(struct net_device *soft_iface, const uint8_t *addr, if (!tt_local) goto out; + /* increase the refcounter of the related vlan */ + vlan = batadv_softif_vlan_get(bat_priv, vid); + batadv_dbg(BATADV_DBG_TT, bat_priv, "Creating new local tt entry: %pM (vid: %d, ttvn: %d)\n", addr, BATADV_PRINT_VID(vid), @@ -604,6 +608,7 @@ bool batadv_tt_local_add(struct net_device *soft_iface, const uint8_t *addr, if (unlikely(hash_added != 0)) { /* remove the reference for the hash */ batadv_tt_local_entry_free_ref(tt_local); + batadv_softif_vlan_free_ref(vlan); goto out; } @@ -1009,6 +1014,7 @@ uint16_t batadv_tt_local_remove(struct batadv_priv *bat_priv, { struct batadv_tt_local_entry *tt_local_entry; uint16_t flags, curr_flags = BATADV_NO_FLAGS; + struct batadv_softif_vlan *vlan; tt_local_entry = batadv_tt_local_hash_find(bat_priv, addr, vid); if (!tt_local_entry) @@ -1039,6 +1045,11 @@ uint16_t batadv_tt_local_remove(struct batadv_priv *bat_priv, hlist_del_rcu(&tt_local_entry->common.hash_entry); batadv_tt_local_entry_free_ref(tt_local_entry); + /* decrease the reference held for this vlan */ + vlan = batadv_softif_vlan_get(bat_priv, vid); + batadv_softif_vlan_free_ref(vlan); + batadv_softif_vlan_free_ref(vlan); + out: if (tt_local_entry) batadv_tt_local_entry_free_ref(tt_local_entry); @@ -1111,6 +1122,7 @@ static void batadv_tt_local_table_free(struct batadv_priv *bat_priv) spinlock_t *list_lock; /* protects write access to the hash lists */ struct batadv_tt_common_entry *tt_common_entry; struct batadv_tt_local_entry *tt_local; + struct batadv_softif_vlan *vlan; struct hlist_node *node_tmp; struct hlist_head *head; uint32_t i; @@ -1131,6 +1143,13 @@ static void batadv_tt_local_table_free(struct batadv_priv *bat_priv) tt_local = container_of(tt_common_entry, struct batadv_tt_local_entry, common); + + /* decrease the reference held for this vlan */ + vlan = batadv_softif_vlan_get(bat_priv, + tt_common_entry->vid); + batadv_softif_vlan_free_ref(vlan); + batadv_softif_vlan_free_ref(vlan); + batadv_tt_local_entry_free_ref(tt_local); } spin_unlock_bh(list_lock); @@ -3139,6 +3158,7 @@ static void batadv_tt_local_purge_pending_clients(struct batadv_priv *bat_priv) struct batadv_hashtable *hash = bat_priv->tt.local_hash; struct batadv_tt_common_entry *tt_common; struct batadv_tt_local_entry *tt_local; + struct batadv_softif_vlan *vlan; struct hlist_node *node_tmp; struct hlist_head *head; spinlock_t *list_lock; /* protects write access to the hash lists */ @@ -3167,6 +3187,12 @@ static void batadv_tt_local_purge_pending_clients(struct batadv_priv *bat_priv) tt_local = container_of(tt_common, struct batadv_tt_local_entry, common); + + /* decrease the reference held for this vlan */ + vlan = batadv_softif_vlan_get(bat_priv, tt_common->vid); + batadv_softif_vlan_free_ref(vlan); + batadv_softif_vlan_free_ref(vlan); + batadv_tt_local_entry_free_ref(tt_local); } spin_unlock_bh(list_lock); diff --git a/net/batman-adv/types.h b/net/batman-adv/types.h index 34891a56773f..8854c05622a9 100644 --- a/net/batman-adv/types.h +++ b/net/batman-adv/types.h @@ -687,6 +687,7 @@ struct batadv_priv_nc { /** * struct batadv_softif_vlan - per VLAN attributes set + * @bat_priv: pointer to the mesh object * @vid: VLAN identifier * @kobj: kobject for sysfs vlan subdirectory * @ap_isolation: AP isolation state @@ -696,6 +697,7 @@ struct batadv_priv_nc { * @rcu: struct used for freeing in a RCU-safe manner */ struct batadv_softif_vlan { + struct batadv_priv *bat_priv; unsigned short vid; struct kobject *kobj; atomic_t ap_isolation; /* boolean */ From 58d4e21e50ff3cc57910a8abc20d7e14375d2f61 Mon Sep 17 00:00:00 2001 From: Tony Luck Date: Fri, 18 Jul 2014 11:43:01 -0700 Subject: [PATCH 329/455] tracing: Fix wraparound problems in "uptime" trace clock The "uptime" trace clock added in: commit 8aacf017b065a805d27467843490c976835eb4a5 tracing: Add "uptime" trace clock that uses jiffies has wraparound problems when the system has been up more than 1 hour 11 minutes and 34 seconds. It converts jiffies to nanoseconds using: (u64)jiffies_to_usecs(jiffy) * 1000ULL but since jiffies_to_usecs() only returns a 32-bit value, it truncates at 2^32 microseconds. An additional problem on 32-bit systems is that the argument is "unsigned long", so fixing the return value only helps until 2^32 jiffies (49.7 days on a HZ=1000 system). Avoid these problems by using jiffies_64 as our basis, and not converting to nanoseconds (we do convert to clock_t because user facing API must not be dependent on internal kernel HZ values). Link: http://lkml.kernel.org/p/99d63c5bfe9b320a3b428d773825a37095bf6a51.1405708254.git.tony.luck@intel.com Cc: stable@vger.kernel.org # 3.10+ Fixes: 8aacf017b065 "tracing: Add "uptime" trace clock that uses jiffies" Signed-off-by: Tony Luck Signed-off-by: Steven Rostedt --- kernel/trace/trace.c | 2 +- kernel/trace/trace_clock.c | 9 +++++---- 2 files changed, 6 insertions(+), 5 deletions(-) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index bda9621638cc..291397e66669 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -823,7 +823,7 @@ static struct { { trace_clock_local, "local", 1 }, { trace_clock_global, "global", 1 }, { trace_clock_counter, "counter", 0 }, - { trace_clock_jiffies, "uptime", 1 }, + { trace_clock_jiffies, "uptime", 0 }, { trace_clock, "perf", 1 }, ARCH_TRACE_CLOCKS }; diff --git a/kernel/trace/trace_clock.c b/kernel/trace/trace_clock.c index 26dc348332b7..57b67b1f24d1 100644 --- a/kernel/trace/trace_clock.c +++ b/kernel/trace/trace_clock.c @@ -59,13 +59,14 @@ u64 notrace trace_clock(void) /* * trace_jiffy_clock(): Simply use jiffies as a clock counter. + * Note that this use of jiffies_64 is not completely safe on + * 32-bit systems. But the window is tiny, and the effect if + * we are affected is that we will have an obviously bogus + * timestamp on a trace event - i.e. not life threatening. */ u64 notrace trace_clock_jiffies(void) { - u64 jiffy = jiffies - INITIAL_JIFFIES; - - /* Return nsecs */ - return (u64)jiffies_to_usecs(jiffy) * 1000ULL; + return jiffies_64_to_clock_t(jiffies_64 - INITIAL_JIFFIES); } /* From 51cbe7e7c400def749950ab6b2c120624dbe21a7 Mon Sep 17 00:00:00 2001 From: Borislav Petkov Date: Fri, 20 Jun 2014 23:16:45 +0200 Subject: [PATCH 330/455] x86, MCE: Robustify mcheck_init_device BorisO reports that misc_register() fails often on xen. The current code unregisters the CPU hotplug notifier in that case. If then a CPU is offlined and onlined back again, we end up with a second timer running on that CPU, leading to soft lockups and system hangs. So let's leave the hotcpu notifier always registered - even if mce_device_create failed for some cores and never unreg it so that we can deal with the timer handling accordingly. Reported-and-Tested-by: Boris Ostrovsky Link: http://lkml.kernel.org/r/1403274493-1371-1-git-send-email-boris.ostrovsky@oracle.com Signed-off-by: Borislav Petkov --- arch/x86/kernel/cpu/mcheck/mce.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c index bb92f38153b2..9a79c8dbd8e8 100644 --- a/arch/x86/kernel/cpu/mcheck/mce.c +++ b/arch/x86/kernel/cpu/mcheck/mce.c @@ -2451,6 +2451,12 @@ static __init int mcheck_init_device(void) for_each_online_cpu(i) { err = mce_device_create(i); if (err) { + /* + * Register notifier anyway (and do not unreg it) so + * that we don't leave undeleted timers, see notifier + * callback above. + */ + __register_hotcpu_notifier(&mce_cpu_notifier); cpu_notifier_register_done(); goto err_device_create; } @@ -2471,10 +2477,6 @@ static __init int mcheck_init_device(void) err_register: unregister_syscore_ops(&mce_syscore_ops); - cpu_notifier_register_begin(); - __unregister_hotcpu_notifier(&mce_cpu_notifier); - cpu_notifier_register_done(); - err_device_create: /* * We didn't keep track of which devices were created above, but From 20b2656d7e644c8673f2b9944a0e65249e0ae555 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Christian=20K=C3=B6nig?= Date: Fri, 18 Jul 2014 13:56:56 +0200 Subject: [PATCH 331/455] drm/radeon: let's use GB for vm_size (v2) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit VM sizes smaller than 1GB doesn't make much sense anyway. v2: fix typo and grammer Signed-off-by: Christian König Signed-off-by: Alex Deucher --- drivers/gpu/drm/radeon/radeon_device.c | 22 +++++++++++----------- drivers/gpu/drm/radeon/radeon_drv.c | 4 ++-- 2 files changed, 13 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c index 03686fab842d..697add2cd4e3 100644 --- a/drivers/gpu/drm/radeon/radeon_device.c +++ b/drivers/gpu/drm/radeon/radeon_device.c @@ -1056,36 +1056,36 @@ static void radeon_check_arguments(struct radeon_device *rdev) if (!radeon_check_pot_argument(radeon_vm_size)) { dev_warn(rdev->dev, "VM size (%d) must be a power of 2\n", radeon_vm_size); - radeon_vm_size = 4096; + radeon_vm_size = 4; } - if (radeon_vm_size < 4) { - dev_warn(rdev->dev, "VM size (%d) to small, min is 4MB\n", + if (radeon_vm_size < 1) { + dev_warn(rdev->dev, "VM size (%d) to small, min is 1GB\n", radeon_vm_size); - radeon_vm_size = 4096; + radeon_vm_size = 4; } /* * Max GPUVM size for Cayman, SI and CI are 40 bits. */ - if (radeon_vm_size > 1024*1024) { - dev_warn(rdev->dev, "VM size (%d) to large, max is 1TB\n", + if (radeon_vm_size > 1024) { + dev_warn(rdev->dev, "VM size (%d) too large, max is 1TB\n", radeon_vm_size); - radeon_vm_size = 4096; + radeon_vm_size = 4; } /* defines number of bits in page table versus page directory, * a page is 4KB so we have 12 bits offset, minimum 9 bits in the * page table and the remaining bits are in the page directory */ if (radeon_vm_block_size < 9) { - dev_warn(rdev->dev, "VM page table size (%d) to small\n", + dev_warn(rdev->dev, "VM page table size (%d) too small\n", radeon_vm_block_size); radeon_vm_block_size = 9; } if (radeon_vm_block_size > 24 || - radeon_vm_size < (1ull << radeon_vm_block_size)) { - dev_warn(rdev->dev, "VM page table size (%d) to large\n", + (radeon_vm_size * 1024) < (1ull << radeon_vm_block_size)) { + dev_warn(rdev->dev, "VM page table size (%d) too large\n", radeon_vm_block_size); radeon_vm_block_size = 9; } @@ -1238,7 +1238,7 @@ int radeon_device_init(struct radeon_device *rdev, /* Adjust VM size here. * Max GPUVM size for cayman+ is 40 bits. */ - rdev->vm_manager.max_pfn = radeon_vm_size << 8; + rdev->vm_manager.max_pfn = radeon_vm_size << 18; /* Set asic functions */ r = radeon_asic_init(rdev); diff --git a/drivers/gpu/drm/radeon/radeon_drv.c b/drivers/gpu/drm/radeon/radeon_drv.c index cb1421369e3a..e9e361084249 100644 --- a/drivers/gpu/drm/radeon/radeon_drv.c +++ b/drivers/gpu/drm/radeon/radeon_drv.c @@ -173,7 +173,7 @@ int radeon_dpm = -1; int radeon_aspm = -1; int radeon_runtime_pm = -1; int radeon_hard_reset = 0; -int radeon_vm_size = 4096; +int radeon_vm_size = 4; int radeon_vm_block_size = 9; int radeon_deep_color = 0; @@ -243,7 +243,7 @@ module_param_named(runpm, radeon_runtime_pm, int, 0444); MODULE_PARM_DESC(hard_reset, "PCI config reset (1 = force enable, 0 = disable (default))"); module_param_named(hard_reset, radeon_hard_reset, int, 0444); -MODULE_PARM_DESC(vm_size, "VM address space size in megabytes (default 4GB)"); +MODULE_PARM_DESC(vm_size, "VM address space size in gigabytes (default 4GB)"); module_param_named(vm_size, radeon_vm_size, int, 0444); MODULE_PARM_DESC(vm_block_size, "VM page table size in bits (default 9)"); From 036bf46a3962c87fc6ab5e6dbc65f469730b4cf0 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Christian=20K=C3=B6nig?= Date: Fri, 18 Jul 2014 08:56:40 +0200 Subject: [PATCH 332/455] drm/radeon: fix handling of radeon_vm_bo_rmv v3 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit v3: completely rewritten. We now just remember which areas of the PT to clear and do so on the next command submission. Bug: https://bugs.freedesktop.org/show_bug.cgi?id=79980 Signed-off-by: Christian König Signed-off-by: Alex Deucher --- drivers/gpu/drm/radeon/radeon.h | 13 +++-- drivers/gpu/drm/radeon/radeon_cs.c | 22 ++++++-- drivers/gpu/drm/radeon/radeon_vm.c | 84 +++++++++++++++++++++--------- 3 files changed, 87 insertions(+), 32 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index b7204500a9a6..3d5e1a93b947 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -449,6 +449,7 @@ struct radeon_bo_va { /* protected by vm mutex */ struct list_head vm_list; + struct list_head vm_status; /* constant after initialization */ struct radeon_vm *vm; @@ -867,6 +868,9 @@ struct radeon_vm { struct list_head va; unsigned id; + /* BOs freed, but not yet updated in the PT */ + struct list_head freed; + /* contains the page directory */ struct radeon_bo *page_directory; uint64_t pd_gpu_addr; @@ -2832,9 +2836,10 @@ void radeon_vm_fence(struct radeon_device *rdev, uint64_t radeon_vm_map_gart(struct radeon_device *rdev, uint64_t addr); int radeon_vm_update_page_directory(struct radeon_device *rdev, struct radeon_vm *vm); +int radeon_vm_clear_freed(struct radeon_device *rdev, + struct radeon_vm *vm); int radeon_vm_bo_update(struct radeon_device *rdev, - struct radeon_vm *vm, - struct radeon_bo *bo, + struct radeon_bo_va *bo_va, struct ttm_mem_reg *mem); void radeon_vm_bo_invalidate(struct radeon_device *rdev, struct radeon_bo *bo); @@ -2847,8 +2852,8 @@ int radeon_vm_bo_set_addr(struct radeon_device *rdev, struct radeon_bo_va *bo_va, uint64_t offset, uint32_t flags); -int radeon_vm_bo_rmv(struct radeon_device *rdev, - struct radeon_bo_va *bo_va); +void radeon_vm_bo_rmv(struct radeon_device *rdev, + struct radeon_bo_va *bo_va); /* audio */ void r600_audio_update_hdmi(struct work_struct *work); diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c index 71a143461478..09fcf4dcdfdb 100644 --- a/drivers/gpu/drm/radeon/radeon_cs.c +++ b/drivers/gpu/drm/radeon/radeon_cs.c @@ -461,14 +461,24 @@ static int radeon_bo_vm_update_pte(struct radeon_cs_parser *p, struct radeon_vm *vm) { struct radeon_device *rdev = p->rdev; + struct radeon_bo_va *bo_va; int i, r; r = radeon_vm_update_page_directory(rdev, vm); if (r) return r; - r = radeon_vm_bo_update(rdev, vm, rdev->ring_tmp_bo.bo, - &rdev->ring_tmp_bo.bo->tbo.mem); + r = radeon_vm_clear_freed(rdev, vm); + if (r) + return r; + + bo_va = radeon_vm_bo_find(vm, rdev->ring_tmp_bo.bo); + if (bo_va == NULL) { + DRM_ERROR("Tmp BO not in VM!\n"); + return -EINVAL; + } + + r = radeon_vm_bo_update(rdev, bo_va, &rdev->ring_tmp_bo.bo->tbo.mem); if (r) return r; @@ -480,7 +490,13 @@ static int radeon_bo_vm_update_pte(struct radeon_cs_parser *p, continue; bo = p->relocs[i].robj; - r = radeon_vm_bo_update(rdev, vm, bo, &bo->tbo.mem); + bo_va = radeon_vm_bo_find(vm, bo); + if (bo_va == NULL) { + dev_err(rdev->dev, "bo %p not in vm %p\n", bo, vm); + return -EINVAL; + } + + r = radeon_vm_bo_update(rdev, bo_va, &bo->tbo.mem); if (r) return r; } diff --git a/drivers/gpu/drm/radeon/radeon_vm.c b/drivers/gpu/drm/radeon/radeon_vm.c index eecff6bbd341..2726b46ac819 100644 --- a/drivers/gpu/drm/radeon/radeon_vm.c +++ b/drivers/gpu/drm/radeon/radeon_vm.c @@ -332,6 +332,7 @@ struct radeon_bo_va *radeon_vm_bo_add(struct radeon_device *rdev, bo_va->ref_count = 1; INIT_LIST_HEAD(&bo_va->bo_list); INIT_LIST_HEAD(&bo_va->vm_list); + INIT_LIST_HEAD(&bo_va->vm_status); mutex_lock(&vm->mutex); list_add(&bo_va->vm_list, &vm->va); @@ -468,6 +469,15 @@ int radeon_vm_bo_set_addr(struct radeon_device *rdev, head = &tmp->vm_list; } + if (bo_va->soffset) { + /* add a clone of the bo_va to clear the old address */ + tmp = kzalloc(sizeof(struct radeon_bo_va), GFP_KERNEL); + tmp->soffset = bo_va->soffset; + tmp->eoffset = bo_va->eoffset; + tmp->vm = vm; + list_add(&tmp->vm_status, &vm->freed); + } + bo_va->soffset = soffset; bo_va->eoffset = eoffset; bo_va->flags = flags; @@ -823,25 +833,19 @@ static void radeon_vm_update_ptes(struct radeon_device *rdev, * Object have to be reserved and mutex must be locked! */ int radeon_vm_bo_update(struct radeon_device *rdev, - struct radeon_vm *vm, - struct radeon_bo *bo, + struct radeon_bo_va *bo_va, struct ttm_mem_reg *mem) { + struct radeon_vm *vm = bo_va->vm; struct radeon_ib ib; - struct radeon_bo_va *bo_va; unsigned nptes, ndw; uint64_t addr; int r; - bo_va = radeon_vm_bo_find(vm, bo); - if (bo_va == NULL) { - dev_err(rdev->dev, "bo %p not in vm %p\n", bo, vm); - return -EINVAL; - } if (!bo_va->soffset) { dev_err(rdev->dev, "bo %p don't has a mapping in vm %p\n", - bo, vm); + bo_va->bo, vm); return -EINVAL; } @@ -868,7 +872,7 @@ int radeon_vm_bo_update(struct radeon_device *rdev, trace_radeon_vm_bo_update(bo_va); - nptes = radeon_bo_ngpu_pages(bo); + nptes = (bo_va->eoffset - bo_va->soffset) / RADEON_GPU_PAGE_SIZE; /* padding, etc. */ ndw = 64; @@ -910,6 +914,34 @@ int radeon_vm_bo_update(struct radeon_device *rdev, return 0; } +/** + * radeon_vm_clear_freed - clear freed BOs in the PT + * + * @rdev: radeon_device pointer + * @vm: requested vm + * + * Make sure all freed BOs are cleared in the PT. + * Returns 0 for success. + * + * PTs have to be reserved and mutex must be locked! + */ +int radeon_vm_clear_freed(struct radeon_device *rdev, + struct radeon_vm *vm) +{ + struct radeon_bo_va *bo_va, *tmp; + int r; + + list_for_each_entry_safe(bo_va, tmp, &vm->freed, vm_status) { + list_del(&bo_va->vm_status); + r = radeon_vm_bo_update(rdev, bo_va, NULL); + kfree(bo_va); + if (r) + return r; + } + return 0; + +} + /** * radeon_vm_bo_rmv - remove a bo to a specific vm * @@ -917,27 +949,27 @@ int radeon_vm_bo_update(struct radeon_device *rdev, * @bo_va: requested bo_va * * Remove @bo_va->bo from the requested vm (cayman+). - * Remove @bo_va->bo from the list of bos associated with the bo_va->vm and - * remove the ptes for @bo_va in the page table. - * Returns 0 for success. * * Object have to be reserved! */ -int radeon_vm_bo_rmv(struct radeon_device *rdev, - struct radeon_bo_va *bo_va) +void radeon_vm_bo_rmv(struct radeon_device *rdev, + struct radeon_bo_va *bo_va) { - int r = 0; + struct radeon_vm *vm = bo_va->vm; - mutex_lock(&bo_va->vm->mutex); - if (bo_va->soffset) - r = radeon_vm_bo_update(rdev, bo_va->vm, bo_va->bo, NULL); - - list_del(&bo_va->vm_list); - mutex_unlock(&bo_va->vm->mutex); list_del(&bo_va->bo_list); - kfree(bo_va); - return r; + mutex_lock(&vm->mutex); + list_del(&bo_va->vm_list); + + if (bo_va->soffset) { + bo_va->bo = NULL; + list_add(&bo_va->vm_status, &vm->freed); + } else { + kfree(bo_va); + } + + mutex_unlock(&vm->mutex); } /** @@ -980,6 +1012,7 @@ int radeon_vm_init(struct radeon_device *rdev, struct radeon_vm *vm) vm->last_id_use = NULL; mutex_init(&vm->mutex); INIT_LIST_HEAD(&vm->va); + INIT_LIST_HEAD(&vm->freed); pd_size = radeon_vm_directory_size(rdev); pd_entries = radeon_vm_num_pdes(rdev); @@ -1034,7 +1067,8 @@ void radeon_vm_fini(struct radeon_device *rdev, struct radeon_vm *vm) kfree(bo_va); } } - + list_for_each_entry_safe(bo_va, tmp, &vm->freed, vm_status) + kfree(bo_va); for (i = 0; i < radeon_vm_num_pdes(rdev); i++) radeon_bo_unref(&vm->page_tables[i].bo); From cc9e67e3d7000c1efbaf929c6bdaf78707407b3b Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Christian=20K=C3=B6nig?= Date: Fri, 18 Jul 2014 13:48:10 +0200 Subject: [PATCH 333/455] drm/radeon: fix VM IB handling MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Calling radeon_vm_bo_find on the IB BO during CS is illegal and can lead to an crash. Signed-off-by: Christian König Signed-off-by: Alex Deucher --- drivers/gpu/drm/radeon/radeon.h | 2 ++ drivers/gpu/drm/radeon/radeon_cs.c | 6 +++--- drivers/gpu/drm/radeon/radeon_kms.c | 26 +++++++++++++------------- drivers/gpu/drm/radeon/radeon_vm.c | 1 + 4 files changed, 19 insertions(+), 16 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 3d5e1a93b947..60c47f829122 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -879,6 +879,8 @@ struct radeon_vm { /* array of page tables, one for each page directory entry */ struct radeon_vm_pt *page_tables; + struct radeon_bo_va *ib_bo_va; + struct mutex mutex; /* last fence for cs using this vm */ struct radeon_fence *fence; diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c index 09fcf4dcdfdb..ae763f60c8a0 100644 --- a/drivers/gpu/drm/radeon/radeon_cs.c +++ b/drivers/gpu/drm/radeon/radeon_cs.c @@ -472,13 +472,13 @@ static int radeon_bo_vm_update_pte(struct radeon_cs_parser *p, if (r) return r; - bo_va = radeon_vm_bo_find(vm, rdev->ring_tmp_bo.bo); - if (bo_va == NULL) { + if (vm->ib_bo_va == NULL) { DRM_ERROR("Tmp BO not in VM!\n"); return -EINVAL; } - r = radeon_vm_bo_update(rdev, bo_va, &rdev->ring_tmp_bo.bo->tbo.mem); + r = radeon_vm_bo_update(rdev, vm->ib_bo_va, + &rdev->ring_tmp_bo.bo->tbo.mem); if (r) return r; diff --git a/drivers/gpu/drm/radeon/radeon_kms.c b/drivers/gpu/drm/radeon/radeon_kms.c index 35d931881b4b..d25ae6acfd5a 100644 --- a/drivers/gpu/drm/radeon/radeon_kms.c +++ b/drivers/gpu/drm/radeon/radeon_kms.c @@ -579,7 +579,7 @@ int radeon_driver_open_kms(struct drm_device *dev, struct drm_file *file_priv) /* new gpu have virtual address space support */ if (rdev->family >= CHIP_CAYMAN) { struct radeon_fpriv *fpriv; - struct radeon_bo_va *bo_va; + struct radeon_vm *vm; int r; fpriv = kzalloc(sizeof(*fpriv), GFP_KERNEL); @@ -587,7 +587,8 @@ int radeon_driver_open_kms(struct drm_device *dev, struct drm_file *file_priv) return -ENOMEM; } - r = radeon_vm_init(rdev, &fpriv->vm); + vm = &fpriv->vm; + r = radeon_vm_init(rdev, vm); if (r) { kfree(fpriv); return r; @@ -596,22 +597,23 @@ int radeon_driver_open_kms(struct drm_device *dev, struct drm_file *file_priv) if (rdev->accel_working) { r = radeon_bo_reserve(rdev->ring_tmp_bo.bo, false); if (r) { - radeon_vm_fini(rdev, &fpriv->vm); + radeon_vm_fini(rdev, vm); kfree(fpriv); return r; } /* map the ib pool buffer read only into * virtual address space */ - bo_va = radeon_vm_bo_add(rdev, &fpriv->vm, - rdev->ring_tmp_bo.bo); - r = radeon_vm_bo_set_addr(rdev, bo_va, RADEON_VA_IB_OFFSET, + vm->ib_bo_va = radeon_vm_bo_add(rdev, vm, + rdev->ring_tmp_bo.bo); + r = radeon_vm_bo_set_addr(rdev, vm->ib_bo_va, + RADEON_VA_IB_OFFSET, RADEON_VM_PAGE_READABLE | RADEON_VM_PAGE_SNOOPED); radeon_bo_unreserve(rdev->ring_tmp_bo.bo); if (r) { - radeon_vm_fini(rdev, &fpriv->vm); + radeon_vm_fini(rdev, vm); kfree(fpriv); return r; } @@ -640,21 +642,19 @@ void radeon_driver_postclose_kms(struct drm_device *dev, /* new gpu have virtual address space support */ if (rdev->family >= CHIP_CAYMAN && file_priv->driver_priv) { struct radeon_fpriv *fpriv = file_priv->driver_priv; - struct radeon_bo_va *bo_va; + struct radeon_vm *vm = &fpriv->vm; int r; if (rdev->accel_working) { r = radeon_bo_reserve(rdev->ring_tmp_bo.bo, false); if (!r) { - bo_va = radeon_vm_bo_find(&fpriv->vm, - rdev->ring_tmp_bo.bo); - if (bo_va) - radeon_vm_bo_rmv(rdev, bo_va); + if (vm->ib_bo_va) + radeon_vm_bo_rmv(rdev, vm->ib_bo_va); radeon_bo_unreserve(rdev->ring_tmp_bo.bo); } } - radeon_vm_fini(rdev, &fpriv->vm); + radeon_vm_fini(rdev, vm); kfree(fpriv); file_priv->driver_priv = NULL; } diff --git a/drivers/gpu/drm/radeon/radeon_vm.c b/drivers/gpu/drm/radeon/radeon_vm.c index 2726b46ac819..fa41e0d7d17d 100644 --- a/drivers/gpu/drm/radeon/radeon_vm.c +++ b/drivers/gpu/drm/radeon/radeon_vm.c @@ -1007,6 +1007,7 @@ int radeon_vm_init(struct radeon_device *rdev, struct radeon_vm *vm) int r; vm->id = 0; + vm->ib_bo_va = NULL; vm->fence = NULL; vm->last_flush = NULL; vm->last_id_use = NULL; From 730a336c33a3398d65896e8ee3ef9f5679fe30a9 Mon Sep 17 00:00:00 2001 From: Alex Deucher Date: Mon, 21 Jul 2014 10:41:13 -0400 Subject: [PATCH 334/455] drm/radeon/TN: only enable bapm on MSI systems There still seem to be stability problems with other systems. Bug: https://bugs.freedesktop.org/show_bug.cgi?id=72921 Signed-off-by: Alex Deucher --- drivers/gpu/drm/radeon/trinity_dpm.c | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/radeon/trinity_dpm.c b/drivers/gpu/drm/radeon/trinity_dpm.c index 20da6ff183df..32e50be9c4ac 100644 --- a/drivers/gpu/drm/radeon/trinity_dpm.c +++ b/drivers/gpu/drm/radeon/trinity_dpm.c @@ -1874,15 +1874,16 @@ int trinity_dpm_init(struct radeon_device *rdev) for (i = 0; i < SUMO_MAX_HARDWARE_POWERLEVELS; i++) pi->at[i] = TRINITY_AT_DFLT; - /* There are stability issues reported on latops with - * bapm installed when switching between AC and battery - * power. At the same time, some desktop boards hang - * if it's not enabled and dpm is enabled. + /* There are stability issues reported on with + * bapm enabled when switching between AC and battery + * power. At the same time, some MSI boards hang + * if it's not enabled and dpm is enabled. Just enable + * it for MSI boards right now. */ - if (rdev->flags & RADEON_IS_MOBILITY) - pi->enable_bapm = false; - else + if (rdev->pdev->subsystem_vendor == 0x1462) pi->enable_bapm = true; + else + pi->enable_bapm = false; pi->enable_nbps_policy = true; pi->enable_sclk_ds = true; pi->enable_gfx_power_gating = true; From a0d036b074b4a5a933e37fcb9bdd6b3cc80a0387 Mon Sep 17 00:00:00 2001 From: Chris Wilson Date: Sat, 19 Jul 2014 12:40:42 +0100 Subject: [PATCH 335/455] drm/i915: Reorder the semaphore deadlock check, again commit 4be173813e57c7298103a83155c2391b5b167b4c Author: Chris Wilson Date: Fri Jun 6 10:22:29 2014 +0100 drm/i915: Reorder semaphore deadlock check did the majority of the work, but it missed one crucial detail: The check for the unkickable deadlock on this ring must come after the check whether the ring that we are waiting on has already passed its target seqno. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80709 Tested-by: Stefan Huber Signed-off-by: Chris Wilson Cc: Mika Kuoppala Cc: Jani Nikula Cc: stable@vger.kernel.org Signed-off-by: Daniel Vetter --- drivers/gpu/drm/i915/i915_irq.c | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index 267f069765ad..c05c84f3f091 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -2845,7 +2845,7 @@ static int semaphore_passed(struct intel_engine_cs *ring) { struct drm_i915_private *dev_priv = ring->dev->dev_private; struct intel_engine_cs *signaller; - u32 seqno, ctl; + u32 seqno; ring->hangcheck.deadlock++; @@ -2857,15 +2857,12 @@ static int semaphore_passed(struct intel_engine_cs *ring) if (signaller->hangcheck.deadlock >= I915_NUM_RINGS) return -1; - /* cursory check for an unkickable deadlock */ - ctl = I915_READ_CTL(signaller); - if (ctl & RING_WAIT_SEMAPHORE && semaphore_passed(signaller) < 0) - return -1; - if (i915_seqno_passed(signaller->get_seqno(signaller, false), seqno)) return 1; - if (signaller->hangcheck.deadlock) + /* cursory check for an unkickable deadlock */ + if (I915_READ_CTL(signaller) & RING_WAIT_SEMAPHORE && + semaphore_passed(signaller) < 0) return -1; return 0; From 92e396270fea0a787ea848880565fb14cfb20f18 Mon Sep 17 00:00:00 2001 From: Jes Sorensen Date: Mon, 21 Jul 2014 11:25:09 +0200 Subject: [PATCH 336/455] staging: rtl8723au: rtw_resume(): release semaphore before exit on error Signed-off-by: Jes Sorensen Signed-off-by: Greg Kroah-Hartman --- drivers/staging/rtl8723au/os_dep/usb_intf.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/staging/rtl8723au/os_dep/usb_intf.c b/drivers/staging/rtl8723au/os_dep/usb_intf.c index 8b25c1aa2025..ebb19b22f47f 100644 --- a/drivers/staging/rtl8723au/os_dep/usb_intf.c +++ b/drivers/staging/rtl8723au/os_dep/usb_intf.c @@ -530,8 +530,10 @@ int rtw_resume_process23a(struct rtw_adapter *padapter) pwrpriv->bkeepfwalive = false; DBG_8723A("bkeepfwalive(%x)\n", pwrpriv->bkeepfwalive); - if (pm_netdev_open23a(pnetdev, true) != 0) + if (pm_netdev_open23a(pnetdev, true) != 0) { + up(&pwrpriv->lock); goto exit; + } netif_device_attach(pnetdev); netif_carrier_on(pnetdev); From 10ec9472f05b45c94db3c854d22581a20b97db41 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Mon, 21 Jul 2014 07:17:42 +0200 Subject: [PATCH 337/455] ipv4: fix buffer overflow in ip_options_compile() There is a benign buffer overflow in ip_options_compile spotted by AddressSanitizer[1] : Its benign because we always can access one extra byte in skb->head (because header is followed by struct skb_shared_info), and in this case this byte is not even used. [28504.910798] ================================================================== [28504.912046] AddressSanitizer: heap-buffer-overflow in ip_options_compile [28504.913170] Read of size 1 by thread T15843: [28504.914026] [] ip_options_compile+0x121/0x9c0 [28504.915394] [] ip_options_get_from_user+0xad/0x120 [28504.916843] [] do_ip_setsockopt.isra.15+0x8df/0x1630 [28504.918175] [] ip_setsockopt+0x30/0xa0 [28504.919490] [] tcp_setsockopt+0x5b/0x90 [28504.920835] [] sock_common_setsockopt+0x5f/0x70 [28504.922208] [] SyS_setsockopt+0xa2/0x140 [28504.923459] [] system_call_fastpath+0x16/0x1b [28504.924722] [28504.925106] Allocated by thread T15843: [28504.925815] [] ip_options_get_from_user+0x35/0x120 [28504.926884] [] do_ip_setsockopt.isra.15+0x8df/0x1630 [28504.927975] [] ip_setsockopt+0x30/0xa0 [28504.929175] [] tcp_setsockopt+0x5b/0x90 [28504.930400] [] sock_common_setsockopt+0x5f/0x70 [28504.931677] [] SyS_setsockopt+0xa2/0x140 [28504.932851] [] system_call_fastpath+0x16/0x1b [28504.934018] [28504.934377] The buggy address ffff880026382828 is located 0 bytes to the right [28504.934377] of 40-byte region [ffff880026382800, ffff880026382828) [28504.937144] [28504.937474] Memory state around the buggy address: [28504.938430] ffff880026382300: ........ rrrrrrrr rrrrrrrr rrrrrrrr [28504.939884] ffff880026382400: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28504.941294] ffff880026382500: .....rrr rrrrrrrr rrrrrrrr rrrrrrrr [28504.942504] ffff880026382600: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28504.943483] ffff880026382700: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28504.944511] >ffff880026382800: .....rrr rrrrrrrr rrrrrrrr rrrrrrrr [28504.945573] ^ [28504.946277] ffff880026382900: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28505.094949] ffff880026382a00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28505.096114] ffff880026382b00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28505.097116] ffff880026382c00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28505.098472] ffff880026382d00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28505.099804] Legend: [28505.100269] f - 8 freed bytes [28505.100884] r - 8 redzone bytes [28505.101649] . - 8 allocated bytes [28505.102406] x=1..7 - x allocated bytes + (8-x) redzone bytes [28505.103637] ================================================================== [1] https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller --- net/ipv4/ip_options.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/net/ipv4/ip_options.c b/net/ipv4/ip_options.c index 5e7aecea05cd..ad382499bace 100644 --- a/net/ipv4/ip_options.c +++ b/net/ipv4/ip_options.c @@ -288,6 +288,10 @@ int ip_options_compile(struct net *net, optptr++; continue; } + if (unlikely(l < 2)) { + pp_ptr = optptr; + goto error; + } optlen = optptr[1]; if (optlen < 2 || optlen > l) { pp_ptr = optptr; From 26053926feb1c16ade9c30bc7443bf28d829d08e Mon Sep 17 00:00:00 2001 From: "David S. Miller" Date: Mon, 21 Jul 2014 22:27:56 -0700 Subject: [PATCH 338/455] sparc: Hook up renameat2 syscall. Signed-off-by: David S. Miller --- arch/sparc/include/uapi/asm/unistd.h | 3 ++- arch/sparc/kernel/sys32.S | 1 + arch/sparc/kernel/systbls_32.S | 1 + arch/sparc/kernel/systbls_64.S | 2 ++ 4 files changed, 6 insertions(+), 1 deletion(-) diff --git a/arch/sparc/include/uapi/asm/unistd.h b/arch/sparc/include/uapi/asm/unistd.h index b73274fb961a..42f2bca1d338 100644 --- a/arch/sparc/include/uapi/asm/unistd.h +++ b/arch/sparc/include/uapi/asm/unistd.h @@ -410,8 +410,9 @@ #define __NR_finit_module 342 #define __NR_sched_setattr 343 #define __NR_sched_getattr 344 +#define __NR_renameat2 345 -#define NR_syscalls 345 +#define NR_syscalls 346 /* Bitmask values returned from kern_features system call. */ #define KERN_FEATURE_MIXED_MODE_STACK 0x00000001 diff --git a/arch/sparc/kernel/sys32.S b/arch/sparc/kernel/sys32.S index d066eb18650c..f834224208ed 100644 --- a/arch/sparc/kernel/sys32.S +++ b/arch/sparc/kernel/sys32.S @@ -48,6 +48,7 @@ SIGN1(sys32_futex, compat_sys_futex, %o1) SIGN1(sys32_recvfrom, compat_sys_recvfrom, %o0) SIGN1(sys32_recvmsg, compat_sys_recvmsg, %o0) SIGN1(sys32_sendmsg, compat_sys_sendmsg, %o0) +SIGN2(sys32_renameat2, sys_renameat2, %o0, %o2) .globl sys32_mmap2 sys32_mmap2: diff --git a/arch/sparc/kernel/systbls_32.S b/arch/sparc/kernel/systbls_32.S index 151ace8766cc..85fe9b1087cd 100644 --- a/arch/sparc/kernel/systbls_32.S +++ b/arch/sparc/kernel/systbls_32.S @@ -86,3 +86,4 @@ sys_call_table: /*330*/ .long sys_fanotify_mark, sys_prlimit64, sys_name_to_handle_at, sys_open_by_handle_at, sys_clock_adjtime /*335*/ .long sys_syncfs, sys_sendmmsg, sys_setns, sys_process_vm_readv, sys_process_vm_writev /*340*/ .long sys_ni_syscall, sys_kcmp, sys_finit_module, sys_sched_setattr, sys_sched_getattr +/*345*/ .long sys_renameat2 diff --git a/arch/sparc/kernel/systbls_64.S b/arch/sparc/kernel/systbls_64.S index 4bd4e2bb26cf..33ecba2826ea 100644 --- a/arch/sparc/kernel/systbls_64.S +++ b/arch/sparc/kernel/systbls_64.S @@ -87,6 +87,7 @@ sys_call_table32: /*330*/ .word compat_sys_fanotify_mark, sys_prlimit64, sys_name_to_handle_at, compat_sys_open_by_handle_at, compat_sys_clock_adjtime .word sys_syncfs, compat_sys_sendmmsg, sys_setns, compat_sys_process_vm_readv, compat_sys_process_vm_writev /*340*/ .word sys_kern_features, sys_kcmp, sys_finit_module, sys_sched_setattr, sys_sched_getattr + .word sys32_renameat2 #endif /* CONFIG_COMPAT */ @@ -165,3 +166,4 @@ sys_call_table: /*330*/ .word sys_fanotify_mark, sys_prlimit64, sys_name_to_handle_at, sys_open_by_handle_at, sys_clock_adjtime .word sys_syncfs, sys_sendmmsg, sys_setns, sys_process_vm_readv, sys_process_vm_writev /*340*/ .word sys_kern_features, sys_kcmp, sys_finit_module, sys_sched_setattr, sys_sched_getattr + .word sys_renameat2 From bd6ba3518fcb2539d83163a3f486d09411bc535d Mon Sep 17 00:00:00 2001 From: Joel Stanley Date: Fri, 18 Jul 2014 11:41:37 +0930 Subject: [PATCH 339/455] powerpc: Disable doorbells on Power8 DD1.x These processors do not currently support doorbell IPIs, so remove them from the feature list if we are at DD 1.xx for the 0x004d part. This fixes a regression caused by d4e58e5928f8 (powerpc/powernv: Enable POWER8 doorbell IPIs). With that patch the kernel would hang at boot when calling smp_call_function_many, as the doorbell would not be received by the target CPUs: .smp_call_function_many+0x2bc/0x3c0 (unreliable) .on_each_cpu_mask+0x30/0x100 .cpuidle_register_driver+0x158/0x1a0 .cpuidle_register+0x2c/0x110 .powernv_processor_idle_init+0x23c/0x2c0 .do_one_initcall+0xd4/0x260 .kernel_init_freeable+0x25c/0x33c .kernel_init+0x1c/0x120 .ret_from_kernel_thread+0x58/0x7c Fixes: d4e58e5928f8 (powerpc/powernv: Enable POWER8 doorbell IPIs) Signed-off-by: Joel Stanley Signed-off-by: Benjamin Herrenschmidt --- arch/powerpc/include/asm/cputable.h | 1 + arch/powerpc/kernel/cputable.c | 20 ++++++++++++++++++++ 2 files changed, 21 insertions(+) diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h index bc2347774f0a..0fdd7eece6d9 100644 --- a/arch/powerpc/include/asm/cputable.h +++ b/arch/powerpc/include/asm/cputable.h @@ -447,6 +447,7 @@ extern const char *powerpc_base_platform; CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_DAWR | \ CPU_FTR_ARCH_207S | CPU_FTR_TM_COMP) #define CPU_FTRS_POWER8E (CPU_FTRS_POWER8 | CPU_FTR_PMAO_BUG) +#define CPU_FTRS_POWER8_DD1 (CPU_FTRS_POWER8 & ~CPU_FTR_DBELL) #define CPU_FTRS_CELL (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \ CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \ CPU_FTR_ALTIVEC_COMP | CPU_FTR_MMCRA | CPU_FTR_SMT | \ diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c index 965291b4c2fa..0c157642c2a1 100644 --- a/arch/powerpc/kernel/cputable.c +++ b/arch/powerpc/kernel/cputable.c @@ -527,6 +527,26 @@ static struct cpu_spec __initdata cpu_specs[] = { .machine_check_early = __machine_check_early_realmode_p8, .platform = "power8", }, + { /* Power8 DD1: Does not support doorbell IPIs */ + .pvr_mask = 0xffffff00, + .pvr_value = 0x004d0100, + .cpu_name = "POWER8 (raw)", + .cpu_features = CPU_FTRS_POWER8_DD1, + .cpu_user_features = COMMON_USER_POWER8, + .cpu_user_features2 = COMMON_USER2_POWER8, + .mmu_features = MMU_FTRS_POWER8, + .icache_bsize = 128, + .dcache_bsize = 128, + .num_pmcs = 6, + .pmc_type = PPC_PMC_IBM, + .oprofile_cpu_type = "ppc64/power8", + .oprofile_type = PPC_OPROFILE_INVALID, + .cpu_setup = __setup_cpu_power8, + .cpu_restore = __restore_cpu_power8, + .flush_tlb = __flush_tlb_power8, + .machine_check_early = __machine_check_early_realmode_p8, + .platform = "power8", + }, { /* Power8 */ .pvr_mask = 0xffff0000, .pvr_value = 0x004d0000, From e698b9667879b79e479cc985f9d74ecf126e343e Mon Sep 17 00:00:00 2001 From: Paul Mackerras Date: Sat, 19 Jul 2014 17:47:57 +1000 Subject: [PATCH 340/455] powerpc: Fix bugs in emulate_step() This fixes some bugs in emulate_step(). First, the setting of the carry bit for the arithmetic right-shift instructions was not correct on 64-bit machines because we were masking with a mask of type int rather than unsigned long. Secondly, the sld (shift left doubleword) instruction was using the wrong instruction field for the register containing the shift count. Signed-off-by: Paul Mackerras Signed-off-by: Benjamin Herrenschmidt --- arch/powerpc/lib/sstep.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c index 412dd46dd0b7..5c09f365c842 100644 --- a/arch/powerpc/lib/sstep.c +++ b/arch/powerpc/lib/sstep.c @@ -1198,7 +1198,7 @@ int __kprobes emulate_step(struct pt_regs *regs, unsigned int instr) sh = regs->gpr[rb] & 0x3f; ival = (signed int) regs->gpr[rd]; regs->gpr[ra] = ival >> (sh < 32 ? sh : 31); - if (ival < 0 && (sh >= 32 || (ival & ((1 << sh) - 1)) != 0)) + if (ival < 0 && (sh >= 32 || (ival & ((1ul << sh) - 1)) != 0)) regs->xer |= XER_CA; else regs->xer &= ~XER_CA; @@ -1208,7 +1208,7 @@ int __kprobes emulate_step(struct pt_regs *regs, unsigned int instr) sh = rb; ival = (signed int) regs->gpr[rd]; regs->gpr[ra] = ival >> sh; - if (ival < 0 && (ival & ((1 << sh) - 1)) != 0) + if (ival < 0 && (ival & ((1ul << sh) - 1)) != 0) regs->xer |= XER_CA; else regs->xer &= ~XER_CA; @@ -1216,7 +1216,7 @@ int __kprobes emulate_step(struct pt_regs *regs, unsigned int instr) #ifdef __powerpc64__ case 27: /* sld */ - sh = regs->gpr[rd] & 0x7f; + sh = regs->gpr[rb] & 0x7f; if (sh < 64) regs->gpr[ra] = regs->gpr[rd] << sh; else @@ -1235,7 +1235,7 @@ int __kprobes emulate_step(struct pt_regs *regs, unsigned int instr) sh = regs->gpr[rb] & 0x7f; ival = (signed long int) regs->gpr[rd]; regs->gpr[ra] = ival >> (sh < 64 ? sh : 63); - if (ival < 0 && (sh >= 64 || (ival & ((1 << sh) - 1)) != 0)) + if (ival < 0 && (sh >= 64 || (ival & ((1ul << sh) - 1)) != 0)) regs->xer |= XER_CA; else regs->xer &= ~XER_CA; @@ -1246,7 +1246,7 @@ int __kprobes emulate_step(struct pt_regs *regs, unsigned int instr) sh = rb | ((instr & 2) << 4); ival = (signed long int) regs->gpr[rd]; regs->gpr[ra] = ival >> sh; - if (ival < 0 && (ival & ((1 << sh) - 1)) != 0) + if (ival < 0 && (ival & ((1ul << sh) - 1)) != 0) regs->xer |= XER_CA; else regs->xer &= ~XER_CA; From dad6f37c2602e4af6c3aecfdb41f2d8bd4668163 Mon Sep 17 00:00:00 2001 From: "Aneesh Kumar K.V" Date: Tue, 15 Jul 2014 20:22:30 +0530 Subject: [PATCH 341/455] powerpc: subpage_protect: Increase the array size to take care of 64TB We now support TASK_SIZE of 16TB, hence the array should be 8. Fixes the below crash: Unable to handle kernel paging request for data at address 0x000100bd Faulting instruction address: 0xc00000000004f914 cpu 0x13: Vector: 300 (Data Access) at [c000000fea75fa90] pc: c00000000004f914: .sys_subpage_prot+0x2d4/0x5c0 lr: c00000000004fb5c: .sys_subpage_prot+0x51c/0x5c0 sp: c000000fea75fd10 msr: 9000000000009032 dar: 100bd dsisr: 40000000 current = 0xc000000fea6ae490 paca = 0xc00000000fb8ab00 softe: 0 irq_happened: 0x00 pid = 8237, comm = a.out enter ? for help [c000000fea75fe30] c00000000000a164 syscall_exit+0x0/0x98 Signed-off-by: Benjamin Herrenschmidt --- arch/powerpc/include/asm/mmu-hash64.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/mmu-hash64.h b/arch/powerpc/include/asm/mmu-hash64.h index 807014dde821..c2b4dcf23d03 100644 --- a/arch/powerpc/include/asm/mmu-hash64.h +++ b/arch/powerpc/include/asm/mmu-hash64.h @@ -22,6 +22,7 @@ */ #include #include +#include /* * Segment table @@ -496,7 +497,7 @@ extern void slb_set_size(u16 size); */ struct subpage_prot_table { unsigned long maxaddr; /* only addresses < this are protected */ - unsigned int **protptrs[2]; + unsigned int **protptrs[(TASK_SIZE_USER64 >> 43)]; unsigned int *low_prot[4]; }; From 97a9a7179aad701ab676e6f29eb90766a1acfde2 Mon Sep 17 00:00:00 2001 From: Tyrel Datwyler Date: Thu, 10 Jul 2014 14:50:57 -0400 Subject: [PATCH 342/455] powerpc/pseries: dynamically added OF nodes need to call of_node_init Commit 75b57ecf9 refactored device tree nodes to use kobjects such that they can be exposed via /sysfs. A secondary commit 0829f6d1f furthered this rework by moving the kobect initialization logic out of of_node_add into its own of_node_init function. The inital commit removed the existing kref_init calls in the pseries dlpar code with the assumption kobject initialization would occur in of_node_add. The second commit had the side effect of triggering a BUG_ON during DLPAR, migration and suspend/resume operations as a result of dynamically added nodes being uninitialized. This patch fixes this by adding of_node_init calls in place of the previously removed kref_init calls. Fixes: 0829f6d1f69e ("of: device_node kobject lifecycle fixes") Cc: stable@vger.kernel.org Signed-off-by: Tyrel Datwyler Acked-by: Nathan Fontenot Acked-by: Grant Likely Signed-off-by: Benjamin Herrenschmidt --- arch/powerpc/platforms/pseries/dlpar.c | 1 + arch/powerpc/platforms/pseries/reconfig.c | 1 + 2 files changed, 2 insertions(+) diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c index 022b38e6a80b..2d0b4d68a40a 100644 --- a/arch/powerpc/platforms/pseries/dlpar.c +++ b/arch/powerpc/platforms/pseries/dlpar.c @@ -86,6 +86,7 @@ static struct device_node *dlpar_parse_cc_node(struct cc_workarea *ccwa, } of_node_set_flag(dn, OF_DYNAMIC); + of_node_init(dn); return dn; } diff --git a/arch/powerpc/platforms/pseries/reconfig.c b/arch/powerpc/platforms/pseries/reconfig.c index 0435bb65d0aa..1c0a60d98867 100644 --- a/arch/powerpc/platforms/pseries/reconfig.c +++ b/arch/powerpc/platforms/pseries/reconfig.c @@ -69,6 +69,7 @@ static int pSeries_reconfig_add_node(const char *path, struct property *proplist np->properties = proplist; of_node_set_flag(np, OF_DYNAMIC); + of_node_init(np); np->parent = derive_parent(path); if (IS_ERR(np->parent)) { From 6f5405bc2ee0102bb3856e2cdea64ff415db2e0c Mon Sep 17 00:00:00 2001 From: Li Zhong Date: Mon, 21 Jul 2014 17:55:13 +0800 Subject: [PATCH 343/455] powerpc: use _GLOBAL_TOC for memmove memmove may be called from module code copy_pages(btrfs), and it may call memcpy, which may call back to C code, so it needs to use _GLOBAL_TOC to set up r2 correctly. This fixes following error when I tried to boot an le guest: Vector: 300 (Data Access) at [c000000073f97210] pc: c000000000015004: enable_kernel_altivec+0x24/0x80 lr: c000000000058fbc: enter_vmx_copy+0x3c/0x60 sp: c000000073f97490 msr: 8000000002009033 dar: d000000001d50170 dsisr: 40000000 current = 0xc0000000734c0000 paca = 0xc00000000fff0000 softe: 0 irq_happened: 0x01 pid = 815, comm = mktemp enter ? for help [c000000073f974f0] c000000000058fbc enter_vmx_copy+0x3c/0x60 [c000000073f97510] c000000000057d34 memcpy_power7+0x274/0x840 [c000000073f97610] d000000001c3179c copy_pages+0xfc/0x110 [btrfs] [c000000073f97660] d000000001c3c248 memcpy_extent_buffer+0xe8/0x160 [btrfs] [c000000073f97700] d000000001be4be8 setup_items_for_insert+0x208/0x4a0 [btrfs] [c000000073f97820] d000000001be50b4 btrfs_insert_empty_items+0xf4/0x140 [btrfs] [c000000073f97890] d000000001bfed30 insert_with_overflow+0x70/0x180 [btrfs] [c000000073f97900] d000000001bff174 btrfs_insert_dir_item+0x114/0x2f0 [btrfs] [c000000073f979a0] d000000001c1f92c btrfs_add_link+0x10c/0x370 [btrfs] [c000000073f97a40] d000000001c20e94 btrfs_create+0x204/0x270 [btrfs] [c000000073f97b00] c00000000026d438 vfs_create+0x178/0x210 [c000000073f97b50] c000000000270a70 do_last+0x9f0/0xe90 [c000000073f97c20] c000000000271010 path_openat+0x100/0x810 [c000000073f97ce0] c000000000272ea8 do_filp_open+0x58/0xd0 [c000000073f97dc0] c00000000025ade8 do_sys_open+0x1b8/0x300 [c000000073f97e30] c00000000000a008 syscall_exit+0x0/0x7c Signed-off-by: Benjamin Herrenschmidt --- arch/powerpc/lib/mem_64.S | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/lib/mem_64.S b/arch/powerpc/lib/mem_64.S index 0738f96befbf..43435c6892fb 100644 --- a/arch/powerpc/lib/mem_64.S +++ b/arch/powerpc/lib/mem_64.S @@ -77,7 +77,7 @@ _GLOBAL(memset) stb r4,0(r6) blr -_GLOBAL(memmove) +_GLOBAL_TOC(memmove) cmplw 0,r3,r4 bgt backwards_memcpy b memcpy From 88b98287356762cc16c9ff6cd48116160a5d4dba Mon Sep 17 00:00:00 2001 From: Hugh Dickins Date: Mon, 21 Jul 2014 21:34:30 -0700 Subject: [PATCH 344/455] drm/i915: fix freeze with blank screen booting highmem x86_64 boots and displays fine, but booting x86_32 with CONFIG_HIGHMEM has frozen with a blank screen throughout 3.16-rc on this ThinkPad T420s, with i915 generation 6 graphics. Fix 9d0a6fa6c5e6 ("drm/i915: add render state initialization"): kunmap() takes struct page * argument, not virtual address. Which the compiler kindly points out, if you use the appropriate u32 *batch, instead of silencing it with a void *. Why did bisection lead decisively to nearby 229b0489aa75 ("drm/i915: add null render states for gen6, gen7 and gen8")? Because the u32 deposited at that virtual address by the previous stub failed the PageHighMem test, and so did no harm. Signed-off-by: Hugh Dickins Signed-off-by: Daniel Vetter --- drivers/gpu/drm/i915/i915_gem_render_state.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem_render_state.c b/drivers/gpu/drm/i915/i915_gem_render_state.c index 3521f998a178..34894b573064 100644 --- a/drivers/gpu/drm/i915/i915_gem_render_state.c +++ b/drivers/gpu/drm/i915/i915_gem_render_state.c @@ -31,7 +31,7 @@ struct i915_render_state { struct drm_i915_gem_object *obj; unsigned long ggtt_offset; - void *batch; + u32 *batch; u32 size; u32 len; }; @@ -80,7 +80,7 @@ free: static void render_state_free(struct i915_render_state *so) { - kunmap(so->batch); + kunmap(kmap_to_page(so->batch)); i915_gem_object_ggtt_unpin(so->obj); drm_gem_object_unreference(&so->obj->base); kfree(so); From 8142b215501f8b291a108a202b3a053a265b03dd Mon Sep 17 00:00:00 2001 From: Sven Wegener Date: Tue, 22 Jul 2014 10:26:06 +0200 Subject: [PATCH 345/455] x86_32, entry: Store badsys error code in %eax Commit 554086d ("x86_32, entry: Do syscall exit work on badsys (CVE-2014-4508)") introduced a regression in the x86_32 syscall entry code, resulting in syscall() not returning proper errors for undefined syscalls on CPUs supporting the sysenter feature. The following code: > int result = syscall(666); > printf("result=%d errno=%d error=%s\n", result, errno, strerror(errno)); results in: > result=666 errno=0 error=Success Obviously, the syscall return value is the called syscall number, but it should have been an ENOSYS error. When run under ptrace it behaves correctly, which makes it hard to debug in the wild: > result=-1 errno=38 error=Function not implemented The %eax register is the return value register. For debugging via ptrace the syscall entry code stores the complete register context on the stack. The badsys handlers only store the ENOSYS error code in the ptrace register set and do not set %eax like a regular syscall handler would. The old resume_userspace call chain contains code that clobbers %eax and it restores %eax from the ptrace registers afterwards. The same goes for the ptrace-enabled call chain. When ptrace is not used, the syscall return value is the passed-in syscall number from the untouched %eax register. Use %eax as the return value register in syscall_badsys and sysenter_badsys, like a real syscall handler does, and have the caller push the value onto the stack for ptrace access. Signed-off-by: Sven Wegener Link: http://lkml.kernel.org/r/alpine.LNX.2.11.1407221022380.31021@titan.int.lan.stealer.net Reviewed-and-tested-by: Andy Lutomirski Cc: # If 554086d is backported Signed-off-by: H. Peter Anvin --- arch/x86/kernel/entry_32.S | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S index dbaa23e78b36..0d0c9d4ab6d5 100644 --- a/arch/x86/kernel/entry_32.S +++ b/arch/x86/kernel/entry_32.S @@ -425,8 +425,8 @@ sysenter_do_call: cmpl $(NR_syscalls), %eax jae sysenter_badsys call *sys_call_table(,%eax,4) - movl %eax,PT_EAX(%esp) sysenter_after_call: + movl %eax,PT_EAX(%esp) LOCKDEP_SYS_EXIT DISABLE_INTERRUPTS(CLBR_ANY) TRACE_IRQS_OFF @@ -502,6 +502,7 @@ ENTRY(system_call) jae syscall_badsys syscall_call: call *sys_call_table(,%eax,4) +syscall_after_call: movl %eax,PT_EAX(%esp) # store the return value syscall_exit: LOCKDEP_SYS_EXIT @@ -675,12 +676,12 @@ syscall_fault: END(syscall_fault) syscall_badsys: - movl $-ENOSYS,PT_EAX(%esp) - jmp syscall_exit + movl $-ENOSYS,%eax + jmp syscall_after_call END(syscall_badsys) sysenter_badsys: - movl $-ENOSYS,PT_EAX(%esp) + movl $-ENOSYS,%eax jmp sysenter_after_call END(syscall_badsys) CFI_ENDPROC From 901401166464dc1875825235bb2541af31b4c384 Mon Sep 17 00:00:00 2001 From: Takashi Sakamoto Date: Tue, 22 Jul 2014 23:11:03 +0900 Subject: [PATCH 346/455] ALSA: bebob: Fix a missing to unlock mutex in error handling case In error handling case, special_clk_ctl_put() returns without unlock_mutex(), therefore the mutex is still locked. This commit moves mutex_lock() after the error handling case. This commit is my solution for this post. [PATCH -next] ALSA: bebob: Fix missing unlock on error in special_clk_ctl_put() https://lkml.org/lkml/2014/7/20/12 Signed-off-by: Takashi Sakamoto Signed-off-by: Takashi Iwai --- sound/firewire/bebob/bebob_maudio.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/sound/firewire/bebob/bebob_maudio.c b/sound/firewire/bebob/bebob_maudio.c index 6af50eb80ea7..fc470c60e433 100644 --- a/sound/firewire/bebob/bebob_maudio.c +++ b/sound/firewire/bebob/bebob_maudio.c @@ -379,12 +379,12 @@ static int special_clk_ctl_put(struct snd_kcontrol *kctl, struct special_params *params = bebob->maudio_special_quirk; int err, id; - mutex_lock(&bebob->mutex); - id = uval->value.enumerated.item[0]; if (id >= ARRAY_SIZE(special_clk_labels)) return 0; + mutex_lock(&bebob->mutex); + err = avc_maudio_set_special_clk(bebob, id, params->dig_in_fmt, params->dig_out_fmt, From 5a0438f4a6328b47bd3c00b2f03eb766cc72a75c Mon Sep 17 00:00:00 2001 From: Takashi Sakamoto Date: Tue, 22 Jul 2014 23:13:47 +0900 Subject: [PATCH 347/455] ALSA: bebob: Use different labels for digital input/output This commit uses different labels for control elements of digital input/output interfaces to correct my misunderstanding about M-Audio Firewire 1814 and ProjectMix I/O. According to user manuals for these two models, they have two modes for digital input; one is S/PDIF in both of optical and coaxial interfaces, another is ADAT in optical interface only. But in current implementation, a control element for it reduced labels which a control element for digital output uses because of my misunderstanding that optical interface is not available for digital input with S/PDIF mode. Signed-off-by: Takashi Sakamoto Signed-off-by: Takashi Iwai --- sound/firewire/bebob/bebob_maudio.c | 23 ++++++++++++++++------- 1 file changed, 16 insertions(+), 7 deletions(-) diff --git a/sound/firewire/bebob/bebob_maudio.c b/sound/firewire/bebob/bebob_maudio.c index fc470c60e433..42e6f22156f0 100644 --- a/sound/firewire/bebob/bebob_maudio.c +++ b/sound/firewire/bebob/bebob_maudio.c @@ -434,8 +434,8 @@ static struct snd_kcontrol_new special_sync_ctl = { .get = special_sync_ctl_get, }; -/* Digital interface control for special firmware */ -static char *const special_dig_iface_labels[] = { +/* Digital input interface control for special firmware */ +static char *const special_dig_in_iface_labels[] = { "S/PDIF Optical", "S/PDIF Coaxial", "ADAT Optical" }; static int special_dig_in_iface_ctl_info(struct snd_kcontrol *kctl, @@ -443,13 +443,13 @@ static int special_dig_in_iface_ctl_info(struct snd_kcontrol *kctl, { einf->type = SNDRV_CTL_ELEM_TYPE_ENUMERATED; einf->count = 1; - einf->value.enumerated.items = ARRAY_SIZE(special_dig_iface_labels); + einf->value.enumerated.items = ARRAY_SIZE(special_dig_in_iface_labels); if (einf->value.enumerated.item >= einf->value.enumerated.items) einf->value.enumerated.item = einf->value.enumerated.items - 1; strcpy(einf->value.enumerated.name, - special_dig_iface_labels[einf->value.enumerated.item]); + special_dig_in_iface_labels[einf->value.enumerated.item]); return 0; } @@ -504,9 +504,14 @@ static int special_dig_in_iface_ctl_set(struct snd_kcontrol *kctl, dig_in_fmt, params->dig_out_fmt, params->clk_lock); - if ((err < 0) || (params->dig_in_fmt > 0)) /* ADAT */ + if (err < 0) goto end; + /* For ADAT, optical interface is only available. */ + if (params->dig_in_fmt > 0) + goto end; + + /* For S/PDIF, optical/coaxial interfaces are selectable. */ err = avc_audio_set_selector(bebob->unit, 0x00, 0x04, dig_in_iface); if (err < 0) dev_err(&bebob->unit->device, @@ -525,18 +530,22 @@ static struct snd_kcontrol_new special_dig_in_iface_ctl = { .put = special_dig_in_iface_ctl_set }; +/* Digital output interface control for special firmware */ +static char *const special_dig_out_iface_labels[] = { + "S/PDIF Optical and Coaxial", "ADAT Optical" +}; static int special_dig_out_iface_ctl_info(struct snd_kcontrol *kctl, struct snd_ctl_elem_info *einf) { einf->type = SNDRV_CTL_ELEM_TYPE_ENUMERATED; einf->count = 1; - einf->value.enumerated.items = ARRAY_SIZE(special_dig_iface_labels) - 1; + einf->value.enumerated.items = ARRAY_SIZE(special_dig_out_iface_labels); if (einf->value.enumerated.item >= einf->value.enumerated.items) einf->value.enumerated.item = einf->value.enumerated.items - 1; strcpy(einf->value.enumerated.name, - special_dig_iface_labels[einf->value.enumerated.item + 1]); + special_dig_out_iface_labels[einf->value.enumerated.item]); return 0; } From f77ac91e8edade4755f732d52fa094dc3bfd8b8e Mon Sep 17 00:00:00 2001 From: Takashi Sakamoto Date: Tue, 22 Jul 2014 23:13:56 +0900 Subject: [PATCH 348/455] ALSA: bebob: Correction for return value of .put callback This commit is for correction of my misunderstanding about return value of .put callback in ALSA Control interface. According to 'Writing ALSA Driver' (*1), return value of the callback has three patterns; 1: changed, 0: not changed, an negative value: fatal error. But I misunderstood that it's boolean; zero or nonzero. *1: Writing an ALSA Driver (2005, Takashi Iwai) http://www.alsa-project.org/main/index.php/ALSA_Driver_Documentation Signed-off-by: Takashi Sakamoto Signed-off-by: Takashi Iwai --- sound/firewire/bebob/bebob_maudio.c | 26 +++++++++++++++++++------- 1 file changed, 19 insertions(+), 7 deletions(-) diff --git a/sound/firewire/bebob/bebob_maudio.c b/sound/firewire/bebob/bebob_maudio.c index 42e6f22156f0..d6d6ff8cb8dd 100644 --- a/sound/firewire/bebob/bebob_maudio.c +++ b/sound/firewire/bebob/bebob_maudio.c @@ -391,7 +391,10 @@ static int special_clk_ctl_put(struct snd_kcontrol *kctl, params->clk_lock); mutex_unlock(&bebob->mutex); - return err >= 0; + if (err >= 0) + err = 1; + + return err; } static struct snd_kcontrol_new special_clk_ctl = { .name = "Clock Source", @@ -491,14 +494,16 @@ static int special_dig_in_iface_ctl_set(struct snd_kcontrol *kctl, unsigned int id, dig_in_fmt, dig_in_iface; int err; - mutex_lock(&bebob->mutex); - id = uval->value.enumerated.item[0]; + if (id >= ARRAY_SIZE(special_dig_in_iface_labels)) + return -EINVAL; /* decode user value */ dig_in_fmt = (id >> 1) & 0x01; dig_in_iface = id & 0x01; + mutex_lock(&bebob->mutex); + err = avc_maudio_set_special_clk(bebob, params->clk_src, dig_in_fmt, @@ -508,14 +513,17 @@ static int special_dig_in_iface_ctl_set(struct snd_kcontrol *kctl, goto end; /* For ADAT, optical interface is only available. */ - if (params->dig_in_fmt > 0) + if (params->dig_in_fmt > 0) { + err = 1; goto end; + } /* For S/PDIF, optical/coaxial interfaces are selectable. */ err = avc_audio_set_selector(bebob->unit, 0x00, 0x04, dig_in_iface); if (err < 0) dev_err(&bebob->unit->device, "fail to set digital input interface: %d\n", err); + err = 1; end: special_stream_formation_set(bebob); mutex_unlock(&bebob->mutex); @@ -567,16 +575,20 @@ static int special_dig_out_iface_ctl_set(struct snd_kcontrol *kctl, unsigned int id; int err; - mutex_lock(&bebob->mutex); - id = uval->value.enumerated.item[0]; + if (id >= ARRAY_SIZE(special_dig_out_iface_labels)) + return -EINVAL; + + mutex_lock(&bebob->mutex); err = avc_maudio_set_special_clk(bebob, params->clk_src, params->dig_in_fmt, id, params->clk_lock); - if (err >= 0) + if (err >= 0) { special_stream_formation_set(bebob); + err = 1; + } mutex_unlock(&bebob->mutex); return err; From a800bad36619ce47ac0222004635448e6c91ff72 Mon Sep 17 00:00:00 2001 From: Miklos Szeredi Date: Tue, 22 Jul 2014 16:37:42 +0200 Subject: [PATCH 349/455] fuse: s_time_gran fix Default s_time_gran is 1, don't overwrite that if userspace didn't explicitly specify one. Signed-off-by: Miklos Szeredi Cc: # v3.15+ --- fs/fuse/inode.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index 8474028d7848..5ca874f2415b 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -907,9 +907,6 @@ static void process_init_reply(struct fuse_conn *fc, struct fuse_req *req) fc->writeback_cache = 1; if (arg->time_gran && arg->time_gran <= 1000000000) fc->sb->s_time_gran = arg->time_gran; - else - fc->sb->s_time_gran = 1000000000; - } else { ra_pages = fc->max_read / PAGE_CACHE_SIZE; fc->no_lock = 1; From d7afaec0b564f0609e116f562983b8e72fc3e9c9 Mon Sep 17 00:00:00 2001 From: Andrew Gallagher Date: Tue, 22 Jul 2014 16:37:43 +0200 Subject: [PATCH 350/455] fuse: add FUSE_NO_OPEN_SUPPORT flag to INIT Here some additional changes to set a capability flag so that clients can detect when it's appropriate to return -ENOSYS from open. This amends the following commit introduced in 3.14: 7678ac50615d fuse: support clients that don't implement 'open' However we can only add the flag to 3.15 and later since there was no protocol version update in 3.14. Signed-off-by: Miklos Szeredi Cc: # v3.15+ --- fs/fuse/inode.c | 2 +- include/uapi/linux/fuse.h | 3 +++ 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index 5ca874f2415b..03246cd9d47a 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -935,7 +935,7 @@ static void fuse_send_init(struct fuse_conn *fc, struct fuse_req *req) FUSE_SPLICE_WRITE | FUSE_SPLICE_MOVE | FUSE_SPLICE_READ | FUSE_FLOCK_LOCKS | FUSE_IOCTL_DIR | FUSE_AUTO_INVAL_DATA | FUSE_DO_READDIRPLUS | FUSE_READDIRPLUS_AUTO | FUSE_ASYNC_DIO | - FUSE_WRITEBACK_CACHE; + FUSE_WRITEBACK_CACHE | FUSE_NO_OPEN_SUPPORT; req->in.h.opcode = FUSE_INIT; req->in.numargs = 1; req->in.args[0].size = sizeof(*arg); diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h index 40b5ca8a1b1f..25084a052a1e 100644 --- a/include/uapi/linux/fuse.h +++ b/include/uapi/linux/fuse.h @@ -101,6 +101,7 @@ * - add FATTR_CTIME * - add ctime and ctimensec to fuse_setattr_in * - add FUSE_RENAME2 request + * - add FUSE_NO_OPEN_SUPPORT flag */ #ifndef _LINUX_FUSE_H @@ -229,6 +230,7 @@ struct fuse_file_lock { * FUSE_READDIRPLUS_AUTO: adaptive readdirplus * FUSE_ASYNC_DIO: asynchronous direct I/O submission * FUSE_WRITEBACK_CACHE: use writeback cache for buffered writes + * FUSE_NO_OPEN_SUPPORT: kernel supports zero-message opens */ #define FUSE_ASYNC_READ (1 << 0) #define FUSE_POSIX_LOCKS (1 << 1) @@ -247,6 +249,7 @@ struct fuse_file_lock { #define FUSE_READDIRPLUS_AUTO (1 << 14) #define FUSE_ASYNC_DIO (1 << 15) #define FUSE_WRITEBACK_CACHE (1 << 16) +#define FUSE_NO_OPEN_SUPPORT (1 << 17) /** * CUSE INIT request/reply flags From eb12f72ee7245ca207818b9efd10be2641494502 Mon Sep 17 00:00:00 2001 From: Takashi Sakamoto Date: Wed, 23 Jul 2014 00:02:08 +0900 Subject: [PATCH 351/455] ALSA: bebob: Correction for return value of special_clk_ctl_put() in error This commit is a supplement to my previous patch. http://mailman.alsa-project.org/pipermail/alsa-devel/2014-July/079190.html The special_clk_ctl_put() still returns 0 in error handling case. It should return -EINVAL. Signed-off-by: Takashi Sakamoto Signed-off-by: Takashi Iwai --- sound/firewire/bebob/bebob_maudio.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sound/firewire/bebob/bebob_maudio.c b/sound/firewire/bebob/bebob_maudio.c index d6d6ff8cb8dd..70faa3a32526 100644 --- a/sound/firewire/bebob/bebob_maudio.c +++ b/sound/firewire/bebob/bebob_maudio.c @@ -381,7 +381,7 @@ static int special_clk_ctl_put(struct snd_kcontrol *kctl, id = uval->value.enumerated.item[0]; if (id >= ARRAY_SIZE(special_clk_labels)) - return 0; + return -EINVAL; mutex_lock(&bebob->mutex); From 5b7532756382cb31748f73df6a0af0138390c04f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Christian=20K=C3=B6nig?= Date: Tue, 22 Jul 2014 11:30:43 +0200 Subject: [PATCH 352/455] drm/radeon: fix error handling in radeon_vm_bo_set_addr MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Christian König Signed-off-by: Alex Deucher --- drivers/gpu/drm/radeon/radeon_vm.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/drm/radeon/radeon_vm.c b/drivers/gpu/drm/radeon/radeon_vm.c index fa41e0d7d17d..725d3669014f 100644 --- a/drivers/gpu/drm/radeon/radeon_vm.c +++ b/drivers/gpu/drm/radeon/radeon_vm.c @@ -472,6 +472,10 @@ int radeon_vm_bo_set_addr(struct radeon_device *rdev, if (bo_va->soffset) { /* add a clone of the bo_va to clear the old address */ tmp = kzalloc(sizeof(struct radeon_bo_va), GFP_KERNEL); + if (!tmp) { + mutex_unlock(&vm->mutex); + return -ENOMEM; + } tmp->soffset = bo_va->soffset; tmp->eoffset = bo_va->eoffset; tmp->vm = vm; From fa8f136fe9a8fbdcb11a1db94f30146cae6d8777 Mon Sep 17 00:00:00 2001 From: Felix Fietkau Date: Tue, 22 Jul 2014 21:00:26 +0200 Subject: [PATCH 353/455] mac80211: fix crash on getting sta info with uninitialized rate control If the expected throughput is queried before rate control has been initialized, the minstrel op for it will crash while trying to access the rate table. Check for WLAN_STA_RATE_CONTROL before attempting to use the rate control op. Reported-by: Jean-Pierre Tosoni Signed-off-by: Felix Fietkau Signed-off-by: Johannes Berg --- net/mac80211/cfg.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c index d7513a503be1..592f4b152ba8 100644 --- a/net/mac80211/cfg.c +++ b/net/mac80211/cfg.c @@ -472,12 +472,15 @@ static void sta_set_sinfo(struct sta_info *sta, struct station_info *sinfo) { struct ieee80211_sub_if_data *sdata = sta->sdata; struct ieee80211_local *local = sdata->local; - struct rate_control_ref *ref = local->rate_ctrl; + struct rate_control_ref *ref = NULL; struct timespec uptime; u64 packets = 0; u32 thr = 0; int i, ac; + if (test_sta_flag(sta, WLAN_STA_RATE_CONTROL)) + ref = local->rate_ctrl; + sinfo->generation = sdata->local->sta_generation; sinfo->filled = STATION_INFO_INACTIVE_TIME | From c9b227723d051184b9e78f20c75ae2f9d44a6ea2 Mon Sep 17 00:00:00 2001 From: Shinobu Uehara Date: Mon, 21 Jul 2014 22:04:29 -0700 Subject: [PATCH 354/455] ARM: shmobile: r8a7791: Fix SD2CKCR register address 59e79895b95892863617ce630fbda467f2470575 (ARM: shmobile: r8a7791: Add clocks) added r8a7791 SD clocks when v3.14. 2c60a7df72711fb8b4be1e6aa651ab166a8931bc (ARM: shmobile: Add SDHI devices for Koelsch DTS) enabled SD on r8a7791 Koelsch when v3.15. 1299df03d7191ab4356c995dde8b912d3c8922e9 (ARM: shmobile: henninger: add SDHI0/2 DT support) enable SD on r8a7791 Henninger when v3.16. But r8a7791 SD clock had wrong address. This patch fixup it. [Kuninori Morimoto: tidyup for upstreaming] Signed-off-by: Shinobu Uehara Signed-off-by: Kuninori Morimoto Signed-off-by: Simon Horman --- arch/arm/boot/dts/r8a7791.dtsi | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/arm/boot/dts/r8a7791.dtsi b/arch/arm/boot/dts/r8a7791.dtsi index 8d7ffaeff6e0..79f68acfd5d4 100644 --- a/arch/arm/boot/dts/r8a7791.dtsi +++ b/arch/arm/boot/dts/r8a7791.dtsi @@ -540,9 +540,9 @@ #clock-cells = <0>; clock-output-names = "sd1"; }; - sd2_clk: sd3_clk@e615007c { + sd2_clk: sd3_clk@e615026c { compatible = "renesas,r8a7791-div6-clock", "renesas,cpg-div6-clock"; - reg = <0 0xe615007c 0 4>; + reg = <0 0xe615026c 0 4>; clocks = <&pll1_div2_clk>; #clock-cells = <0>; clock-output-names = "sd2"; From 1be9a950c646c9092fb3618197f7b6bfb50e82aa Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Tue, 22 Jul 2014 15:22:45 +0200 Subject: [PATCH 355/455] net: sctp: inherit auth_capable on INIT collisions Jason reported an oops caused by SCTP on his ARM machine with SCTP authentication enabled: Internal error: Oops: 17 [#1] ARM CPU: 0 PID: 104 Comm: sctp-test Not tainted 3.13.0-68744-g3632f30c9b20-dirty #1 task: c6eefa40 ti: c6f52000 task.ti: c6f52000 PC is at sctp_auth_calculate_hmac+0xc4/0x10c LR is at sg_init_table+0x20/0x38 pc : [] lr : [] psr: 40000013 sp : c6f538e8 ip : 00000000 fp : c6f53924 r10: c6f50d80 r9 : 00000000 r8 : 00010000 r7 : 00000000 r6 : c7be4000 r5 : 00000000 r4 : c6f56254 r3 : c00c8170 r2 : 00000001 r1 : 00000008 r0 : c6f1e660 Flags: nZcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user Control: 0005397f Table: 06f28000 DAC: 00000015 Process sctp-test (pid: 104, stack limit = 0xc6f521c0) Stack: (0xc6f538e8 to 0xc6f54000) [...] Backtrace: [] (sctp_auth_calculate_hmac+0x0/0x10c) from [] (sctp_packet_transmit+0x33c/0x5c8) [] (sctp_packet_transmit+0x0/0x5c8) from [] (sctp_outq_flush+0x7fc/0x844) [] (sctp_outq_flush+0x0/0x844) from [] (sctp_outq_uncork+0x24/0x28) [] (sctp_outq_uncork+0x0/0x28) from [] (sctp_side_effects+0x1134/0x1220) [] (sctp_side_effects+0x0/0x1220) from [] (sctp_do_sm+0xac/0xd4) [] (sctp_do_sm+0x0/0xd4) from [] (sctp_assoc_bh_rcv+0x118/0x160) [] (sctp_assoc_bh_rcv+0x0/0x160) from [] (sctp_inq_push+0x6c/0x74) [] (sctp_inq_push+0x0/0x74) from [] (sctp_rcv+0x7d8/0x888) While we already had various kind of bugs in that area ec0223ec48a9 ("net: sctp: fix sctp_sf_do_5_1D_ce to verify if we/peer is AUTH capable") and b14878ccb7fa ("net: sctp: cache auth_enable per endpoint"), this one is a bit of a different kind. Giving a bit more background on why SCTP authentication is needed can be found in RFC4895: SCTP uses 32-bit verification tags to protect itself against blind attackers. These values are not changed during the lifetime of an SCTP association. Looking at new SCTP extensions, there is the need to have a method of proving that an SCTP chunk(s) was really sent by the original peer that started the association and not by a malicious attacker. To cause this bug, we're triggering an INIT collision between peers; normal SCTP handshake where both sides intent to authenticate packets contains RANDOM; CHUNKS; HMAC-ALGO parameters that are being negotiated among peers: ---------- INIT[RANDOM; CHUNKS; HMAC-ALGO] ----------> <------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] --------- -------------------- COOKIE-ECHO --------------------> <-------------------- COOKIE-ACK --------------------- RFC4895 says that each endpoint therefore knows its own random number and the peer's random number *after* the association has been established. The local and peer's random number along with the shared key are then part of the secret used for calculating the HMAC in the AUTH chunk. Now, in our scenario, we have 2 threads with 1 non-blocking SEQ_PACKET socket each, setting up common shared SCTP_AUTH_KEY and SCTP_AUTH_ACTIVE_KEY properly, and each of them calling sctp_bindx(3), listen(2) and connect(2) against each other, thus the handshake looks similar to this, e.g.: ---------- INIT[RANDOM; CHUNKS; HMAC-ALGO] ----------> <------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] --------- <--------- INIT[RANDOM; CHUNKS; HMAC-ALGO] ----------- -------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] --------> ... Since such collisions can also happen with verification tags, the RFC4895 for AUTH rather vaguely says under section 6.1: In case of INIT collision, the rules governing the handling of this Random Number follow the same pattern as those for the Verification Tag, as explained in Section 5.2.4 of RFC 2960 [5]. Therefore, each endpoint knows its own Random Number and the peer's Random Number after the association has been established. In RFC2960, section 5.2.4, we're eventually hitting Action B: B) In this case, both sides may be attempting to start an association at about the same time but the peer endpoint started its INIT after responding to the local endpoint's INIT. Thus it may have picked a new Verification Tag not being aware of the previous Tag it had sent this endpoint. The endpoint should stay in or enter the ESTABLISHED state but it MUST update its peer's Verification Tag from the State Cookie, stop any init or cookie timers that may running and send a COOKIE ACK. In other words, the handling of the Random parameter is the same as behavior for the Verification Tag as described in Action B of section 5.2.4. Looking at the code, we exactly hit the sctp_sf_do_dupcook_b() case which triggers an SCTP_CMD_UPDATE_ASSOC command to the side effect interpreter, and in fact it properly copies over peer_{random, hmacs, chunks} parameters from the newly created association to update the existing one. Also, the old asoc_shared_key is being released and based on the new params, sctp_auth_asoc_init_active_key() updated. However, the issue observed in this case is that the previous asoc->peer.auth_capable was 0, and has *not* been updated, so that instead of creating a new secret, we're doing an early return from the function sctp_auth_asoc_init_active_key() leaving asoc->asoc_shared_key as NULL. However, we now have to authenticate chunks from the updated chunk list (e.g. COOKIE-ACK). That in fact causes the server side when responding with ... <------------------ AUTH; COOKIE-ACK ----------------- ... to trigger a NULL pointer dereference, since in sctp_packet_transmit(), it discovers that an AUTH chunk is being queued for xmit, and thus it calls sctp_auth_calculate_hmac(). Since the asoc->active_key_id is still inherited from the endpoint, and the same as encoded into the chunk, it uses asoc->asoc_shared_key, which is still NULL, as an asoc_key and dereferences it in ... crypto_hash_setkey(desc.tfm, &asoc_key->data[0], asoc_key->len) ... causing an oops. All this happens because sctp_make_cookie_ack() called with the *new* association has the peer.auth_capable=1 and therefore marks the chunk with auth=1 after checking sctp_auth_send_cid(), but it is *actually* sent later on over the then *updated* association's transport that didn't initialize its shared key due to peer.auth_capable=0. Since control chunks in that case are not sent by the temporary association which are scheduled for deletion, they are issued for xmit via SCTP_CMD_REPLY in the interpreter with the context of the *updated* association. peer.auth_capable was 0 in the updated association (which went from COOKIE_WAIT into ESTABLISHED state), since all previous processing that performed sctp_process_init() was being done on temporary associations, that we eventually throw away each time. The correct fix is to update to the new peer.auth_capable value as well in the collision case via sctp_assoc_update(), so that in case the collision migrated from 0 -> 1, sctp_auth_asoc_init_active_key() can properly recalculate the secret. This therefore fixes the observed server panic. Fixes: 730fc3d05cd4 ("[SCTP]: Implete SCTP-AUTH parameter processing") Reported-by: Jason Gunthorpe Signed-off-by: Daniel Borkmann Tested-by: Jason Gunthorpe Cc: Vlad Yasevich Acked-by: Vlad Yasevich Signed-off-by: David S. Miller --- net/sctp/associola.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/sctp/associola.c b/net/sctp/associola.c index 9de23a222d3f..06a9ee6b2d3a 100644 --- a/net/sctp/associola.c +++ b/net/sctp/associola.c @@ -1097,6 +1097,7 @@ void sctp_assoc_update(struct sctp_association *asoc, asoc->c = new->c; asoc->peer.rwnd = new->peer.rwnd; asoc->peer.sack_needed = new->peer.sack_needed; + asoc->peer.auth_capable = new->peer.auth_capable; asoc->peer.i = new->peer.i; sctp_tsnmap_init(&asoc->peer.tsn_map, SCTP_TSN_MAP_INITIAL, asoc->peer.i.initial_tsn, GFP_ATOMIC); From 474ea9cafc459976827a477f2c30eaf6313cb7c1 Mon Sep 17 00:00:00 2001 From: Florian Fainelli Date: Tue, 22 Jul 2014 11:01:52 -0700 Subject: [PATCH 356/455] net: bcmgenet: correctly pad short packets Packets shorter than ETH_ZLEN were not padded with zeroes, hence leaking potentially sensitive information. This bug has been present since the driver got accepted in commit 1c1008c793fa46703a2fee469f4235e1c7984333 ("net: bcmgenet: add main driver file"). Signed-off-by: Florian Fainelli Signed-off-by: David S. Miller --- drivers/net/ethernet/broadcom/genet/bcmgenet.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c index 16281ad2da12..4e615debe472 100644 --- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c +++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c @@ -1149,6 +1149,11 @@ static netdev_tx_t bcmgenet_xmit(struct sk_buff *skb, struct net_device *dev) goto out; } + if (skb_padto(skb, ETH_ZLEN)) { + ret = NETDEV_TX_OK; + goto out; + } + /* set the SKB transmit checksum */ if (priv->desc_64b_en) { ret = bcmgenet_put_tx_csum(dev, skb); From f62d14a8072b9756db36ba394e2b267470a40240 Mon Sep 17 00:00:00 2001 From: Peter Hutterer Date: Mon, 21 Jul 2014 17:51:35 -0700 Subject: [PATCH 357/455] Input: document INPUT_PROP_TOPBUTTONPAD Signed-off-by: Peter Hutterer Signed-off-by: Dmitry Torokhov --- Documentation/input/event-codes.txt | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/Documentation/input/event-codes.txt b/Documentation/input/event-codes.txt index f1ea2c69648d..c587a966413e 100644 --- a/Documentation/input/event-codes.txt +++ b/Documentation/input/event-codes.txt @@ -281,6 +281,19 @@ gestures can normally be extracted from it. If INPUT_PROP_SEMI_MT is not set, the device is assumed to be a true MT device. +INPUT_PROP_TOPBUTTONPAD: +----------------------- +Some laptops, most notably the Lenovo *40 series provide a trackstick +device but do not have physical buttons associated with the trackstick +device. Instead, the top area of the touchpad is marked to show +visual/haptic areas for left, middle, right buttons intended to be used +with the trackstick. + +If INPUT_PROP_TOPBUTTONPAD is set, userspace should emulate buttons +accordingly. This property does not affect kernel behavior. +The kernel does not provide button emulation for such devices but treats +them as any other INPUT_PROP_BUTTONPAD device. + Guidelines: ========== The guidelines below ensure proper single-touch and multi-finger functionality. From 8903461c9bc56fcb041fb92d054e2529951770b6 Mon Sep 17 00:00:00 2001 From: Michael Ellerman Date: Wed, 23 Jul 2014 17:20:04 +1000 Subject: [PATCH 358/455] powerpc/perf: Fix MMCR2 handling for EBB In the recent commit b50a6c584bb4 "Clear MMCR2 when enabling PMU", I screwed up the handling of MMCR2 for tasks using EBB. We must make sure we set MMCR2 *before* ebb_switch_in(), otherwise we overwrite the value of MMCR2 that userspace may have written. That potentially breaks a task that uses EBB and manually uses MMCR2 for event freezing. Fixes: b50a6c584bb4 ("powerpc/perf: Clear MMCR2 when enabling PMU") Cc: stable@vger.kernel.org Signed-off-by: Michael Ellerman Signed-off-by: Benjamin Herrenschmidt --- arch/powerpc/perf/core-book3s.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c index 6b0641c3f03f..fe52db2eea6a 100644 --- a/arch/powerpc/perf/core-book3s.c +++ b/arch/powerpc/perf/core-book3s.c @@ -1307,6 +1307,9 @@ static void power_pmu_enable(struct pmu *pmu) out_enable: pmao_restore_workaround(ebb); + if (ppmu->flags & PPMU_ARCH_207S) + mtspr(SPRN_MMCR2, 0); + mmcr0 = ebb_switch_in(ebb, cpuhw->mmcr[0]); mb(); @@ -1315,9 +1318,6 @@ static void power_pmu_enable(struct pmu *pmu) write_mmcr0(cpuhw, mmcr0); - if (ppmu->flags & PPMU_ARCH_207S) - mtspr(SPRN_MMCR2, 0); - /* * Enable instruction sampling if necessary */ From 23d9cec07c589276561c13b180577c0b87930140 Mon Sep 17 00:00:00 2001 From: Nishanth Menon Date: Tue, 22 Jul 2014 10:39:54 -0500 Subject: [PATCH 359/455] pinctrl: dra: dt-bindings: Fix pull enable/disable The DRA74/72 control module pins have a weak pull up and pull down. This is configured by bit offset 17. if BIT(17) is 1, a pull up is selected, else a pull down is selected. However, this pull resisstor is applied based on BIT(16) - PULLUDENABLE - if BIT(18) is *0*, then pull as defined in BIT(17) is applied, else no weak pulls are applied. We defined this in reverse. Reference: Table 18-5 (Description of the pad configuration register bits) in Technical Reference Manual Revision (DRA74x revision Q: SPRUHI2Q Revised June 2014 and DRA72x revision F: SPRUHP2F - Revised June 2014) Fixes: 6e58b8f1daaf1a ("ARM: dts: DRA7: Add the dts files for dra7 SoC and dra7-evm board") Signed-off-by: Nishanth Menon Tested-by: Felipe Balbi Acked-by: Felipe Balbi Signed-off-by: Tony Lindgren --- include/dt-bindings/pinctrl/dra.h | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/include/dt-bindings/pinctrl/dra.h b/include/dt-bindings/pinctrl/dra.h index 002a2855c046..3d33794e4f3e 100644 --- a/include/dt-bindings/pinctrl/dra.h +++ b/include/dt-bindings/pinctrl/dra.h @@ -30,7 +30,8 @@ #define MUX_MODE14 0xe #define MUX_MODE15 0xf -#define PULL_ENA (1 << 16) +#define PULL_ENA (0 << 16) +#define PULL_DIS (1 << 16) #define PULL_UP (1 << 17) #define INPUT_EN (1 << 18) #define SLEWCONTROL (1 << 19) @@ -38,10 +39,10 @@ #define WAKEUP_EVENT (1 << 25) /* Active pin states */ -#define PIN_OUTPUT 0 +#define PIN_OUTPUT (0 | PULL_DIS) #define PIN_OUTPUT_PULLUP (PIN_OUTPUT | PULL_ENA | PULL_UP) #define PIN_OUTPUT_PULLDOWN (PIN_OUTPUT | PULL_ENA) -#define PIN_INPUT INPUT_EN +#define PIN_INPUT (INPUT_EN | PULL_DIS) #define PIN_INPUT_SLEW (INPUT_EN | SLEWCONTROL) #define PIN_INPUT_PULLUP (PULL_ENA | INPUT_EN | PULL_UP) #define PIN_INPUT_PULLDOWN (PULL_ENA | INPUT_EN) From 33753cd2ba41c72a0756edc5dc094d91602deda5 Mon Sep 17 00:00:00 2001 From: Christoph Fritz Date: Mon, 14 Jul 2014 03:36:18 +0200 Subject: [PATCH 360/455] ARM: OMAP2+: gpmc: fix gpmc_hwecc_bch_capable() This patch adds bch8 ecc software fallback which is mostly used by omap3s because they lack hardware elm support. Fixes: 0611c41934ab35ce84dea34ab291897ad3cbc7be (ARM: OMAP2+: gpmc: update gpmc_hwecc_bch_capable() for new platforms and ECC schemes) Cc: # 3.15.x+ Signed-off-by: Christoph Fritz Reviewed-by: Pekon Gupta Signed-off-by: Tony Lindgren --- arch/arm/mach-omap2/gpmc-nand.c | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-) diff --git a/arch/arm/mach-omap2/gpmc-nand.c b/arch/arm/mach-omap2/gpmc-nand.c index 17cd39360afe..93914d220069 100644 --- a/arch/arm/mach-omap2/gpmc-nand.c +++ b/arch/arm/mach-omap2/gpmc-nand.c @@ -50,6 +50,16 @@ static bool gpmc_hwecc_bch_capable(enum omap_ecc ecc_opt) soc_is_omap54xx() || soc_is_dra7xx()) return 1; + if (ecc_opt == OMAP_ECC_BCH4_CODE_HW_DETECTION_SW || + ecc_opt == OMAP_ECC_BCH8_CODE_HW_DETECTION_SW) { + if (cpu_is_omap24xx()) + return 0; + else if (cpu_is_omap3630() && (GET_OMAP_REVISION() == 0)) + return 0; + else + return 1; + } + /* OMAP3xxx do not have ELM engine, so cannot support ECC schemes * which require H/W based ECC error detection */ if ((cpu_is_omap34xx() || cpu_is_omap3630()) && @@ -57,14 +67,6 @@ static bool gpmc_hwecc_bch_capable(enum omap_ecc ecc_opt) (ecc_opt == OMAP_ECC_BCH8_CODE_HW))) return 0; - /* - * For now, assume 4-bit mode is only supported on OMAP3630 ES1.x, x>=1 - * and AM33xx derivates. Other chips may be added if confirmed to work. - */ - if ((ecc_opt == OMAP_ECC_BCH4_CODE_HW_DETECTION_SW) && - (!cpu_is_omap3630() || (GET_OMAP_REVISION() == 0))) - return 0; - /* legacy platforms support only HAM1 (1-bit Hamming) ECC scheme */ if (ecc_opt == OMAP_ECC_HAM1_CODE_HW) return 1; From d50314a6b0702c630c35b88148c1acb76d2e4ede Mon Sep 17 00:00:00 2001 From: Catalin Marinas Date: Fri, 18 Jul 2014 11:54:37 +0100 Subject: [PATCH 361/455] arm64: Create non-empty ZONE_DMA when DRAM starts above 4GB ZONE_DMA is created to allow 32-bit only devices to access memory in the absence of an IOMMU. On systems where the memory starts above 4GB, it is expected that some devices have a DMA offset hardwired to be able to access the bottom of the memory. Linux currently supports DT bindings for the DMA offsets but they are not (easily) available early during boot. This patch tries to guess a DMA offset and assumes that ZONE_DMA corresponds to the 32-bit mask above the start of DRAM. Fixes: 2d5a5612bc (arm64: Limit the CMA buffer to 32-bit if ZONE_DMA) Signed-off-by: Catalin Marinas Reported-by: Mark Salter Tested-by: Mark Salter Tested-by: Anup Patel --- arch/arm64/mm/init.c | 17 +++++++++++++---- 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index f43db8a69262..e90c5426fe14 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c @@ -60,6 +60,17 @@ static int __init early_initrd(char *p) early_param("initrd", early_initrd); #endif +/* + * Return the maximum physical address for ZONE_DMA (DMA_BIT_MASK(32)). It + * currently assumes that for memory starting above 4G, 32-bit devices will + * use a DMA offset. + */ +static phys_addr_t max_zone_dma_phys(void) +{ + phys_addr_t offset = memblock_start_of_DRAM() & GENMASK_ULL(63, 32); + return min(offset + (1ULL << 32), memblock_end_of_DRAM()); +} + static void __init zone_sizes_init(unsigned long min, unsigned long max) { struct memblock_region *reg; @@ -70,9 +81,7 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max) /* 4GB maximum for 32-bit only capable devices */ if (IS_ENABLED(CONFIG_ZONE_DMA)) { - unsigned long max_dma_phys = - (unsigned long)(dma_to_phys(NULL, DMA_BIT_MASK(32)) + 1); - max_dma = max(min, min(max, max_dma_phys >> PAGE_SHIFT)); + max_dma = PFN_DOWN(max_zone_dma_phys()); zone_size[ZONE_DMA] = max_dma - min; } zone_size[ZONE_NORMAL] = max - max_dma; @@ -146,7 +155,7 @@ void __init arm64_memblock_init(void) /* 4GB maximum for 32-bit only capable devices */ if (IS_ENABLED(CONFIG_ZONE_DMA)) - dma_phys_limit = dma_to_phys(NULL, DMA_BIT_MASK(32)) + 1; + dma_phys_limit = max_zone_dma_phys(); dma_contiguous_reserve(dma_phys_limit); memblock_allow_resize(); From eedd10f45bdcb2a5b2afa35f845e080c3bc984f2 Mon Sep 17 00:00:00 2001 From: Chris Wilson Date: Mon, 16 Jun 2014 08:57:44 +0100 Subject: [PATCH 362/455] drm/i915: Simplify i915_gem_release_all_mmaps() An object can only have an active gtt mapping if it is currently bound into the global gtt. Therefore we can simply walk the list of all bound objects and check the flag upon those for an active gtt mapping. From commit 48018a57a8f5900e7e53ffaa0adeb784095accfb Author: Paulo Zanoni Date: Fri Dec 13 15:22:31 2013 -0200 drm/i915: release the GTT mmaps when going into D3 Also note that the WARN is inappropriate for this function as GPU activity is orthogonal to GTT mmap status. Rather it is the caller that relies upon this condition and so it should assert that the GPU is idle itself. References: https://bugs.freedesktop.org/show_bug.cgi?id=80081 Signed-off-by: Chris Wilson Cc: Paulo Zanoni Cc: Rodrigo Vivi Cc: Daniel Vetter Reviewed-by: Paulo Zanoni Tested-by: Paulo Zanoni [danvet: cherry-pick from -next to -fixes.] Signed-off-by: Daniel Vetter --- drivers/gpu/drm/i915/i915_gem.c | 25 +++++++++---------------- 1 file changed, 9 insertions(+), 16 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index f36126383d26..d893e4da5dce 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -1616,22 +1616,6 @@ out: return ret; } -void i915_gem_release_all_mmaps(struct drm_i915_private *dev_priv) -{ - struct i915_vma *vma; - - /* - * Only the global gtt is relevant for gtt memory mappings, so restrict - * list traversal to objects bound into the global address space. Note - * that the active list should be empty, but better safe than sorry. - */ - WARN_ON(!list_empty(&dev_priv->gtt.base.active_list)); - list_for_each_entry(vma, &dev_priv->gtt.base.active_list, mm_list) - i915_gem_release_mmap(vma->obj); - list_for_each_entry(vma, &dev_priv->gtt.base.inactive_list, mm_list) - i915_gem_release_mmap(vma->obj); -} - /** * i915_gem_release_mmap - remove physical page mappings * @obj: obj in question @@ -1657,6 +1641,15 @@ i915_gem_release_mmap(struct drm_i915_gem_object *obj) obj->fault_mappable = false; } +void +i915_gem_release_all_mmaps(struct drm_i915_private *dev_priv) +{ + struct drm_i915_gem_object *obj; + + list_for_each_entry(obj, &dev_priv->mm.bound_list, global_list) + i915_gem_release_mmap(obj); +} + uint32_t i915_gem_get_gtt_size(struct drm_device *dev, uint32_t size, int tiling_mode) { From 1a112d10f03e83fb3a2fdc4c9165865dec8a3ca6 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Wed, 23 Jul 2014 09:05:27 -0400 Subject: [PATCH 363/455] libata: introduce ata_host->n_tags to avoid oops on SAS controllers 1871ee134b73 ("libata: support the ata host which implements a queue depth less than 32") directly used ata_port->scsi_host->can_queue from ata_qc_new() to determine the number of tags supported by the host; unfortunately, SAS controllers doing SATA don't initialize ->scsi_host leading to the following oops. BUG: unable to handle kernel NULL pointer dereference at 0000000000000058 IP: [] ata_qc_new_init+0x188/0x1b0 PGD 0 Oops: 0002 [#1] SMP Modules linked in: isci libsas scsi_transport_sas mgag200 drm_kms_helper ttm CPU: 1 PID: 518 Comm: udevd Not tainted 3.16.0-rc6+ #62 Hardware name: Intel Corporation S2600CO/S2600CO, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013 task: ffff880c1a00b280 ti: ffff88061a000000 task.ti: ffff88061a000000 RIP: 0010:[] [] ata_qc_new_init+0x188/0x1b0 RSP: 0018:ffff88061a003ae8 EFLAGS: 00010012 RAX: 0000000000000001 RBX: ffff88000241ca80 RCX: 00000000000000fa RDX: 0000000000000020 RSI: 0000000000000020 RDI: ffff8806194aa298 RBP: ffff88061a003ae8 R08: ffff8806194a8000 R09: 0000000000000000 R10: 0000000000000000 R11: ffff88000241ca80 R12: ffff88061ad58200 R13: ffff8806194aa298 R14: ffffffff814e67a0 R15: ffff8806194a8000 FS: 00007f3ad7fe3840(0000) GS:ffff880627620000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000058 CR3: 000000061a118000 CR4: 00000000001407e0 Stack: ffff88061a003b20 ffffffff814e96e1 ffff88000241ca80 ffff88061ad58200 ffff8800b6bf6000 ffff880c1c988000 ffff880619903850 ffff88061a003b68 ffffffffa0056ce1 ffff88061a003b48 0000000013d6e6f8 ffff88000241ca80 Call Trace: [] ata_sas_queuecmd+0xa1/0x430 [] sas_queuecommand+0x191/0x220 [libsas] [] scsi_dispatch_cmd+0x10e/0x300 [] scsi_request_fn+0x2f5/0x550 [] __blk_run_queue+0x33/0x40 [] queue_unplugged+0x2a/0x90 [] blk_flush_plug_list+0x1b4/0x210 [] blk_finish_plug+0x14/0x50 [] __do_page_cache_readahead+0x198/0x1f0 [] force_page_cache_readahead+0x31/0x50 [] page_cache_sync_readahead+0x3e/0x50 [] generic_file_read_iter+0x496/0x5a0 [] blkdev_read_iter+0x37/0x40 [] new_sync_read+0x7e/0xb0 [] vfs_read+0x94/0x170 [] SyS_read+0x46/0xb0 [] ? SyS_lseek+0x91/0xb0 [] system_call_fastpath+0x16/0x1b Code: 00 00 00 88 50 29 83 7f 08 01 19 d2 83 e2 f0 83 ea 50 88 50 34 c6 81 1d 02 00 00 40 c6 81 17 02 00 00 00 5d c3 66 0f 1f 44 00 00 <89> 14 25 58 00 00 00 Fix it by introducing ata_host->n_tags which is initialized to ATA_MAX_QUEUE - 1 in ata_host_init() for SAS controllers and set to scsi_host_template->can_queue in ata_host_register() for !SAS ones. As SAS hosts are never registered, this will give them the same ATA_MAX_QUEUE - 1 as before. Note that we can't use scsi_host->can_queue directly for SAS hosts anyway as they can go higher than the libata maximum. Signed-off-by: Tejun Heo Reported-by: Mike Qiu Reported-by: Jesse Brandeburg Reported-by: Peter Hurley Reported-by: Peter Zijlstra Tested-by: Alexey Kardashevskiy Fixes: 1871ee134b73 ("libata: support the ata host which implements a queue depth less than 32") Cc: Kevin Hao Cc: Dan Williams Cc: stable@vger.kernel.org --- drivers/ata/libata-core.c | 16 ++++------------ include/linux/libata.h | 1 + 2 files changed, 5 insertions(+), 12 deletions(-) diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c index d19c37a7abc9..677c0c1b03bd 100644 --- a/drivers/ata/libata-core.c +++ b/drivers/ata/libata-core.c @@ -4798,9 +4798,8 @@ void swap_buf_le16(u16 *buf, unsigned int buf_words) static struct ata_queued_cmd *ata_qc_new(struct ata_port *ap) { struct ata_queued_cmd *qc = NULL; - unsigned int i, tag, max_queue; - - max_queue = ap->scsi_host->can_queue; + unsigned int max_queue = ap->host->n_tags; + unsigned int i, tag; /* no command while frozen */ if (unlikely(ap->pflags & ATA_PFLAG_FROZEN)) @@ -6094,6 +6093,7 @@ void ata_host_init(struct ata_host *host, struct device *dev, { spin_lock_init(&host->lock); mutex_init(&host->eh_mutex); + host->n_tags = ATA_MAX_QUEUE - 1; host->dev = dev; host->ops = ops; } @@ -6175,15 +6175,7 @@ int ata_host_register(struct ata_host *host, struct scsi_host_template *sht) { int i, rc; - /* - * The max queue supported by hardware must not be greater than - * ATA_MAX_QUEUE. - */ - if (sht->can_queue > ATA_MAX_QUEUE) { - dev_err(host->dev, "BUG: the hardware max queue is too large\n"); - WARN_ON(1); - return -EINVAL; - } + host->n_tags = clamp(sht->can_queue, 1, ATA_MAX_QUEUE - 1); /* host must have been started */ if (!(host->flags & ATA_HOST_STARTED)) { diff --git a/include/linux/libata.h b/include/linux/libata.h index 5ab4e3a76721..92abb497ab14 100644 --- a/include/linux/libata.h +++ b/include/linux/libata.h @@ -593,6 +593,7 @@ struct ata_host { struct device *dev; void __iomem * const *iomap; unsigned int n_ports; + unsigned int n_tags; /* nr of NCQ tags */ void *private_data; struct ata_port_operations *ops; unsigned long flags; From f98bac5a30b60a2fca854dd5ee7256221d8ccf0a Mon Sep 17 00:00:00 2001 From: Kinglong Mee Date: Mon, 7 Jul 2014 22:10:56 +0800 Subject: [PATCH 364/455] NFSD: Fix crash encoding lock reply on 32-bit Commit 8c7424cff6 "nfsd4: don't try to encode conflicting owner if low on space" forgot to free conf->data in nfsd4_encode_lockt and before sign conf->data to NULL in nfsd4_encode_lock_denied, causing a leak. Worse, kfree() can be called on an uninitialized pointer in the case of a succesful lock (or one that fails for a reason other than a conflict). (Note that lock->lk_denied.ld_owner.data appears it should be zero here, until you notice that it's one arm of a union the other arm of which is written to in the succesful case by the memcpy(&lock->lk_resp_stateid, &lock_stp->st_stid.sc_stateid, sizeof(stateid_t)); in nfsd4_lock(). In the 32-bit case this overwrites ld_owner.data.) Signed-off-by: Kinglong Mee Fixes: 8c7424cff6 ""nfsd4: don't try to encode conflicting owner if low on space" Signed-off-by: J. Bruce Fields --- fs/nfsd/nfs4xdr.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index b56b1cc02718..944275c8f56d 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -2879,6 +2879,7 @@ again: * return the conflicting open: */ if (conf->len) { + kfree(conf->data); conf->len = 0; conf->data = NULL; goto again; @@ -2891,6 +2892,7 @@ again: if (conf->len) { p = xdr_encode_opaque_fixed(p, &ld->ld_clientid, 8); p = xdr_encode_opaque(p, conf->data, conf->len); + kfree(conf->data); } else { /* non - nfsv4 lock in conflict, no clientid nor owner */ p = xdr_encode_hyper(p, (u64)0); /* clientid */ *p++ = cpu_to_be32(0); /* length of owner name */ @@ -2907,7 +2909,7 @@ nfsd4_encode_lock(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_lo nfserr = nfsd4_encode_stateid(xdr, &lock->lk_resp_stateid); else if (nfserr == nfserr_denied) nfserr = nfsd4_encode_lock_denied(xdr, &lock->lk_denied); - kfree(lock->lk_denied.ld_owner.data); + return nfserr; } From 2a2261553dd1472ca574acadbd93e12f44c4e6d5 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Tue, 22 Jul 2014 15:35:14 +0200 Subject: [PATCH 365/455] x86, cpu: Fix cache topology for early P4-SMT P4 systems with cpuid level < 4 can have SMT, but the cache topology description available (cpuid2) does not include SMP information. Now we know that SMT shares all cache levels, and therefore we can mark all available cache levels as shared. We do this by setting cpu_llc_id to ->phys_proc_id, since that's the same for each SMT thread. We can do this unconditional since if there's no SMT its still true, the one CPU shares cache with only itself. This fixes a problem where such CPUs report an incorrect LLC CPU mask. This in turn fixes a crash in the scheduler where the topology was build wrong, it assumes the LLC mask to include at least the SMT CPUs. Cc: Josh Boyer Cc: Dietmar Eggemann Tested-by: Bruno Wolff III Signed-off-by: Peter Zijlstra Link: http://lkml.kernel.org/r/20140722133514.GM12054@laptop.lan Signed-off-by: H. Peter Anvin --- arch/x86/kernel/cpu/intel.c | 22 +++++++++++----------- arch/x86/kernel/cpu/intel_cacheinfo.c | 12 ++++++++++++ 2 files changed, 23 insertions(+), 11 deletions(-) diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c index a80029035bf2..f9e4fdd3b877 100644 --- a/arch/x86/kernel/cpu/intel.c +++ b/arch/x86/kernel/cpu/intel.c @@ -370,6 +370,17 @@ static void init_intel(struct cpuinfo_x86 *c) */ detect_extended_topology(c); + if (!cpu_has(c, X86_FEATURE_XTOPOLOGY)) { + /* + * let's use the legacy cpuid vector 0x1 and 0x4 for topology + * detection. + */ + c->x86_max_cores = intel_num_cpu_cores(c); +#ifdef CONFIG_X86_32 + detect_ht(c); +#endif + } + l2 = init_intel_cacheinfo(c); if (c->cpuid_level > 9) { unsigned eax = cpuid_eax(10); @@ -438,17 +449,6 @@ static void init_intel(struct cpuinfo_x86 *c) set_cpu_cap(c, X86_FEATURE_P3); #endif - if (!cpu_has(c, X86_FEATURE_XTOPOLOGY)) { - /* - * let's use the legacy cpuid vector 0x1 and 0x4 for topology - * detection. - */ - c->x86_max_cores = intel_num_cpu_cores(c); -#ifdef CONFIG_X86_32 - detect_ht(c); -#endif - } - /* Work around errata */ srat_detect_node(c); diff --git a/arch/x86/kernel/cpu/intel_cacheinfo.c b/arch/x86/kernel/cpu/intel_cacheinfo.c index a952e9c85b6f..9c8f7394c612 100644 --- a/arch/x86/kernel/cpu/intel_cacheinfo.c +++ b/arch/x86/kernel/cpu/intel_cacheinfo.c @@ -730,6 +730,18 @@ unsigned int init_intel_cacheinfo(struct cpuinfo_x86 *c) #endif } +#ifdef CONFIG_X86_HT + /* + * If cpu_llc_id is not yet set, this means cpuid_level < 4 which in + * turns means that the only possibility is SMT (as indicated in + * cpuid1). Since cpuid2 doesn't specify shared caches, and we know + * that SMT shares all caches, we can unconditionally set cpu_llc_id to + * c->phys_proc_id. + */ + if (per_cpu(cpu_llc_id, cpu) == BAD_APICID) + per_cpu(cpu_llc_id, cpu) = c->phys_proc_id; +#endif + c->x86_cache_size = l3 ? l3 : (l2 ? l2 : (l1i+l1d)); return l2; From e8c214d22e76dd0ead38f97f8d2dc09aac70d651 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Christian=20K=C3=B6nig?= Date: Wed, 23 Jul 2014 09:47:58 +0200 Subject: [PATCH 366/455] drm/radeon: fix irq ring buffer overflow handling MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit We must mask out the overflow bit as well, otherwise the wptr will never match the rptr again and the interrupt handler will loop forever. Signed-off-by: Christian König Cc: stable@vger.kernel.org Signed-off-by: Alex Deucher Reviewed-by: Michel Dänzer --- drivers/gpu/drm/radeon/cik.c | 1 + drivers/gpu/drm/radeon/evergreen.c | 1 + drivers/gpu/drm/radeon/r600.c | 1 + drivers/gpu/drm/radeon/si.c | 1 + 4 files changed, 4 insertions(+) diff --git a/drivers/gpu/drm/radeon/cik.c b/drivers/gpu/drm/radeon/cik.c index 0b2471107137..cc1f02f6817d 100644 --- a/drivers/gpu/drm/radeon/cik.c +++ b/drivers/gpu/drm/radeon/cik.c @@ -7376,6 +7376,7 @@ static inline u32 cik_get_ih_wptr(struct radeon_device *rdev) tmp = RREG32(IH_RB_CNTL); tmp |= IH_WPTR_OVERFLOW_CLEAR; WREG32(IH_RB_CNTL, tmp); + wptr &= ~RB_OVERFLOW; } return (wptr & rdev->ih.ptr_mask); } diff --git a/drivers/gpu/drm/radeon/evergreen.c b/drivers/gpu/drm/radeon/evergreen.c index 250bac3935a4..15e4f28015e1 100644 --- a/drivers/gpu/drm/radeon/evergreen.c +++ b/drivers/gpu/drm/radeon/evergreen.c @@ -4756,6 +4756,7 @@ static u32 evergreen_get_ih_wptr(struct radeon_device *rdev) tmp = RREG32(IH_RB_CNTL); tmp |= IH_WPTR_OVERFLOW_CLEAR; WREG32(IH_RB_CNTL, tmp); + wptr &= ~RB_OVERFLOW; } return (wptr & rdev->ih.ptr_mask); } diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c index c66952d4b00c..3c69f58e46ef 100644 --- a/drivers/gpu/drm/radeon/r600.c +++ b/drivers/gpu/drm/radeon/r600.c @@ -3795,6 +3795,7 @@ static u32 r600_get_ih_wptr(struct radeon_device *rdev) tmp = RREG32(IH_RB_CNTL); tmp |= IH_WPTR_OVERFLOW_CLEAR; WREG32(IH_RB_CNTL, tmp); + wptr &= ~RB_OVERFLOW; } return (wptr & rdev->ih.ptr_mask); } diff --git a/drivers/gpu/drm/radeon/si.c b/drivers/gpu/drm/radeon/si.c index eba0225259a4..9e854fd016da 100644 --- a/drivers/gpu/drm/radeon/si.c +++ b/drivers/gpu/drm/radeon/si.c @@ -6103,6 +6103,7 @@ static inline u32 si_get_ih_wptr(struct radeon_device *rdev) tmp = RREG32(IH_RB_CNTL); tmp |= IH_WPTR_OVERFLOW_CLEAR; WREG32(IH_RB_CNTL, tmp); + wptr &= ~RB_OVERFLOW; } return (wptr & rdev->ih.ptr_mask); } From c01fac1c77a00227f706a1654317023e3f4ac7f0 Mon Sep 17 00:00:00 2001 From: Felix Fietkau Date: Wed, 23 Jul 2014 15:40:54 +0200 Subject: [PATCH 367/455] ath9k: fix aggregation session lockup If an aggregation session fails, frames still end up in the driver queue with IEEE80211_TX_CTL_AMPDU set. This causes tx for the affected station/tid to stall, since ath_tx_get_tid_subframe returning packets to send. Fix this by clearing IEEE80211_TX_CTL_AMPDU as long as no aggregation session is running. Cc: stable@vger.kernel.org Reported-by: Antonio Quartulli Signed-off-by: Felix Fietkau Signed-off-by: John W. Linville --- drivers/net/wireless/ath/ath9k/xmit.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/drivers/net/wireless/ath/ath9k/xmit.c b/drivers/net/wireless/ath/ath9k/xmit.c index 66acb2cbd9df..7c28cb55610b 100644 --- a/drivers/net/wireless/ath/ath9k/xmit.c +++ b/drivers/net/wireless/ath/ath9k/xmit.c @@ -887,6 +887,15 @@ ath_tx_get_tid_subframe(struct ath_softc *sc, struct ath_txq *txq, tx_info = IEEE80211_SKB_CB(skb); tx_info->flags &= ~IEEE80211_TX_CTL_CLEAR_PS_FILT; + + /* + * No aggregation session is running, but there may be frames + * from a previous session or a failed attempt in the queue. + * Send them out as normal data frames + */ + if (!tid->active) + tx_info->flags &= ~IEEE80211_TX_CTL_AMPDU; + if (!(tx_info->flags & IEEE80211_TX_CTL_AMPDU)) { bf->bf_state.bf_type = 0; return bf; From d584a66279949561418c82b12bb4c055e6c25836 Mon Sep 17 00:00:00 2001 From: Stefan Richter Date: Wed, 23 Jul 2014 20:08:12 +0200 Subject: [PATCH 368/455] firewire: ohci: disable MSI for VIA VT6315 again MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Revert half of commit d151f9854f21: If isochronous I/O is attempted with packets larget than 1 kByte, VIA VT6315 rev 01 immediately stops to generate any interrupts if MSI are used. Fix this by going back to legacy interrupts. [Thread "Isochronous streaming with VT6315 OHCI", http://marc.info/?t=139049641500003] With smaller packets, the loss of IRQs happens too but only very rarely --- rarely eneough that it was not yet possible for me to determine whether QUIRK_NO_MSI is an actual fix for this rare variation of this chip bug. I am keeping QUIRK_CYCLE_TIMER off of VT6315 rev >= 1 because this has been verified by myself with certainty. On the other hand, I am also keeping QUIRK_CYCLE_TIMER on for VT6315 rev 0 because I don't know at this time whether this revision accesses Cycle Timer non-atomically like most of the other VIA OHCIs are known to do. Reported-by: Rémy Bruno Signed-off-by: Stefan Richter --- drivers/firewire/ohci.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/firewire/ohci.c b/drivers/firewire/ohci.c index 5b0934d0d968..41df806e9e80 100644 --- a/drivers/firewire/ohci.c +++ b/drivers/firewire/ohci.c @@ -336,10 +336,10 @@ static const struct { QUIRK_CYCLE_TIMER | QUIRK_IR_WAKE}, {PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_VT6315, 0, - QUIRK_CYCLE_TIMER | QUIRK_NO_MSI}, + QUIRK_CYCLE_TIMER /* FIXME: necessary? */ | QUIRK_NO_MSI}, {PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_VT6315, PCI_ANY_ID, - 0}, + QUIRK_NO_MSI}, {PCI_VENDOR_ID_VIA, PCI_ANY_ID, PCI_ANY_ID, QUIRK_CYCLE_TIMER | QUIRK_NO_MSI}, From 332cfc823d182d43bfcc80d1ebee7ab79e06ccf3 Mon Sep 17 00:00:00 2001 From: Wei Yongjun Date: Wed, 23 Jul 2014 08:59:40 +0800 Subject: [PATCH 369/455] amd-xgbe: Fix error return code in xgbe_probe() Fix to return a negative error code from the setting real tx queue count error handling case instead of 0. Signed-off-by: Wei Yongjun Acked-by: Tom Lendacky Signed-off-by: David S. Miller --- drivers/net/ethernet/amd/xgbe/xgbe-main.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-main.c b/drivers/net/ethernet/amd/xgbe/xgbe-main.c index c83584a26713..5a1891faba8a 100644 --- a/drivers/net/ethernet/amd/xgbe/xgbe-main.c +++ b/drivers/net/ethernet/amd/xgbe/xgbe-main.c @@ -339,7 +339,8 @@ static int xgbe_probe(struct platform_device *pdev) /* Calculate the number of Tx and Rx rings to be created */ pdata->tx_ring_count = min_t(unsigned int, num_online_cpus(), pdata->hw_feat.tx_ch_cnt); - if (netif_set_real_num_tx_queues(netdev, pdata->tx_ring_count)) { + ret = netif_set_real_num_tx_queues(netdev, pdata->tx_ring_count); + if (ret) { dev_err(dev, "error setting real tx queue count\n"); goto err_io; } From dd1d3f8f9920926aa426589e542eed6bf58b7354 Mon Sep 17 00:00:00 2001 From: Wei Yongjun Date: Wed, 23 Jul 2014 09:00:35 +0800 Subject: [PATCH 370/455] hyperv: Fix error return code in netvsc_init_buf() Fix to return -ENOMEM from the kalloc error handling case instead of 0. Signed-off-by: Wei Yongjun Reviewed-by: Haiyang Zhang Signed-off-by: David S. Miller --- drivers/net/hyperv/netvsc.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c index 4ed38eaecea8..d97d5f39a04e 100644 --- a/drivers/net/hyperv/netvsc.c +++ b/drivers/net/hyperv/netvsc.c @@ -378,8 +378,10 @@ static int netvsc_init_buf(struct hv_device *device) net_device->send_section_map = kzalloc(net_device->map_words * sizeof(ulong), GFP_KERNEL); - if (net_device->send_section_map == NULL) + if (net_device->send_section_map == NULL) { + ret = -ENOMEM; goto cleanup; + } goto exit; From aed8adb7688d5744cb484226820163af31d2499a Mon Sep 17 00:00:00 2001 From: Silesh C V Date: Wed, 23 Jul 2014 13:59:59 -0700 Subject: [PATCH 371/455] coredump: fix the setting of PF_DUMPCORE Commit 079148b919d0 ("coredump: factor out the setting of PF_DUMPCORE") cleaned up the setting of PF_DUMPCORE by removing it from all the linux_binfmt->core_dump() and moving it to zap_threads().But this ended up clearing all the previously set flags. This causes issues during core generation when tsk->flags is checked again (eg. for PF_USED_MATH to dump floating point registers). Fix this. Signed-off-by: Silesh C V Acked-by: Oleg Nesterov Cc: Mandeep Singh Baines Cc: [3.10+] Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/coredump.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/coredump.c b/fs/coredump.c index 0b2528fb640e..a93f7e6ea4cf 100644 --- a/fs/coredump.c +++ b/fs/coredump.c @@ -306,7 +306,7 @@ static int zap_threads(struct task_struct *tsk, struct mm_struct *mm, if (unlikely(nr < 0)) return nr; - tsk->flags = PF_DUMPCORE; + tsk->flags |= PF_DUMPCORE; if (atomic_read(&mm->mm_users) == nr + 1) goto done; /* From a0f7a756c2f7543585657cdeeefdfcc11b567293 Mon Sep 17 00:00:00 2001 From: Naoya Horiguchi Date: Wed, 23 Jul 2014 14:00:01 -0700 Subject: [PATCH 372/455] mm/rmap.c: fix pgoff calculation to handle hugepage correctly I triggered VM_BUG_ON() in vma_address() when I tried to migrate an anonymous hugepage with mbind() in the kernel v3.16-rc3. This is because pgoff's calculation in rmap_walk_anon() fails to consider compound_order() only to have an incorrect value. This patch introduces page_to_pgoff(), which gets the page's offset in PAGE_CACHE_SIZE. Kirill pointed out that page cache tree should natively handle hugepages, and in order to make hugetlbfs fit it, page->index of hugetlbfs page should be in PAGE_CACHE_SIZE. This is beyond this patch, but page_to_pgoff() contains the point to be fixed in a single function. Signed-off-by: Naoya Horiguchi Acked-by: Kirill A. Shutemov Cc: Joonsoo Kim Cc: Hugh Dickins Cc: Rik van Riel Cc: Hillf Danton Cc: Naoya Horiguchi Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/pagemap.h | 12 ++++++++++++ mm/memory-failure.c | 4 ++-- mm/rmap.c | 10 +++------- 3 files changed, 17 insertions(+), 9 deletions(-) diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index 0a97b583ee8d..e1474ae18c88 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -398,6 +398,18 @@ static inline struct page *read_mapping_page(struct address_space *mapping, return read_cache_page(mapping, index, filler, data); } +/* + * Get the offset in PAGE_SIZE. + * (TODO: hugepage should have ->index in PAGE_SIZE) + */ +static inline pgoff_t page_to_pgoff(struct page *page) +{ + if (unlikely(PageHeadHuge(page))) + return page->index << compound_order(page); + else + return page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT); +} + /* * Return byte-offset into filesystem object for page. */ diff --git a/mm/memory-failure.c b/mm/memory-failure.c index c6399e328931..7211a73ba14d 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -435,7 +435,7 @@ static void collect_procs_anon(struct page *page, struct list_head *to_kill, if (av == NULL) /* Not actually mapped anymore */ return; - pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT); + pgoff = page_to_pgoff(page); read_lock(&tasklist_lock); for_each_process (tsk) { struct anon_vma_chain *vmac; @@ -469,7 +469,7 @@ static void collect_procs_file(struct page *page, struct list_head *to_kill, mutex_lock(&mapping->i_mmap_mutex); read_lock(&tasklist_lock); for_each_process(tsk) { - pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT); + pgoff_t pgoff = page_to_pgoff(page); struct task_struct *t = task_early_kill(tsk, force_early); if (!t) diff --git a/mm/rmap.c b/mm/rmap.c index b7e94ebbd09e..22a4a7699cdb 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -517,11 +517,7 @@ void page_unlock_anon_vma_read(struct anon_vma *anon_vma) static inline unsigned long __vma_address(struct page *page, struct vm_area_struct *vma) { - pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT); - - if (unlikely(is_vm_hugetlb_page(vma))) - pgoff = page->index << huge_page_order(page_hstate(page)); - + pgoff_t pgoff = page_to_pgoff(page); return vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT); } @@ -1639,7 +1635,7 @@ static struct anon_vma *rmap_walk_anon_lock(struct page *page, static int rmap_walk_anon(struct page *page, struct rmap_walk_control *rwc) { struct anon_vma *anon_vma; - pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT); + pgoff_t pgoff = page_to_pgoff(page); struct anon_vma_chain *avc; int ret = SWAP_AGAIN; @@ -1680,7 +1676,7 @@ static int rmap_walk_anon(struct page *page, struct rmap_walk_control *rwc) static int rmap_walk_file(struct page *page, struct rmap_walk_control *rwc) { struct address_space *mapping = page->mapping; - pgoff_t pgoff = page->index << compound_order(page); + pgoff_t pgoff = page_to_pgoff(page); struct vm_area_struct *vma; int ret = SWAP_AGAIN; From b4c5c60920e3b0c4598f43e7317559f6aec51531 Mon Sep 17 00:00:00 2001 From: Minchan Kim Date: Wed, 23 Jul 2014 14:00:04 -0700 Subject: [PATCH 373/455] zram: avoid lockdep splat by revalidate_disk Sasha reported lockdep warning [1] introduced by [2]. It could be fixed by doing disk revalidation out of the init_lock. It's okay because disk capacity change is protected by init_lock so that revalidate_disk always sees up-to-date value so there is no race. [1] https://lkml.org/lkml/2014/7/3/735 [2] zram: revalidate disk after capacity change Fixes 2e32baea46ce ("zram: revalidate disk after capacity change"). Signed-off-by: Minchan Kim Reported-by: Sasha Levin Cc: "Alexander E. Patrakov" Cc: Nitin Gupta Cc: Jerome Marchand Cc: Sergey Senozhatsky CC: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/block/zram/zram_drv.c | 22 ++++++++++++++++++---- 1 file changed, 18 insertions(+), 4 deletions(-) diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c index 089e72cd37be..36e54be402df 100644 --- a/drivers/block/zram/zram_drv.c +++ b/drivers/block/zram/zram_drv.c @@ -622,11 +622,18 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity) memset(&zram->stats, 0, sizeof(zram->stats)); zram->disksize = 0; - if (reset_capacity) { + if (reset_capacity) set_capacity(zram->disk, 0); - revalidate_disk(zram->disk); - } + up_write(&zram->init_lock); + + /* + * Revalidate disk out of the init_lock to avoid lockdep splat. + * It's okay because disk's capacity is protected by init_lock + * so that revalidate_disk always sees up-to-date capacity. + */ + if (reset_capacity) + revalidate_disk(zram->disk); } static ssize_t disksize_store(struct device *dev, @@ -666,8 +673,15 @@ static ssize_t disksize_store(struct device *dev, zram->comp = comp; zram->disksize = disksize; set_capacity(zram->disk, zram->disksize >> SECTOR_SHIFT); - revalidate_disk(zram->disk); up_write(&zram->init_lock); + + /* + * Revalidate disk out of the init_lock to avoid lockdep splat. + * It's okay because disk's capacity is protected by init_lock + * so that revalidate_disk always sees up-to-date capacity. + */ + revalidate_disk(zram->disk); + return len; out_destroy_comp: From b1923b55af43a6febb976084bf30d1a4797c92c9 Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Wed, 23 Jul 2014 14:00:06 -0700 Subject: [PATCH 374/455] sh: also try passing -m4-nofpu for SH2A builds When compiling a SH2A kernel (e.g. se7206_defconfig or rsk7203_defconfig) using sh4-linux-gcc, linking fails with: net/built-in.o: In function `__sk_run_filter': net/core/filter.c:566: undefined reference to `__fpscr_values' net/core/filter.c:269: undefined reference to `__fpscr_values' ... net/built-in.o:net/core/filter.c:580: more undefined references to `__fpscr_values' follow This happens because sh4-linux-gcc doesn't support the "-m2a-nofpu", which is thus filtered out by "$(call cc-option, ...)". As compiling using sh4-linux-gcc is useful for compile coverage, also try passing "-m4-nofpu" (which is presumably filtered out when using a real sh2a-linux toolchain) to disable the generation of FPU instructions and references to __fpscr_values[]. Signed-off-by: Geert Uytterhoeven Cc: Guenter Roeck Cc: Tony Breeds Cc: Alexei Starovoitov Cc: Fengguang Wu Cc: Daniel Borkmann Cc: Magnus Damm Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/sh/Makefile | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/sh/Makefile b/arch/sh/Makefile index d4d16e4be07c..bf5b3f5f4962 100644 --- a/arch/sh/Makefile +++ b/arch/sh/Makefile @@ -32,7 +32,8 @@ endif cflags-$(CONFIG_CPU_SH2) := $(call cc-option,-m2,) cflags-$(CONFIG_CPU_SH2A) += $(call cc-option,-m2a,) \ - $(call cc-option,-m2a-nofpu,) + $(call cc-option,-m2a-nofpu,) \ + $(call cc-option,-m4-nofpu,) cflags-$(CONFIG_CPU_SH3) := $(call cc-option,-m3,) cflags-$(CONFIG_CPU_SH4) := $(call cc-option,-m4,) \ $(call cc-option,-mno-implicit-fp,-m4-nofpu) From c118678bc79e8241f9d3434d9324c6400d72f48a Mon Sep 17 00:00:00 2001 From: Konstantin Khlebnikov Date: Wed, 23 Jul 2014 14:00:08 -0700 Subject: [PATCH 375/455] mm: do not call do_fault_around for non-linear fault Ingo Korb reported that "repeated mapping of the same file on tmpfs using remap_file_pages sometimes triggers a BUG at mm/filemap.c:202 when the process exits". He bisected the bug to d7c1755179b8 ("mm: implement ->map_pages for shmem/tmpfs"), although the bug was actually added by commit 8c6e50b0290c ("mm: introduce vm_ops->map_pages()"). The problem is caused by calling do_fault_around for a _non-linear_ fault. In this case pgoff is shifted and might become negative during calculation. Faulting around non-linear page-fault makes no sense and breaks the logic in do_fault_around because pgoff is shifted. Signed-off-by: Konstantin Khlebnikov Reported-by: Ingo Korb Tested-by: Ingo Korb Cc: Hugh Dickins Cc: Sasha Levin Cc: Dave Jones Cc: Ning Qu Cc: "Kirill A. Shutemov" Cc: [3.15.x] Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/memory.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/mm/memory.c b/mm/memory.c index d67fd9fcf1f2..7e8d8205b610 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -2882,7 +2882,8 @@ static int do_read_fault(struct mm_struct *mm, struct vm_area_struct *vma, * if page by the offset is not ready to be mapped (cold cache or * something). */ - if (vma->vm_ops->map_pages && fault_around_pages() > 1) { + if (vma->vm_ops->map_pages && !(flags & FAULT_FLAG_NONLINEAR) && + fault_around_pages() > 1) { pte = pte_offset_map_lock(mm, pmd, address, &ptl); do_fault_around(vma, address, pte, pgoff, flags); if (!pte_same(*pte, orig_pte)) From 8e205f779d1443a94b5ae81aa359cb535dd3021e Mon Sep 17 00:00:00 2001 From: Hugh Dickins Date: Wed, 23 Jul 2014 14:00:10 -0700 Subject: [PATCH 376/455] shmem: fix faulting into a hole, not taking i_mutex Commit f00cdc6df7d7 ("shmem: fix faulting into a hole while it's punched") was buggy: Sasha sent a lockdep report to remind us that grabbing i_mutex in the fault path is a no-no (write syscall may already hold i_mutex while faulting user buffer). We tried a completely different approach (see following patch) but that proved inadequate: good enough for a rational workload, but not good enough against trinity - which forks off so many mappings of the object that contention on i_mmap_mutex while hole-puncher holds i_mutex builds into serious starvation when concurrent faults force the puncher to fall back to single-page unmap_mapping_range() searches of the i_mmap tree. So return to the original umbrella approach, but keep away from i_mutex this time. We really don't want to bloat every shmem inode with a new mutex or completion, just to protect this unlikely case from trinity. So extend the original with wait_queue_head on stack at the hole-punch end, and wait_queue item on the stack at the fault end. This involves further use of i_lock to guard against the races: lockdep has been happy so far, and I see fs/inode.c:unlock_new_inode() holds i_lock around wake_up_bit(), which is comparable to what we do here. i_lock is more convenient, but we could switch to shmem's info->lock. This issue has been tagged with CVE-2014-4171, which will require commit f00cdc6df7d7 and this and the following patch to be backported: we suggest to 3.1+, though in fact the trinity forkbomb effect might go back as far as 2.6.16, when madvise(,,MADV_REMOVE) came in - or might not, since much has changed, with i_mmap_mutex a spinlock before 3.0. Anyone running trinity on 3.0 and earlier? I don't think we need care. Signed-off-by: Hugh Dickins Reported-by: Sasha Levin Tested-by: Sasha Levin Cc: Vlastimil Babka Cc: Konstantin Khlebnikov Cc: Johannes Weiner Cc: Lukas Czerner Cc: Dave Jones Cc: [3.1+] Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/shmem.c | 78 ++++++++++++++++++++++++++++++++++++------------------ 1 file changed, 52 insertions(+), 26 deletions(-) diff --git a/mm/shmem.c b/mm/shmem.c index 1140f49b6ded..c0719f082246 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -85,7 +85,7 @@ static struct vfsmount *shm_mnt; * a time): we would prefer not to enlarge the shmem inode just for that. */ struct shmem_falloc { - int mode; /* FALLOC_FL mode currently operating */ + wait_queue_head_t *waitq; /* faults into hole wait for punch to end */ pgoff_t start; /* start of range currently being fallocated */ pgoff_t next; /* the next page offset to be fallocated */ pgoff_t nr_falloced; /* how many new pages have been fallocated */ @@ -760,7 +760,7 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc) spin_lock(&inode->i_lock); shmem_falloc = inode->i_private; if (shmem_falloc && - !shmem_falloc->mode && + !shmem_falloc->waitq && index >= shmem_falloc->start && index < shmem_falloc->next) shmem_falloc->nr_unswapped++; @@ -1248,38 +1248,58 @@ static int shmem_fault(struct vm_area_struct *vma, struct vm_fault *vmf) * Trinity finds that probing a hole which tmpfs is punching can * prevent the hole-punch from ever completing: which in turn * locks writers out with its hold on i_mutex. So refrain from - * faulting pages into the hole while it's being punched, and - * wait on i_mutex to be released if vmf->flags permits. + * faulting pages into the hole while it's being punched. Although + * shmem_undo_range() does remove the additions, it may be unable to + * keep up, as each new page needs its own unmap_mapping_range() call, + * and the i_mmap tree grows ever slower to scan if new vmas are added. + * + * It does not matter if we sometimes reach this check just before the + * hole-punch begins, so that one fault then races with the punch: + * we just need to make racing faults a rare case. + * + * The implementation below would be much simpler if we just used a + * standard mutex or completion: but we cannot take i_mutex in fault, + * and bloating every shmem inode for this unlikely case would be sad. */ if (unlikely(inode->i_private)) { struct shmem_falloc *shmem_falloc; spin_lock(&inode->i_lock); shmem_falloc = inode->i_private; - if (!shmem_falloc || - shmem_falloc->mode != FALLOC_FL_PUNCH_HOLE || - vmf->pgoff < shmem_falloc->start || - vmf->pgoff >= shmem_falloc->next) - shmem_falloc = NULL; - spin_unlock(&inode->i_lock); - /* - * i_lock has protected us from taking shmem_falloc seriously - * once return from shmem_fallocate() went back up that stack. - * i_lock does not serialize with i_mutex at all, but it does - * not matter if sometimes we wait unnecessarily, or sometimes - * miss out on waiting: we just need to make those cases rare. - */ - if (shmem_falloc) { + if (shmem_falloc && + shmem_falloc->waitq && + vmf->pgoff >= shmem_falloc->start && + vmf->pgoff < shmem_falloc->next) { + wait_queue_head_t *shmem_falloc_waitq; + DEFINE_WAIT(shmem_fault_wait); + + ret = VM_FAULT_NOPAGE; if ((vmf->flags & FAULT_FLAG_ALLOW_RETRY) && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) { + /* It's polite to up mmap_sem if we can */ up_read(&vma->vm_mm->mmap_sem); - mutex_lock(&inode->i_mutex); - mutex_unlock(&inode->i_mutex); - return VM_FAULT_RETRY; + ret = VM_FAULT_RETRY; } - /* cond_resched? Leave that to GUP or return to user */ - return VM_FAULT_NOPAGE; + + shmem_falloc_waitq = shmem_falloc->waitq; + prepare_to_wait(shmem_falloc_waitq, &shmem_fault_wait, + TASK_UNINTERRUPTIBLE); + spin_unlock(&inode->i_lock); + schedule(); + + /* + * shmem_falloc_waitq points into the shmem_fallocate() + * stack of the hole-punching task: shmem_falloc_waitq + * is usually invalid by the time we reach here, but + * finish_wait() does not dereference it in that case; + * though i_lock needed lest racing with wake_up_all(). + */ + spin_lock(&inode->i_lock); + finish_wait(shmem_falloc_waitq, &shmem_fault_wait); + spin_unlock(&inode->i_lock); + return ret; } + spin_unlock(&inode->i_lock); } error = shmem_getpage(inode, vmf->pgoff, &vmf->page, SGP_CACHE, &ret); @@ -1774,13 +1794,13 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset, mutex_lock(&inode->i_mutex); - shmem_falloc.mode = mode & ~FALLOC_FL_KEEP_SIZE; - if (mode & FALLOC_FL_PUNCH_HOLE) { struct address_space *mapping = file->f_mapping; loff_t unmap_start = round_up(offset, PAGE_SIZE); loff_t unmap_end = round_down(offset + len, PAGE_SIZE) - 1; + DECLARE_WAIT_QUEUE_HEAD_ONSTACK(shmem_falloc_waitq); + shmem_falloc.waitq = &shmem_falloc_waitq; shmem_falloc.start = unmap_start >> PAGE_SHIFT; shmem_falloc.next = (unmap_end + 1) >> PAGE_SHIFT; spin_lock(&inode->i_lock); @@ -1792,8 +1812,13 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset, 1 + unmap_end - unmap_start, 0); shmem_truncate_range(inode, offset, offset + len - 1); /* No need to unmap again: hole-punching leaves COWed pages */ + + spin_lock(&inode->i_lock); + inode->i_private = NULL; + wake_up_all(&shmem_falloc_waitq); + spin_unlock(&inode->i_lock); error = 0; - goto undone; + goto out; } /* We need to check rlimit even when FALLOC_FL_KEEP_SIZE */ @@ -1809,6 +1834,7 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset, goto out; } + shmem_falloc.waitq = NULL; shmem_falloc.start = start; shmem_falloc.next = start; shmem_falloc.nr_falloced = 0; From b1a366500bd537b50c3aad26dc7df083ec03a448 Mon Sep 17 00:00:00 2001 From: Hugh Dickins Date: Wed, 23 Jul 2014 14:00:13 -0700 Subject: [PATCH 377/455] shmem: fix splicing from a hole while it's punched shmem_fault() is the actual culprit in trinity's hole-punch starvation, and the most significant cause of such problems: since a page faulted is one that then appears page_mapped(), needing unmap_mapping_range() and i_mmap_mutex to be unmapped again. But it is not the only way in which a page can be brought into a hole in the radix_tree while that hole is being punched; and Vlastimil's testing implies that if enough other processors are busy filling in the hole, then shmem_undo_range() can be kept from completing indefinitely. shmem_file_splice_read() is the main other user of SGP_CACHE, which can instantiate shmem pagecache pages in the read-only case (without holding i_mutex, so perhaps concurrently with a hole-punch). Probably it's silly not to use SGP_READ already (using the ZERO_PAGE for holes): which ought to be safe, but might bring surprises - not a change to be rushed. shmem_read_mapping_page_gfp() is an internal interface used by drivers/gpu/drm GEM (and next by uprobes): it should be okay. And shmem_file_read_iter() uses the SGP_DIRTY variant of SGP_CACHE, when called internally by the kernel (perhaps for a stacking filesystem, which might rely on holes to be reserved): it's unclear whether it could be provoked to keep hole-punch busy or not. We could apply the same umbrella as now used in shmem_fault() to shmem_file_splice_read() and the others; but it looks ugly, and use over a range raises questions - should it actually be per page? can these get starved themselves? The origin of this part of the problem is my v3.1 commit d0823576bf4b ("mm: pincer in truncate_inode_pages_range"), once it was duplicated into shmem.c. It seemed like a nice idea at the time, to ensure (barring RCU lookup fuzziness) that there's an instant when the entire hole is empty; but the indefinitely repeated scans to ensure that make it vulnerable. Revert that "enhancement" to hole-punch from shmem_undo_range(), but retain the unproblematic rescanning when it's truncating; add a couple of comments there. Remove the "indices[0] >= end" test: that is now handled satisfactorily by the inner loop, and mem_cgroup_uncharge_start()/end() are too light to be worth avoiding here. But if we do not always loop indefinitely, we do need to handle the case of swap swizzled back to page before shmem_free_swap() gets it: add a retry for that case, as suggested by Konstantin Khlebnikov; and for the case of page swizzled back to swap, as suggested by Johannes Weiner. Signed-off-by: Hugh Dickins Reported-by: Sasha Levin Suggested-by: Vlastimil Babka Cc: Konstantin Khlebnikov Cc: Johannes Weiner Cc: Lukas Czerner Cc: Dave Jones Cc: [3.1+] Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/shmem.c | 24 +++++++++++++++--------- 1 file changed, 15 insertions(+), 9 deletions(-) diff --git a/mm/shmem.c b/mm/shmem.c index c0719f082246..af68b15a8fc1 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -468,23 +468,20 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend, return; index = start; - for ( ; ; ) { + while (index < end) { cond_resched(); pvec.nr = find_get_entries(mapping, index, min(end - index, (pgoff_t)PAGEVEC_SIZE), pvec.pages, indices); if (!pvec.nr) { - if (index == start || unfalloc) + /* If all gone or hole-punch or unfalloc, we're done */ + if (index == start || end != -1) break; + /* But if truncating, restart to make sure all gone */ index = start; continue; } - if ((index == start || unfalloc) && indices[0] >= end) { - pagevec_remove_exceptionals(&pvec); - pagevec_release(&pvec); - break; - } mem_cgroup_uncharge_start(); for (i = 0; i < pagevec_count(&pvec); i++) { struct page *page = pvec.pages[i]; @@ -496,8 +493,12 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend, if (radix_tree_exceptional_entry(page)) { if (unfalloc) continue; - nr_swaps_freed += !shmem_free_swap(mapping, - index, page); + if (shmem_free_swap(mapping, index, page)) { + /* Swap was replaced by page: retry */ + index--; + break; + } + nr_swaps_freed++; continue; } @@ -506,6 +507,11 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend, if (page->mapping == mapping) { VM_BUG_ON_PAGE(PageWriteback(page), page); truncate_inode_page(mapping, page); + } else { + /* Page was replaced by swap: retry */ + unlock_page(page); + index--; + break; } } unlock_page(page); From 792ceaefe62189e3beea612ec0a052e42a81e993 Mon Sep 17 00:00:00 2001 From: Hugh Dickins Date: Wed, 23 Jul 2014 14:00:15 -0700 Subject: [PATCH 378/455] mm/fs: fix pessimization in hole-punching pagecache I wanted to revert my v3.1 commit d0823576bf4b ("mm: pincer in truncate_inode_pages_range"), to keep truncate_inode_pages_range() in synch with shmem_undo_range(); but have stepped back - a change to hole-punching in truncate_inode_pages_range() is a change to hole-punching in every filesystem (except tmpfs) that supports it. If there's a logical proof why no filesystem can depend for its own correctness on the pincer guarantee in truncate_inode_pages_range() - an instant when the entire hole is removed from pagecache - then let's revisit later. But the evidence is that only tmpfs suffered from the livelock, and we have no intention of extending hole-punch to ramfs. So for now just add a few comments (to match or differ from those in shmem_undo_range()), and fix one silliness noticed in d0823576bf4b... Its "index == start" addition to the hole-punch termination test was incomplete: it opened a way for the end condition to be missed, and the loop go on looking through the radix_tree, all the way to end of file. Fix that pessimization by resetting index when detected in inner loop. Note that it's actually hard to hit this case, without the obsessive concurrent faulting that trinity does: normally all pages are removed in the initial trylock_page() pass, and this loop finds nothing to do. I had to "#if 0" out the initial pass to reproduce bug and test fix. Signed-off-by: Hugh Dickins Cc: Sasha Levin Cc: Konstantin Khlebnikov Cc: Lukas Czerner Cc: Dave Jones Acked-by: Vlastimil Babka Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/truncate.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/mm/truncate.c b/mm/truncate.c index 6a78c814bebf..eda247307164 100644 --- a/mm/truncate.c +++ b/mm/truncate.c @@ -355,14 +355,16 @@ void truncate_inode_pages_range(struct address_space *mapping, for ( ; ; ) { cond_resched(); if (!pagevec_lookup_entries(&pvec, mapping, index, - min(end - index, (pgoff_t)PAGEVEC_SIZE), - indices)) { + min(end - index, (pgoff_t)PAGEVEC_SIZE), indices)) { + /* If all gone from start onwards, we're done */ if (index == start) break; + /* Otherwise restart to make sure all gone */ index = start; continue; } if (index == start && indices[0] >= end) { + /* All gone out of hole to be punched, we're done */ pagevec_remove_exceptionals(&pvec); pagevec_release(&pvec); break; @@ -373,8 +375,11 @@ void truncate_inode_pages_range(struct address_space *mapping, /* We rely upon deletion not changing page->index */ index = indices[i]; - if (index >= end) + if (index >= end) { + /* Restart punch to make sure all gone */ + index = start - 1; break; + } if (radix_tree_exceptional_entry(page)) { clear_exceptional_entry(mapping, index, page); From 4e66d445d0421a159135572a0ba44b75c7c4adfa Mon Sep 17 00:00:00 2001 From: Hugh Dickins Date: Wed, 23 Jul 2014 14:00:17 -0700 Subject: [PATCH 379/455] simple_xattr: permit 0-size extended attributes If a filesystem uses simple_xattr to support user extended attributes, LTP setxattr01 and xfstests generic/062 fail with "Cannot allocate memory": simple_xattr_alloc()'s wrap-around test mistakenly excludes values of zero size. Fix that off-by-one (but apparently no filesystem needs them yet). Signed-off-by: Hugh Dickins Cc: Al Viro Cc: Jeff Layton Cc: Aristeu Rozanski Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/xattr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/xattr.c b/fs/xattr.c index 3377dff18404..c69e6d43a0d2 100644 --- a/fs/xattr.c +++ b/fs/xattr.c @@ -843,7 +843,7 @@ struct simple_xattr *simple_xattr_alloc(const void *value, size_t size) /* wrap around? */ len = sizeof(*new_xattr) + size; - if (len <= sizeof(*new_xattr)) + if (len < sizeof(*new_xattr)) return NULL; new_xattr = kmalloc(len, GFP_KERNEL); From 0253d634e0803a8376a0d88efee0bf523d8673f9 Mon Sep 17 00:00:00 2001 From: Naoya Horiguchi Date: Wed, 23 Jul 2014 14:00:19 -0700 Subject: [PATCH 380/455] mm: hugetlb: fix copy_hugetlb_page_range() Commit 4a705fef9862 ("hugetlb: fix copy_hugetlb_page_range() to handle migration/hwpoisoned entry") changed the order of huge_ptep_set_wrprotect() and huge_ptep_get(), which leads to breakage in some workloads like hugepage-backed heap allocation via libhugetlbfs. This patch fixes it. The test program for the problem is shown below: $ cat heap.c #include #include #include #define HPS 0x200000 int main() { int i; char *p = malloc(HPS); memset(p, '1', HPS); for (i = 0; i < 5; i++) { if (!fork()) { memset(p, '2', HPS); p = malloc(HPS); memset(p, '3', HPS); free(p); return 0; } } sleep(1); free(p); return 0; } $ export HUGETLB_MORECORE=yes ; export HUGETLB_NO_PREFAULT= ; hugectl --heap ./heap Fixes 4a705fef9862 ("hugetlb: fix copy_hugetlb_page_range() to handle migration/hwpoisoned entry"), so is applicable to -stable kernels which include it. Signed-off-by: Naoya Horiguchi Reported-by: Guillaume Morin Suggested-by: Guillaume Morin Acked-by: Hugh Dickins Cc: [2.6.37+] Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/hugetlb.c | 1 + 1 file changed, 1 insertion(+) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 2024bbd573d2..9221c02ed9e2 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -2604,6 +2604,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, } else { if (cow) huge_ptep_set_wrprotect(src, addr, src_pte); + entry = huge_ptep_get(src_pte); ptepage = pte_page(entry); get_page(ptepage); page_dup_rmap(ptepage); From f723aa1817dd8f4fe005aab52ba70c8ab0ef9457 Mon Sep 17 00:00:00 2001 From: Stephen Boyd Date: Wed, 23 Jul 2014 21:03:50 -0700 Subject: [PATCH 381/455] sched_clock: Avoid corrupting hrtimer tree during suspend During suspend we call sched_clock_poll() to update the epoch and accumulated time and reprogram the sched_clock_timer to fire before the next wrap-around time. Unfortunately, sched_clock_poll() doesn't restart the timer, instead it relies on the hrtimer layer to do that and during suspend we aren't calling that function from the hrtimer layer. Instead, we're reprogramming the expires time while the hrtimer is enqueued, which can cause the hrtimer tree to be corrupted. Furthermore, we restart the timer during suspend but we update the epoch during resume which seems counter-intuitive. Let's fix this by saving the accumulated state and canceling the timer during suspend. On resume we can update the epoch and restart the timer similar to what we would do if we were starting the clock for the first time. Fixes: a08ca5d1089d "sched_clock: Use an hrtimer instead of timer" Signed-off-by: Stephen Boyd Signed-off-by: John Stultz Link: http://lkml.kernel.org/r/1406174630-23458-1-git-send-email-john.stultz@linaro.org Cc: Ingo Molnar Cc: stable Signed-off-by: Thomas Gleixner --- kernel/time/sched_clock.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c index 445106d2c729..01d2d15aa662 100644 --- a/kernel/time/sched_clock.c +++ b/kernel/time/sched_clock.c @@ -191,7 +191,8 @@ void __init sched_clock_postinit(void) static int sched_clock_suspend(void) { - sched_clock_poll(&sched_clock_timer); + update_sched_clock(); + hrtimer_cancel(&sched_clock_timer); cd.suspended = true; return 0; } @@ -199,6 +200,7 @@ static int sched_clock_suspend(void) static void sched_clock_resume(void) { cd.epoch_cyc = read_sched_clock(); + hrtimer_start(&sched_clock_timer, cd.wrap_kt, HRTIMER_MODE_REL); cd.suspended = false; } From 6fcc5420bfb91049a318bb4d88fe471248b5b391 Mon Sep 17 00:00:00 2001 From: Boaz Harrosh Date: Sun, 20 Jul 2014 12:09:04 +0300 Subject: [PATCH 382/455] direct-io: fix uninitialized warning in do_direct_IO() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The following warnings: fs/direct-io.c: In function ‘__blockdev_direct_IO’: fs/direct-io.c:1011:12: warning: ‘to’ may be used uninitialized in this function [-Wmaybe-uninitialized] fs/direct-io.c:913:16: note: ‘to’ was declared here fs/direct-io.c:1011:12: warning: ‘from’ may be used uninitialized in this function [-Wmaybe-uninitialized] fs/direct-io.c:913:10: note: ‘from’ was declared here are false positive because dio_get_page() either fails, or sets both 'from' and 'to'. Paul Bolle said ... Maybe it's better to move initializing "to" and "from" out of dio_get_page(). That _might_ make it easier for both the the reader and the compiler to understand what's going on. Something like this: Christoph Hellwig said ... The fix of moving the code definitively looks nicer, while I think uninitialized_var is horrible wart that won't get anywhere near my code. Boaz Harrosh: I agree with Christoph and Paul Signed-off-by: Boaz Harrosh Signed-off-by: Christoph Hellwig --- fs/direct-io.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/fs/direct-io.c b/fs/direct-io.c index 98040ba388ac..194d0d122cae 100644 --- a/fs/direct-io.c +++ b/fs/direct-io.c @@ -198,9 +198,8 @@ static inline int dio_refill_pages(struct dio *dio, struct dio_submit *sdio) * L1 cache. */ static inline struct page *dio_get_page(struct dio *dio, - struct dio_submit *sdio, size_t *from, size_t *to) + struct dio_submit *sdio) { - int n; if (dio_pages_present(sdio) == 0) { int ret; @@ -209,10 +208,7 @@ static inline struct page *dio_get_page(struct dio *dio, return ERR_PTR(ret); BUG_ON(dio_pages_present(sdio) == 0); } - n = sdio->head++; - *from = n ? 0 : sdio->from; - *to = (n == sdio->tail - 1) ? sdio->to : PAGE_SIZE; - return dio->pages[n]; + return dio->pages[sdio->head]; } /** @@ -911,11 +907,15 @@ static int do_direct_IO(struct dio *dio, struct dio_submit *sdio, while (sdio->block_in_file < sdio->final_block_in_request) { struct page *page; size_t from, to; - page = dio_get_page(dio, sdio, &from, &to); + + page = dio_get_page(dio, sdio); if (IS_ERR(page)) { ret = PTR_ERR(page); goto out; } + from = sdio->head ? 0 : sdio->from; + to = (sdio->head == sdio->tail - 1) ? sdio->to : PAGE_SIZE; + sdio->head++; while (from < to) { unsigned this_chunk_bytes; /* # of bytes mapped */ From 295dc39d941dc2ae53d5c170365af4c9d5c16212 Mon Sep 17 00:00:00 2001 From: Vasily Averin Date: Mon, 21 Jul 2014 12:30:23 +0400 Subject: [PATCH 383/455] fs: umount on symlink leaks mnt count Currently umount on symlink blocks following umount: /vz is separate mount # ls /vz/ -al | grep test drwxr-xr-x. 2 root root 4096 Jul 19 01:14 testdir lrwxrwxrwx. 1 root root 11 Jul 19 01:16 testlink -> /vz/testdir # umount -l /vz/testlink umount: /vz/testlink: not mounted (expected) # lsof /vz # umount /vz umount: /vz: device is busy. (unexpected) In this case mountpoint_last() gets an extra refcount on path->mnt Signed-off-by: Vasily Averin Acked-by: Ian Kent Acked-by: Jeff Layton Cc: stable@vger.kernel.org Signed-off-by: Christoph Hellwig --- fs/namei.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/namei.c b/fs/namei.c index 985c6f368485..9eb787e5c167 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -2256,9 +2256,10 @@ done: goto out; } path->dentry = dentry; - path->mnt = mntget(nd->path.mnt); + path->mnt = nd->path.mnt; if (should_follow_link(dentry, nd->flags & LOOKUP_FOLLOW)) return 1; + mntget(path->mnt); follow_mount(path); error = 0; out: From 043572d5444116b9d9ad8ae763cf069e7accbc30 Mon Sep 17 00:00:00 2001 From: Guenter Roeck Date: Fri, 18 Jul 2014 07:31:18 -0700 Subject: [PATCH 384/455] hwmon: (smsc47m192) Fix temperature limit and vrm write operations Temperature limit clamps are applied after converting the temperature from milli-degrees C to degrees C, so either the clamp limit needs to be specified in degrees C, not milli-degrees C, or clamping must happen before converting to degrees C. Use the latter method to avoid overflows. vrm is an u8, so the written value needs to be limited to [0, 255]. Cc: Axel Lin Cc: stable@vger.kernel.org Signed-off-by: Guenter Roeck Reviewed-by: Jean Delvare --- drivers/hwmon/smsc47m192.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/hwmon/smsc47m192.c b/drivers/hwmon/smsc47m192.c index efee4c59239f..34b9a601ad07 100644 --- a/drivers/hwmon/smsc47m192.c +++ b/drivers/hwmon/smsc47m192.c @@ -86,7 +86,7 @@ static inline u8 IN_TO_REG(unsigned long val, int n) */ static inline s8 TEMP_TO_REG(int val) { - return clamp_val(SCALE(val, 1, 1000), -128000, 127000); + return SCALE(clamp_val(val, -128000, 127000), 1, 1000); } static inline int TEMP_FROM_REG(s8 val) @@ -384,6 +384,8 @@ static ssize_t set_vrm(struct device *dev, struct device_attribute *attr, err = kstrtoul(buf, 10, &val); if (err) return err; + if (val > 255) + return -EINVAL; data->vrm = val; return count; From 91942d17667a7e7976d3e8bc2f18a34c12bed9fc Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Uwe=20Kleine-K=C3=B6nig?= Date: Wed, 23 Jul 2014 20:37:44 +0100 Subject: [PATCH 385/455] ARM: 8112/1: only select ARM_PATCH_PHYS_VIRT if MMU is enabled MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This fixes the following warning: warning: (ARCH_MULTIPLATFORM && ARCH_INTEGRATOR && ARCH_SHMOBILE_LEGACY) selects ARM_PATCH_PHYS_VIRT which has unmet direct dependencies (!XIP_KERNEL && MMU && (!ARCH_REALVIEW || !SPARSEMEM)) Signed-off-by: Uwe Kleine-König Signed-off-by: Russell King --- arch/arm/Kconfig | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 245058b3b0ef..7a14af6e3d2c 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -312,7 +312,7 @@ config ARCH_MULTIPLATFORM config ARCH_INTEGRATOR bool "ARM Ltd. Integrator family" select ARM_AMBA - select ARM_PATCH_PHYS_VIRT + select ARM_PATCH_PHYS_VIRT if MMU select AUTO_ZRELADDR select COMMON_CLK select COMMON_CLK_VERSATILE @@ -658,7 +658,7 @@ config ARCH_MSM config ARCH_SHMOBILE_LEGACY bool "Renesas ARM SoCs (non-multiplatform)" select ARCH_SHMOBILE - select ARM_PATCH_PHYS_VIRT + select ARM_PATCH_PHYS_VIRT if MMU select CLKDEV_LOOKUP select GENERIC_CLOCKEVENTS select HAVE_ARM_SCU if SMP From 20dbea494543aefaace874cc3ec93a39b94b1ec4 Mon Sep 17 00:00:00 2001 From: John David Anglin Date: Wed, 23 Jul 2014 19:44:12 -0400 Subject: [PATCH 386/455] parisc: Remove SA_RESTORER define The sa_restorer field in struct sigaction is obsolete and no longer in the parisc implementation. However, the core code assumes the field is present if SA_RESTORER is defined. So, the define needs to be removed. Signed-off-by: John David Anglin Cc: Signed-off-by: Helge Deller --- arch/parisc/include/uapi/asm/signal.h | 2 -- 1 file changed, 2 deletions(-) diff --git a/arch/parisc/include/uapi/asm/signal.h b/arch/parisc/include/uapi/asm/signal.h index a2fa297196bc..f5645d6a89f2 100644 --- a/arch/parisc/include/uapi/asm/signal.h +++ b/arch/parisc/include/uapi/asm/signal.h @@ -69,8 +69,6 @@ #define SA_NOMASK SA_NODEFER #define SA_ONESHOT SA_RESETHAND -#define SA_RESTORER 0x04000000 /* obsolete -- ignored */ - #define MINSIGSTKSZ 2048 #define SIGSTKSZ 8192 From 9794144d5a95ca90cb9165a0aae1af155f1d8676 Mon Sep 17 00:00:00 2001 From: HIMANGI SARAOGI Date: Sat, 19 Jul 2014 17:07:41 +0530 Subject: [PATCH 387/455] parisc: Eliminate memset after alloc_bootmem_pages alloc_bootmem and related function always return zeroed region of memory. Thus a memset after calls to these functions is unnecessary. The following Coccinelle semantic patch was used for making the change: @@ expression E,E1; @@ E = \(alloc_bootmem\|alloc_bootmem_low\|alloc_bootmem_pages\|alloc_bootmem_low_pages\)(...) ... when != E - memset(E,0,E1); Signed-off-by: Himangi Saraogi Acked-by: Julia Lawall Signed-off-by: Helge Deller --- arch/parisc/mm/init.c | 1 - 1 file changed, 1 deletion(-) diff --git a/arch/parisc/mm/init.c b/arch/parisc/mm/init.c index ae085ad0fba0..0bef864264c0 100644 --- a/arch/parisc/mm/init.c +++ b/arch/parisc/mm/init.c @@ -728,7 +728,6 @@ static void __init pagetable_init(void) #endif empty_zero_page = alloc_bootmem_pages(PAGE_SIZE); - memset(empty_zero_page, 0, PAGE_SIZE); } static void __init gateway_init(void) From 6cff1f6ad4c615319c1a146b2aa0af1043c5e9f5 Mon Sep 17 00:00:00 2001 From: Malcolm Priestley Date: Wed, 23 Jul 2014 21:35:11 +0100 Subject: [PATCH 388/455] staging: vt6655: Fix Warning on boot handle_irq_event_percpu. WARNING: CPU: 0 PID: 929 at /home/apw/COD/linux/kernel/irq/handle.c:147 handle_irq_event_percpu+0x1d1/0x1e0() irq 17 handler device_intr+0x0/0xa80 [vt6655_stage] enabled interrupts Using spin_lock_irqsave appears to fix this. Signed-off-by: Malcolm Priestley Cc: Signed-off-by: Greg Kroah-Hartman --- drivers/staging/vt6655/device_main.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/staging/vt6655/device_main.c b/drivers/staging/vt6655/device_main.c index 1d3908d044d0..5a5fd937a442 100644 --- a/drivers/staging/vt6655/device_main.c +++ b/drivers/staging/vt6655/device_main.c @@ -2318,6 +2318,7 @@ static irqreturn_t device_intr(int irq, void *dev_instance) { int handled = 0; unsigned char byData = 0; int ii = 0; + unsigned long flags; MACvReadISR(pDevice->PortOffset, &pDevice->dwIsr); @@ -2331,7 +2332,8 @@ static irqreturn_t device_intr(int irq, void *dev_instance) { handled = 1; MACvIntDisable(pDevice->PortOffset); - spin_lock_irq(&pDevice->lock); + + spin_lock_irqsave(&pDevice->lock, flags); //Make sure current page is 0 VNSvInPortB(pDevice->PortOffset + MAC_REG_PAGE1SEL, &byOrgPageSel); @@ -2560,7 +2562,8 @@ static irqreturn_t device_intr(int irq, void *dev_instance) { if (byOrgPageSel == 1) MACvSelectPage1(pDevice->PortOffset); - spin_unlock_irq(&pDevice->lock); + spin_unlock_irqrestore(&pDevice->lock, flags); + MACvIntEnable(pDevice->PortOffset, IMR_MASK_VALUE); return IRQ_RETVAL(handled); From 4aa0abed3a2a11b7d71ad560c1a3e7631c5a31cd Mon Sep 17 00:00:00 2001 From: Malcolm Priestley Date: Wed, 23 Jul 2014 21:35:12 +0100 Subject: [PATCH 389/455] staging: vt6655: Fix disassociated messages every 10 seconds byReAssocCount is incremented every second resulting in disassociated message being send every 10 seconds whether connection or not. byReAssocCount should only advance while eCommandState is in WLAN_ASSOCIATE_WAIT Change existing scope to if condition. Signed-off-by: Malcolm Priestley Cc: Signed-off-by: Greg Kroah-Hartman --- drivers/staging/vt6655/bssdb.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/staging/vt6655/bssdb.c b/drivers/staging/vt6655/bssdb.c index 59679cd46816..69b80e80b011 100644 --- a/drivers/staging/vt6655/bssdb.c +++ b/drivers/staging/vt6655/bssdb.c @@ -981,7 +981,7 @@ start: pDevice->byERPFlag &= ~(WLAN_SET_ERP_USE_PROTECTION(1)); } - { + if (pDevice->eCommandState == WLAN_ASSOCIATE_WAIT) { pDevice->byReAssocCount++; /* 10 sec timeout */ if ((pDevice->byReAssocCount > 10) && (!pDevice->bLinkPass)) { From 1b2c4869d8247f9e202fa8a73777c34adc62d409 Mon Sep 17 00:00:00 2001 From: Jerome Glisse Date: Thu, 24 Jul 2014 16:34:17 -0400 Subject: [PATCH 390/455] drm/radeon: fix cut and paste issue for hawaii. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This is a halfway fix for hawaii acceleration. More fixes to come but hopefully isolated to userspace. Signed-off-by: Jérôme Glisse Cc: stable@vger.kernel.org Signed-off-by: Dave Airlie --- drivers/gpu/drm/radeon/cik.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/radeon/cik.c b/drivers/gpu/drm/radeon/cik.c index cc1f02f6817d..c0ea66192fe0 100644 --- a/drivers/gpu/drm/radeon/cik.c +++ b/drivers/gpu/drm/radeon/cik.c @@ -2291,6 +2291,7 @@ static void cik_tiling_mode_table_init(struct radeon_device *rdev) gb_tile_moden = 0; break; } + rdev->config.cik.macrotile_mode_array[reg_offset] = gb_tile_moden; WREG32(GB_MACROTILE_MODE0 + (reg_offset * 4), gb_tile_moden); } } else if (num_pipe_configs == 8) { From 042d7654ecb68383c392e3c7e86e04de023f65f9 Mon Sep 17 00:00:00 2001 From: Yaniv Rosner Date: Wed, 23 Jul 2014 22:12:57 +0300 Subject: [PATCH 391/455] bnx2x: fix set_setting for some PHYs Allow set_settings() to complete succesfully even if link is not estabilished and port type isn't known yet. Signed-off-by: Yaniv Rosner Signed-off-by: Dmitry Kravkov Signed-off-by: David S. Miller --- drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c index bd0600cf7266..25eddd90f482 100644 --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c @@ -379,6 +379,7 @@ static int bnx2x_set_settings(struct net_device *dev, struct ethtool_cmd *cmd) break; case PORT_FIBRE: case PORT_DA: + case PORT_NONE: if (!(bp->port.supported[0] & SUPPORTED_FIBRE || bp->port.supported[1] & SUPPORTED_FIBRE)) { DP(BNX2X_MSG_ETHTOOL, From a71e3c37960ce5f9c6a519bc1215e3ba9fa83e75 Mon Sep 17 00:00:00 2001 From: Ezequiel Garcia Date: Wed, 23 Jul 2014 16:47:31 -0300 Subject: [PATCH 392/455] net: phy: Set the driver when registering an MDIO bus device mdiobus_register() registers a device which is already bound to a driver. Hence, the driver pointer should be set properly in order to track down the driver associated to the MDIO bus. This will be used to allow ethernet driver to pin down a MDIO bus driver, preventing it from being unloaded while the PHY device is running. Reviewed-by: Florian Fainelli Tested-by: Florian Fainelli Signed-off-by: Ezequiel Garcia Signed-off-by: David S. Miller --- drivers/net/phy/mdio_bus.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c index 4eaadcfcb0fe..203651ebccb0 100644 --- a/drivers/net/phy/mdio_bus.c +++ b/drivers/net/phy/mdio_bus.c @@ -255,6 +255,7 @@ int mdiobus_register(struct mii_bus *bus) bus->dev.parent = bus->parent; bus->dev.class = &mdio_bus_class; + bus->dev.driver = bus->parent->driver; bus->dev.groups = NULL; dev_set_name(&bus->dev, "%s", bus->id); From b3565f278a9babc00032292be489e7fbff4f48d1 Mon Sep 17 00:00:00 2001 From: Ezequiel Garcia Date: Wed, 23 Jul 2014 16:47:32 -0300 Subject: [PATCH 393/455] net: phy: Ensure the MDIO bus module is held This commit adds proper module_{get,put} to prevent the MDIO bus module from being unloaded while the phydev is connected. By doing so, we fix a kernel panic produced when a MDIO driver is removed, but the phydev that relies on it is attached and running. Signed-off-by: Ezequiel Garcia Reviewed-by: Florian Fainelli Tested-by: Florian Fainelli Signed-off-by: David S. Miller --- drivers/net/phy/phy_device.c | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c index 35d753d22f78..74a82328dd49 100644 --- a/drivers/net/phy/phy_device.c +++ b/drivers/net/phy/phy_device.c @@ -575,6 +575,7 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev, u32 flags, phy_interface_t interface) { struct device *d = &phydev->dev; + struct module *bus_module; int err; /* Assume that if there is no driver, that it doesn't @@ -599,6 +600,14 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev, return -EBUSY; } + /* Increment the bus module reference count */ + bus_module = phydev->bus->dev.driver ? + phydev->bus->dev.driver->owner : NULL; + if (!try_module_get(bus_module)) { + dev_err(&dev->dev, "failed to get the bus module\n"); + return -EIO; + } + phydev->attached_dev = dev; dev->phydev = phydev; @@ -664,6 +673,10 @@ EXPORT_SYMBOL(phy_attach); void phy_detach(struct phy_device *phydev) { int i; + + if (phydev->bus->dev.driver) + module_put(phydev->bus->dev.driver->owner); + phydev->attached_dev->phydev = NULL; phydev->attached_dev = NULL; phy_suspend(phydev); From a3cc465d95c32bfb529f69dee7841ecd67525561 Mon Sep 17 00:00:00 2001 From: hayeswang Date: Thu, 24 Jul 2014 16:37:43 +0800 Subject: [PATCH 394/455] r8152: fix the checking of the usb speed When the usb speed of the RTL8152 is not high speed, the USB_DEV_STAT[2:1] should be equal to [0 1]. That is, the STAT_SPEED_FULL should be equal to 2. There is a easy way to check the usb speed by the speed field of the struct usb_device. Use it to replace the original metheod. Signed-off-by: Hayes Wang Spotted-by: Andrey Utkin Signed-off-by: David S. Miller --- drivers/net/usb/r8152.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c index 7bad2d316637..3eab74c7c554 100644 --- a/drivers/net/usb/r8152.c +++ b/drivers/net/usb/r8152.c @@ -282,7 +282,7 @@ /* USB_DEV_STAT */ #define STAT_SPEED_MASK 0x0006 #define STAT_SPEED_HIGH 0x0000 -#define STAT_SPEED_FULL 0x0001 +#define STAT_SPEED_FULL 0x0002 /* USB_TX_AGG */ #define TX_AGG_MAX_THRESHOLD 0x03 @@ -2292,9 +2292,8 @@ static void r8152b_exit_oob(struct r8152 *tp) /* rx share fifo credit full threshold */ ocp_write_dword(tp, MCU_TYPE_PLA, PLA_RXFIFO_CTRL0, RXFIFO_THR1_NORMAL); - ocp_data = ocp_read_word(tp, MCU_TYPE_USB, USB_DEV_STAT); - ocp_data &= STAT_SPEED_MASK; - if (ocp_data == STAT_SPEED_FULL) { + if (tp->udev->speed == USB_SPEED_FULL || + tp->udev->speed == USB_SPEED_LOW) { /* rx share fifo credit near full threshold */ ocp_write_dword(tp, MCU_TYPE_PLA, PLA_RXFIFO_CTRL1, RXFIFO_THR2_FULL); From fe26566d8a05151ba1dce75081f6270f73ec4ae1 Mon Sep 17 00:00:00 2001 From: Dmitry Kravkov Date: Thu, 24 Jul 2014 18:54:47 +0300 Subject: [PATCH 395/455] bnx2x: fix crash during TSO tunneling When TSO packet is transmitted additional BD w/o mapping is used to describe the packed. The BD needs special handling in tx completion. kernel: Call Trace: kernel: [] dump_stack+0x19/0x1b kernel: [] warn_slowpath_common+0x61/0x80 kernel: [] warn_slowpath_fmt+0x5c/0x80 kernel: [] ? find_iova+0x4d/0x90 kernel: [] intel_unmap_page.part.36+0x142/0x160 kernel: [] intel_unmap_page+0x26/0x30 kernel: [] bnx2x_free_tx_pkt+0x157/0x2b0 [bnx2x] kernel: [] bnx2x_tx_int+0xac/0x220 [bnx2x] kernel: [] ? read_tsc+0x9/0x20 kernel: [] bnx2x_poll+0xbb/0x3c0 [bnx2x] kernel: [] net_rx_action+0x15a/0x250 kernel: [] __do_softirq+0xf7/0x290 kernel: [] call_softirq+0x1c/0x30 kernel: [] do_softirq+0x55/0x90 kernel: [] irq_exit+0x115/0x120 kernel: [] do_IRQ+0x58/0xf0 kernel: [] common_interrupt+0x6d/0x6d kernel: [] ? clockevents_notify+0x127/0x140 kernel: [] ? cpuidle_enter_state+0x4f/0xc0 kernel: [] cpuidle_idle_call+0xc5/0x200 kernel: [] arch_cpu_idle+0xe/0x30 kernel: [] cpu_startup_entry+0xf5/0x290 kernel: [] start_secondary+0x265/0x27b kernel: ---[ end trace 11aa7726f18d7e80 ]--- Fixes: a848ade408b ("bnx2x: add CSUM and TSO support for encapsulation protocols") Reported-by: Yulong Pei Cc: Michal Schmidt Signed-off-by: Dmitry Kravkov Signed-off-by: David S. Miller --- drivers/net/ethernet/broadcom/bnx2x/bnx2x.h | 1 + drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 9 +++++++++ 2 files changed, 10 insertions(+) diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h index 4cab09d3f807..8206a293e6b4 100644 --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h @@ -346,6 +346,7 @@ struct sw_tx_bd { u8 flags; /* Set on the first BD descriptor when there is a split BD */ #define BNX2X_TSO_SPLIT_BD (1<<0) +#define BNX2X_HAS_SECOND_PBD (1<<1) }; struct sw_rx_page { diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c index 4b875da1c7ed..c43e7238de21 100644 --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c @@ -227,6 +227,12 @@ static u16 bnx2x_free_tx_pkt(struct bnx2x *bp, struct bnx2x_fp_txdata *txdata, --nbd; bd_idx = TX_BD(NEXT_TX_IDX(bd_idx)); + if (tx_buf->flags & BNX2X_HAS_SECOND_PBD) { + /* Skip second parse bd... */ + --nbd; + bd_idx = TX_BD(NEXT_TX_IDX(bd_idx)); + } + /* TSO headers+data bds share a common mapping. See bnx2x_tx_split() */ if (tx_buf->flags & BNX2X_TSO_SPLIT_BD) { tx_data_bd = &txdata->tx_desc_ring[bd_idx].reg_bd; @@ -3889,6 +3895,9 @@ netdev_tx_t bnx2x_start_xmit(struct sk_buff *skb, struct net_device *dev) /* set encapsulation flag in start BD */ SET_FLAG(tx_start_bd->general_data, ETH_TX_START_BD_TUNNEL_EXIST, 1); + + tx_buf->flags |= BNX2X_HAS_SECOND_PBD; + nbd++; } else if (xmit_type & XMIT_CSUM) { /* Set PBD in checksum offload case w/o encapsulation */ From 33cf75656923ff11d67a937a4f8e9344f58cea77 Mon Sep 17 00:00:00 2001 From: George Cherian Date: Fri, 25 Jul 2014 11:49:41 +0530 Subject: [PATCH 396/455] can: c_can_platform: Fix raminit, use devm_ioremap() instead of devm_ioremap_resource() The raminit register is shared register for both can0 and can1. Since commit: 32766ff net: can: Convert to use devm_ioremap_resource devm_ioremap_resource() is used to map raminit register. When using both interfaces the mapping for the can1 interface fails, leading to a non functional can interface. Signed-off-by: George Cherian Signed-off-by: Mugunthan V N Cc: linux-stable # >= v3.11 Signed-off-by: Marc Kleine-Budde --- drivers/net/can/c_can/c_can_platform.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/can/c_can/c_can_platform.c b/drivers/net/can/c_can/c_can_platform.c index 824108cd9fd5..12430be6448a 100644 --- a/drivers/net/can/c_can/c_can_platform.c +++ b/drivers/net/can/c_can/c_can_platform.c @@ -287,7 +287,8 @@ static int c_can_plat_probe(struct platform_device *pdev) break; } - priv->raminit_ctrlreg = devm_ioremap_resource(&pdev->dev, res); + priv->raminit_ctrlreg = devm_ioremap(&pdev->dev, res->start, + resource_size(res)); if (IS_ERR(priv->raminit_ctrlreg) || priv->instance < 0) dev_info(&pdev->dev, "control memory is not used for raminit\n"); else From c6a26ce9af9eca685bdd766bcc1dbc855394880b Mon Sep 17 00:00:00 2001 From: Steven Miao Date: Wed, 16 Jul 2014 14:23:08 +0800 Subject: [PATCH 397/455] pm: bf609: cleanup smc nor flash drop smc pin state change code, pin state will be saved in pinctrl-adi2 driver cleanup nor flash init/exit for pm suspend/resume Signed-off-by: Steven Miao --- arch/blackfin/mach-bf609/boards/ezkit.c | 3 --- arch/blackfin/mach-bf609/include/mach/pm.h | 5 +++-- arch/blackfin/mach-bf609/pm.c | 4 ++-- 3 files changed, 5 insertions(+), 7 deletions(-) diff --git a/arch/blackfin/mach-bf609/boards/ezkit.c b/arch/blackfin/mach-bf609/boards/ezkit.c index 1ba4600de69f..6fb0765841bc 100644 --- a/arch/blackfin/mach-bf609/boards/ezkit.c +++ b/arch/blackfin/mach-bf609/boards/ezkit.c @@ -698,8 +698,6 @@ int bf609_nor_flash_init(struct platform_device *pdev) { #define CONFIG_SMC_GCTL_VAL 0x00000010 - if (!devm_pinctrl_get_select_default(&pdev->dev)) - return -EBUSY; bfin_write32(SMC_GCTL, CONFIG_SMC_GCTL_VAL); bfin_write32(SMC_B0CTL, 0x01002011); bfin_write32(SMC_B0TIM, 0x08170977); @@ -709,7 +707,6 @@ int bf609_nor_flash_init(struct platform_device *pdev) void bf609_nor_flash_exit(struct platform_device *pdev) { - devm_pinctrl_put(pdev->dev.pins->p); bfin_write32(SMC_GCTL, 0); } diff --git a/arch/blackfin/mach-bf609/include/mach/pm.h b/arch/blackfin/mach-bf609/include/mach/pm.h index 3ca0fb965636..a1efd936dd30 100644 --- a/arch/blackfin/mach-bf609/include/mach/pm.h +++ b/arch/blackfin/mach-bf609/include/mach/pm.h @@ -10,6 +10,7 @@ #define __MACH_BF609_PM_H__ #include +#include extern int bfin609_pm_enter(suspend_state_t state); extern int bf609_pm_prepare(void); @@ -19,6 +20,6 @@ void bf609_hibernate(void); void bfin_sec_raise_irq(unsigned int sid); void coreb_enable(void); -int bf609_nor_flash_init(void); -void bf609_nor_flash_exit(void); +int bf609_nor_flash_init(struct platform_device *pdev); +void bf609_nor_flash_exit(struct platform_device *pdev); #endif diff --git a/arch/blackfin/mach-bf609/pm.c b/arch/blackfin/mach-bf609/pm.c index 0cdd6955c7be..b1bfcf434d16 100644 --- a/arch/blackfin/mach-bf609/pm.c +++ b/arch/blackfin/mach-bf609/pm.c @@ -291,13 +291,13 @@ static struct bfin_cpu_pm_fns bf609_cpu_pm = { #if defined(CONFIG_MTD_PHYSMAP) || defined(CONFIG_MTD_PHYSMAP_MODULE) static int smc_pm_syscore_suspend(void) { - bf609_nor_flash_exit(); + bf609_nor_flash_exit(NULL); return 0; } static void smc_pm_syscore_resume(void) { - bf609_nor_flash_init(); + bf609_nor_flash_init(NULL); } static struct syscore_ops smc_pm_syscore_ops = { From 4ba7b5f0ce49d58e48e4c19a2c5ceea50fceda4d Mon Sep 17 00:00:00 2001 From: Steven Miao Date: Wed, 16 Jul 2014 14:37:31 +0800 Subject: [PATCH 398/455] blackfin: fix some bf5xx boards build for missing Signed-off-by: Steven Miao --- arch/blackfin/mach-bf533/boards/blackstamp.c | 1 + arch/blackfin/mach-bf537/boards/cm_bf537e.c | 1 + arch/blackfin/mach-bf537/boards/cm_bf537u.c | 1 + arch/blackfin/mach-bf537/boards/tcm_bf537.c | 1 + arch/blackfin/mach-bf561/boards/acvilon.c | 1 + arch/blackfin/mach-bf561/boards/cm_bf561.c | 1 + arch/blackfin/mach-bf561/boards/ezkit.c | 1 + 7 files changed, 7 insertions(+) diff --git a/arch/blackfin/mach-bf533/boards/blackstamp.c b/arch/blackfin/mach-bf533/boards/blackstamp.c index 63b0e4fe760c..0ccf0cf4daaf 100644 --- a/arch/blackfin/mach-bf533/boards/blackstamp.c +++ b/arch/blackfin/mach-bf533/boards/blackstamp.c @@ -20,6 +20,7 @@ #include #include #include +#include #include #include #include diff --git a/arch/blackfin/mach-bf537/boards/cm_bf537e.c b/arch/blackfin/mach-bf537/boards/cm_bf537e.c index c65c6dbda3da..1e7290ef3525 100644 --- a/arch/blackfin/mach-bf537/boards/cm_bf537e.c +++ b/arch/blackfin/mach-bf537/boards/cm_bf537e.c @@ -21,6 +21,7 @@ #endif #include #include +#include #include #include #include diff --git a/arch/blackfin/mach-bf537/boards/cm_bf537u.c b/arch/blackfin/mach-bf537/boards/cm_bf537u.c index af58454b4bff..c7495dc74690 100644 --- a/arch/blackfin/mach-bf537/boards/cm_bf537u.c +++ b/arch/blackfin/mach-bf537/boards/cm_bf537u.c @@ -21,6 +21,7 @@ #endif #include #include +#include #include #include #include diff --git a/arch/blackfin/mach-bf537/boards/tcm_bf537.c b/arch/blackfin/mach-bf537/boards/tcm_bf537.c index a0211225748d..6b988ad653d8 100644 --- a/arch/blackfin/mach-bf537/boards/tcm_bf537.c +++ b/arch/blackfin/mach-bf537/boards/tcm_bf537.c @@ -21,6 +21,7 @@ #endif #include #include +#include #include #include #include diff --git a/arch/blackfin/mach-bf561/boards/acvilon.c b/arch/blackfin/mach-bf561/boards/acvilon.c index 430b16d5ccb1..6ab951534d79 100644 --- a/arch/blackfin/mach-bf561/boards/acvilon.c +++ b/arch/blackfin/mach-bf561/boards/acvilon.c @@ -44,6 +44,7 @@ #include #include #include +#include #include #include #include diff --git a/arch/blackfin/mach-bf561/boards/cm_bf561.c b/arch/blackfin/mach-bf561/boards/cm_bf561.c index 9f777df4cacc..e862f7823e68 100644 --- a/arch/blackfin/mach-bf561/boards/cm_bf561.c +++ b/arch/blackfin/mach-bf561/boards/cm_bf561.c @@ -18,6 +18,7 @@ #endif #include #include +#include #include #include #include diff --git a/arch/blackfin/mach-bf561/boards/ezkit.c b/arch/blackfin/mach-bf561/boards/ezkit.c index 88dee43e7abe..2de71e8c104b 100644 --- a/arch/blackfin/mach-bf561/boards/ezkit.c +++ b/arch/blackfin/mach-bf561/boards/ezkit.c @@ -14,6 +14,7 @@ #include #include #include +#include #include #include #include From 3f68e175db60d8c4cbfee99a9cec44378dcb70f5 Mon Sep 17 00:00:00 2001 From: Sonic Zhang Date: Thu, 13 Feb 2014 18:52:34 +0800 Subject: [PATCH 399/455] blackfin: bind different groups of one pinmux function to different state name Signed-off-by: Sonic Zhang Signed-off-by: Steven Miao --- arch/blackfin/mach-bf548/boards/ezkit.c | 6 ++++-- arch/blackfin/mach-bf609/boards/ezkit.c | 17 ++++++++--------- 2 files changed, 12 insertions(+), 11 deletions(-) diff --git a/arch/blackfin/mach-bf548/boards/ezkit.c b/arch/blackfin/mach-bf548/boards/ezkit.c index 90138e6112c1..1fe7ff286619 100644 --- a/arch/blackfin/mach-bf548/boards/ezkit.c +++ b/arch/blackfin/mach-bf548/boards/ezkit.c @@ -2118,7 +2118,7 @@ static struct pinctrl_map __initdata bfin_pinmux_map[] = { PIN_MAP_MUX_GROUP_DEFAULT("bfin-rotary", "pinctrl-adi2.0", NULL, "rotary"), PIN_MAP_MUX_GROUP_DEFAULT("bfin_can.0", "pinctrl-adi2.0", NULL, "can0"), PIN_MAP_MUX_GROUP_DEFAULT("bfin_can.1", "pinctrl-adi2.0", NULL, "can1"), - PIN_MAP_MUX_GROUP_DEFAULT("bf54x-lq043", "pinctrl-adi2.0", NULL, "ppi0_24b"), + PIN_MAP_MUX_GROUP_DEFAULT("bf54x-lq043", "pinctrl-adi2.0", "ppi0_24bgrp", "ppi0"), PIN_MAP_MUX_GROUP_DEFAULT("bfin-i2s.0", "pinctrl-adi2.0", NULL, "sport0"), PIN_MAP_MUX_GROUP_DEFAULT("bfin-tdm.0", "pinctrl-adi2.0", NULL, "sport0"), PIN_MAP_MUX_GROUP_DEFAULT("bfin-ac97.0", "pinctrl-adi2.0", NULL, "sport0"), @@ -2140,7 +2140,9 @@ static struct pinctrl_map __initdata bfin_pinmux_map[] = { PIN_MAP_MUX_GROUP_DEFAULT("pata-bf54x", "pinctrl-adi2.0", NULL, "atapi_alter"), #endif PIN_MAP_MUX_GROUP_DEFAULT("bf5xx-nand.0", "pinctrl-adi2.0", NULL, "nfc0"), - PIN_MAP_MUX_GROUP_DEFAULT("bf54x-keys", "pinctrl-adi2.0", NULL, "keys_4x4"), + PIN_MAP_MUX_GROUP_DEFAULT("bf54x-keys", "pinctrl-adi2.0", "keys_4x4grp", "keys"), + PIN_MAP_MUX_GROUP("bf54x-keys", "4bit", "pinctrl-adi2.0", "keys_4x4grp", "keys"), + PIN_MAP_MUX_GROUP("bf54x-keys", "8bit", "pinctrl-adi2.0", "keys_8x8grp", "keys"), }; static int __init ezkit_init(void) diff --git a/arch/blackfin/mach-bf609/boards/ezkit.c b/arch/blackfin/mach-bf609/boards/ezkit.c index 6fb0765841bc..e2c0b024ce88 100644 --- a/arch/blackfin/mach-bf609/boards/ezkit.c +++ b/arch/blackfin/mach-bf609/boards/ezkit.c @@ -2055,15 +2055,14 @@ static struct pinctrl_map __initdata bfin_pinmux_map[] = { PIN_MAP_MUX_GROUP_DEFAULT("bfin-rotary", "pinctrl-adi2.0", NULL, "rotary"), PIN_MAP_MUX_GROUP_DEFAULT("bfin_can.0", "pinctrl-adi2.0", NULL, "can0"), PIN_MAP_MUX_GROUP_DEFAULT("physmap-flash.0", "pinctrl-adi2.0", NULL, "smc0"), - PIN_MAP_MUX_GROUP_DEFAULT("bf609_nl8048.2", "pinctrl-adi2.0", NULL, "ppi2_16b"), - PIN_MAP_MUX_GROUP_DEFAULT("bfin_display.0", "pinctrl-adi2.0", NULL, "ppi0_16b"), -#if IS_ENABLED(CONFIG_VIDEO_MT9M114) - PIN_MAP_MUX_GROUP_DEFAULT("bfin_capture.0", "pinctrl-adi2.0", NULL, "ppi0_8b"), -#elif IS_ENABLED(CONFIG_VIDEO_VS6624) - PIN_MAP_MUX_GROUP_DEFAULT("bfin_capture.0", "pinctrl-adi2.0", NULL, "ppi0_16b"), -#else - PIN_MAP_MUX_GROUP_DEFAULT("bfin_capture.0", "pinctrl-adi2.0", NULL, "ppi0_24b"), -#endif + PIN_MAP_MUX_GROUP_DEFAULT("bf609_nl8048.2", "pinctrl-adi2.0", "ppi2_16bgrp", "ppi2"), + PIN_MAP_MUX_GROUP("bfin_display.0", "8bit", "pinctrl-adi2.0", "ppi2_8bgrp", "ppi2"), + PIN_MAP_MUX_GROUP_DEFAULT("bfin_display.0", "pinctrl-adi2.0", "ppi2_16bgrp", "ppi2"), + PIN_MAP_MUX_GROUP("bfin_display.0", "16bit", "pinctrl-adi2.0", "ppi2_16bgrp", "ppi2"), + PIN_MAP_MUX_GROUP("bfin_capture.0", "8bit", "pinctrl-adi2.0", "ppi0_8bgrp", "ppi0"), + PIN_MAP_MUX_GROUP_DEFAULT("bfin_capture.0", "pinctrl-adi2.0", "ppi0_16bgrp", "ppi0"), + PIN_MAP_MUX_GROUP("bfin_capture.0", "16bit", "pinctrl-adi2.0", "ppi0_16bgrp", "ppi0"), + PIN_MAP_MUX_GROUP("bfin_capture.0", "24bit", "pinctrl-adi2.0", "ppi0_24bgrp", "ppi0"), PIN_MAP_MUX_GROUP_DEFAULT("bfin-i2s.0", "pinctrl-adi2.0", NULL, "sport0"), PIN_MAP_MUX_GROUP_DEFAULT("bfin-tdm.0", "pinctrl-adi2.0", NULL, "sport0"), PIN_MAP_MUX_GROUP_DEFAULT("bfin-i2s.1", "pinctrl-adi2.0", NULL, "sport1"), From 814ecd0d1053df8b6891c0ff02567ed66fdf574e Mon Sep 17 00:00:00 2001 From: Steven Miao Date: Thu, 24 Jul 2014 16:10:19 +0800 Subject: [PATCH 400/455] irq: blackfin sec: drop duplicated sec priority set Signed-off-by: Steven Miao --- arch/blackfin/mach-common/ints-priority.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/arch/blackfin/mach-common/ints-priority.c b/arch/blackfin/mach-common/ints-priority.c index 867b7cef204c..1f94784eab6d 100644 --- a/arch/blackfin/mach-common/ints-priority.c +++ b/arch/blackfin/mach-common/ints-priority.c @@ -1208,8 +1208,6 @@ int __init init_arch_irq(void) bfin_sec_set_priority(CONFIG_SEC_IRQ_PRIORITY_LEVELS, sec_int_priority); - bfin_sec_set_priority(CONFIG_SEC_IRQ_PRIORITY_LEVELS, sec_int_priority); - /* Enable interrupts IVG7-15 */ bfin_irq_flags |= IMASK_IVG15 | IMASK_IVG14 | IMASK_IVG13 | IMASK_IVG12 | IMASK_IVG11 | From ac425b61135d8541cd2b41cf6fe11f9e2ca49b36 Mon Sep 17 00:00:00 2001 From: Steven Miao Date: Fri, 25 Jul 2014 10:31:16 +0800 Subject: [PATCH 401/455] defconfig: BF609: update spi config name Signed-off-by: Steven Miao --- arch/blackfin/configs/BF609-EZKIT_defconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/blackfin/configs/BF609-EZKIT_defconfig b/arch/blackfin/configs/BF609-EZKIT_defconfig index a7e9bfd84183..fcec5ce71392 100644 --- a/arch/blackfin/configs/BF609-EZKIT_defconfig +++ b/arch/blackfin/configs/BF609-EZKIT_defconfig @@ -102,7 +102,7 @@ CONFIG_I2C_CHARDEV=y CONFIG_I2C_BLACKFIN_TWI=y CONFIG_I2C_BLACKFIN_TWI_CLK_KHZ=100 CONFIG_SPI=y -CONFIG_SPI_BFIN_V3=y +CONFIG_SPI_ADI_V3=y CONFIG_GPIOLIB=y CONFIG_GPIO_SYSFS=y # CONFIG_HWMON is not set From b76f98236a23f808d6e3a27f7292670bc1d2c21b Mon Sep 17 00:00:00 2001 From: Steven Miao Date: Wed, 23 Jul 2014 17:28:25 +0800 Subject: [PATCH 402/455] blackfin: vmlinux.lds.S: reserve 32 bytes space at the end of data section for XIP kernel to collect some undefined section to the end of the data section and avoid section overlap Signed-off-by: Steven Miao --- arch/blackfin/kernel/vmlinux.lds.S | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/blackfin/kernel/vmlinux.lds.S b/arch/blackfin/kernel/vmlinux.lds.S index ba35864b2b74..c9eec84aa258 100644 --- a/arch/blackfin/kernel/vmlinux.lds.S +++ b/arch/blackfin/kernel/vmlinux.lds.S @@ -145,7 +145,7 @@ SECTIONS .text_l1 L1_CODE_START : AT(LOADADDR(.exit.data) + SIZEOF(.exit.data)) #else - .init.data : AT(__data_lma + __data_len) + .init.data : AT(__data_lma + __data_len + 32) { __sinitdata = .; INIT_DATA From edffe1b626b39bd7121691dfdecb548431003bbb Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Fri, 25 Jul 2014 17:07:47 -0700 Subject: [PATCH 403/455] parport: fix menu breakage Do not split the PARPORT-related symbols with the new kconfig symbol ARCH_MIGHT_HAVE_PC_PARPORT. The split was causing incorrect display of these symbols -- they were not being displayed together as they should be. Fixes: d90c3eb31535 "Kconfig cleanup (PARPORT_PC dependencies)" Signed-off-by: Randy Dunlap Cc: Mark Salter Cc: Ingo Molnar Cc: stable@vger.kernel.org # for 3.13, 3.14, 3.15 Signed-off-by: Linus Torvalds --- drivers/parport/Kconfig | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/parport/Kconfig b/drivers/parport/Kconfig index 2872ece81f35..44333bd8f908 100644 --- a/drivers/parport/Kconfig +++ b/drivers/parport/Kconfig @@ -5,6 +5,12 @@ # Parport configuration. # +config ARCH_MIGHT_HAVE_PC_PARPORT + bool + help + Select this config option from the architecture Kconfig if + the architecture might have PC parallel port hardware. + menuconfig PARPORT tristate "Parallel port support" depends on HAS_IOMEM @@ -31,12 +37,6 @@ menuconfig PARPORT If unsure, say Y. -config ARCH_MIGHT_HAVE_PC_PARPORT - bool - help - Select this config option from the architecture Kconfig if - the architecture might have PC parallel port hardware. - if PARPORT config PARPORT_PC From 28c9770bcbd2b6dbab99669825a2f8fa69e6d35b Mon Sep 17 00:00:00 2001 From: Haojian Zhuang Date: Wed, 2 Apr 2014 21:31:50 +0800 Subject: [PATCH 404/455] ARM: dts: fix L2 address in Hi3620 Fix the address of L2 controler register in hi3620 SoC. This has been wrong from the point that the file was merged in v3.14. Signed-off-by: Haojian Zhuang Acked-by: Wei Xu Cc: stable@vger.kernel.org Signed-off-by: Arnd Bergmann --- arch/arm/boot/dts/hi3620.dtsi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm/boot/dts/hi3620.dtsi b/arch/arm/boot/dts/hi3620.dtsi index ab1116d086be..83a5b8685bd9 100644 --- a/arch/arm/boot/dts/hi3620.dtsi +++ b/arch/arm/boot/dts/hi3620.dtsi @@ -73,7 +73,7 @@ L2: l2-cache { compatible = "arm,pl310-cache"; - reg = <0xfc10000 0x100000>; + reg = <0x100000 0x100000>; interrupts = <0 15 4>; cache-unified; cache-level = <2>; From 8bdd638091605dc66d92c57c4b80eb87fffc15f7 Mon Sep 17 00:00:00 2001 From: Hugh Dickins Date: Sat, 26 Jul 2014 12:58:23 -0700 Subject: [PATCH 405/455] mm: fix direct reclaim writeback regression Shortly before 3.16-rc1, Dave Jones reported: WARNING: CPU: 3 PID: 19721 at fs/xfs/xfs_aops.c:971 xfs_vm_writepage+0x5ce/0x630 [xfs]() CPU: 3 PID: 19721 Comm: trinity-c61 Not tainted 3.15.0+ #3 Call Trace: xfs_vm_writepage+0x5ce/0x630 [xfs] shrink_page_list+0x8f9/0xb90 shrink_inactive_list+0x253/0x510 shrink_lruvec+0x563/0x6c0 shrink_zone+0x3b/0x100 shrink_zones+0x1f1/0x3c0 try_to_free_pages+0x164/0x380 __alloc_pages_nodemask+0x822/0xc90 alloc_pages_vma+0xaf/0x1c0 handle_mm_fault+0xa31/0xc50 etc. 970 if (WARN_ON_ONCE((current->flags & (PF_MEMALLOC|PF_KSWAPD)) == 971 PF_MEMALLOC)) I did not respond at the time, because a glance at the PageDirty block in shrink_page_list() quickly shows that this is impossible: we don't do writeback on file pages (other than tmpfs) from direct reclaim nowadays. Dave was hallucinating, but it would have been disrespectful to say so. However, my own /var/log/messages now shows similar complaints WARNING: CPU: 1 PID: 28814 at fs/ext4/inode.c:1881 ext4_writepage+0xa7/0x38b() WARNING: CPU: 0 PID: 27347 at fs/ext4/inode.c:1764 ext4_writepage+0xa7/0x38b() from stressing some mmotm trees during July. Could a dirty xfs or ext4 file page somehow get marked PageSwapBacked, so fail shrink_page_list()'s page_is_file_cache() test, and so proceed to mapping->a_ops->writepage()? Yes, 3.16-rc1's commit 68711a746345 ("mm, migration: add destination page freeing callback") has provided such a way to compaction: if migrating a SwapBacked page fails, its newpage may be put back on the list for later use with PageSwapBacked still set, and nothing will clear it. Whether that can do anything worse than issue WARN_ON_ONCEs, and get some statistics wrong, is unclear: easier to fix than to think through the consequences. Fixing it here, before the put_new_page(), addresses the bug directly, but is probably the worst place to fix it. Page migration is doing too many parts of the job on too many levels: fixing it in move_to_new_page() to complement its SetPageSwapBacked would be preferable, except why is it (and newpage->mapping and newpage->index) done there, rather than down in migrate_page_move_mapping(), once we are sure of success? Not a cleanup to get into right now, especially not with memcg cleanups coming in 3.17. Reported-by: Dave Jones Signed-off-by: Hugh Dickins Signed-off-by: Linus Torvalds --- mm/migrate.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/mm/migrate.c b/mm/migrate.c index 9e0beaa91845..be6dbf995c0c 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -988,9 +988,10 @@ out: * it. Otherwise, putback_lru_page() will drop the reference grabbed * during isolation. */ - if (rc != MIGRATEPAGE_SUCCESS && put_new_page) + if (rc != MIGRATEPAGE_SUCCESS && put_new_page) { + ClearPageSwapBacked(newpage); put_new_page(newpage, private); - else + } else putback_lru_page(newpage); if (result) { From 2062afb4f804afef61cbe62a30cac9a46e58e067 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Sat, 26 Jul 2014 14:52:01 -0700 Subject: [PATCH 406/455] Fix gcc-4.9.0 miscompilation of load_balance() in scheduler MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Michel Dänzer and a couple of other people reported inexplicable random oopses in the scheduler, and the cause turns out to be gcc mis-compiling the load_balance() function when debugging is enabled. The gcc bug apparently goes back to gcc-4.5, but slight optimization changes means that it now showed up as a problem in 4.9.0 and 4.9.1. The instruction scheduling problem causes gcc to schedule a spill operation to before the stack frame has been created, which in turn can corrupt the spilled value if an interrupt comes in. There may be other effects of this bug too, but that's the code generation problem seen in Michel's case. This is fixed in current gcc HEAD, but the workaround as suggested by Markus Trippelsdorf is pretty simple: use -fno-var-tracking-assignments when compiling the kernel, which disables the gcc code that causes the problem. This can result in slightly worse debug information for variable accesses, but that is infinitely preferable to actual code generation problems. Doing this unconditionally (not just for CONFIG_DEBUG_INFO) also allows non-debug builds to verify that the debug build would be identical: we can do export GCC_COMPARE_DEBUG=1 to make gcc internally verify that the result of the build is independent of the "-g" flag (it will make the compiler build everything twice, toggling the debug flag, and compare the results). Without the "-fno-var-tracking-assignments" option, the build would fail (even with 4.8.3 that didn't show the actual stack frame bug) with a gcc compare failure. See also gcc bugzilla: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61801 Reported-by: Michel Dänzer Suggested-by: Markus Trippelsdorf Cc: Jakub Jelinek Cc: stable@kernel.org Signed-off-by: Linus Torvalds --- Makefile | 2 ++ 1 file changed, 2 insertions(+) diff --git a/Makefile b/Makefile index 6b2774145d66..5147f3f23504 100644 --- a/Makefile +++ b/Makefile @@ -688,6 +688,8 @@ KBUILD_CFLAGS += -fomit-frame-pointer endif endif +KBUILD_CFLAGS += $(call cc-option, -fno-var-tracking-assignments) + ifdef CONFIG_DEBUG_INFO KBUILD_CFLAGS += -g KBUILD_AFLAGS += -Wa,-gdwarf-2 From 64aa90f26c06e1cb2aacfb98a7d0eccfbd6c1a91 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Sun, 27 Jul 2014 12:41:55 -0700 Subject: [PATCH 407/455] Linux 3.16-rc7 --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index 5147f3f23504..f6a7794e4db4 100644 --- a/Makefile +++ b/Makefile @@ -1,7 +1,7 @@ VERSION = 3 PATCHLEVEL = 16 SUBLEVEL = 0 -EXTRAVERSION = -rc6 +EXTRAVERSION = -rc7 NAME = Shuffling Zombie Juror # *DOCUMENTATION* From fa952c54ba13deeb91cd4c7af255cdb5f1273535 Mon Sep 17 00:00:00 2001 From: Vasant Hegde Date: Wed, 23 Jul 2014 14:52:39 +0530 Subject: [PATCH 408/455] powerpc/powernv: Change BUG_ON to WARN_ON in elog code We can continue to read the error log (up to MAX size) even if we get the elog size more than MAX size. Hence change BUG_ON to WARN_ON. Also updated error message. Reported-by: Gopesh Kumar Chaudhary Signed-off-by: Vasant Hegde Signed-off-by: Ananth N Mavinakayanahalli Acked-by: Deepthi Dharwar Acked-by: Stewart Smith Signed-off-by: Benjamin Herrenschmidt --- arch/powerpc/platforms/powernv/opal-elog.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/platforms/powernv/opal-elog.c b/arch/powerpc/platforms/powernv/opal-elog.c index 10268c41d830..0ad533b617f7 100644 --- a/arch/powerpc/platforms/powernv/opal-elog.c +++ b/arch/powerpc/platforms/powernv/opal-elog.c @@ -249,7 +249,7 @@ static void elog_work_fn(struct work_struct *work) rc = opal_get_elog_size(&id, &size, &type); if (rc != OPAL_SUCCESS) { - pr_err("ELOG: Opal log read failed\n"); + pr_err("ELOG: OPAL log info read failed\n"); return; } @@ -257,7 +257,7 @@ static void elog_work_fn(struct work_struct *work) log_id = be64_to_cpu(id); elog_type = be64_to_cpu(type); - BUG_ON(elog_size > OPAL_MAX_ERRLOG_SIZE); + WARN_ON(elog_size > OPAL_MAX_ERRLOG_SIZE); if (elog_size >= OPAL_MAX_ERRLOG_SIZE) elog_size = OPAL_MAX_ERRLOG_SIZE; From 396a34340cdf7373c00e3977db27d1a20ea65ebc Mon Sep 17 00:00:00 2001 From: Thomas Falcon Date: Fri, 25 Jul 2014 12:47:42 -0500 Subject: [PATCH 409/455] powerpc: Fix endianness of flash_block_list in rtas_flash The function rtas_flash_firmware passes the address of a data structure, flash_block_list, when making the update-flash-64-and-reboot rtas call. While the endianness of the address is handled correctly, the endianness of the data is not. This patch ensures that the data in flash_block_list is big endian when passed to rtas on little endian hosts. Signed-off-by: Thomas Falcon Signed-off-by: Benjamin Herrenschmidt --- arch/powerpc/kernel/rtas_flash.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kernel/rtas_flash.c b/arch/powerpc/kernel/rtas_flash.c index 658e89d2025b..db2b482af658 100644 --- a/arch/powerpc/kernel/rtas_flash.c +++ b/arch/powerpc/kernel/rtas_flash.c @@ -611,17 +611,19 @@ static void rtas_flash_firmware(int reboot_type) for (f = flist; f; f = next) { /* Translate data addrs to absolute */ for (i = 0; i < f->num_blocks; i++) { - f->blocks[i].data = (char *)__pa(f->blocks[i].data); + f->blocks[i].data = (char *)cpu_to_be64(__pa(f->blocks[i].data)); image_size += f->blocks[i].length; + f->blocks[i].length = cpu_to_be64(f->blocks[i].length); } next = f->next; /* Don't translate NULL pointer for last entry */ if (f->next) - f->next = (struct flash_block_list *)__pa(f->next); + f->next = (struct flash_block_list *)cpu_to_be64(__pa(f->next)); else f->next = NULL; /* make num_blocks into the version/length field */ f->num_blocks = (FLASH_BLOCK_LIST_VERSION << 56) | ((f->num_blocks+1)*16); + f->num_blocks = cpu_to_be64(f->num_blocks); } printk(KERN_ALERT "FLASH: flash image is %ld bytes\n", image_size); From f960d2093f29f0bc4e1df1fcefb993455620c0b5 Mon Sep 17 00:00:00 2001 From: Mikulas Patocka Date: Fri, 25 Jul 2014 19:40:20 -0400 Subject: [PATCH 410/455] crypto: arm64-aes - fix encryption of unaligned data cryptsetup fails on arm64 when using kernel encryption via AF_ALG socket. See https://bugzilla.redhat.com/show_bug.cgi?id=1122937 The bug is caused by incorrect handling of unaligned data in arch/arm64/crypto/aes-glue.c. Cryptsetup creates a buffer that is aligned on 8 bytes, but not on 16 bytes. It opens AF_ALG socket and uses the socket to encrypt data in the buffer. The arm64 crypto accelerator causes data corruption or crashes in the scatterwalk_pagedone. This patch fixes the bug by passing the residue bytes that were not processed as the last parameter to blkcipher_walk_done. Signed-off-by: Mikulas Patocka Acked-by: Ard Biesheuvel Signed-off-by: Herbert Xu --- arch/arm64/crypto/aes-glue.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/arch/arm64/crypto/aes-glue.c b/arch/arm64/crypto/aes-glue.c index 60f2f4c12256..79cd911ef88c 100644 --- a/arch/arm64/crypto/aes-glue.c +++ b/arch/arm64/crypto/aes-glue.c @@ -106,7 +106,7 @@ static int ecb_encrypt(struct blkcipher_desc *desc, struct scatterlist *dst, for (first = 1; (blocks = (walk.nbytes / AES_BLOCK_SIZE)); first = 0) { aes_ecb_encrypt(walk.dst.virt.addr, walk.src.virt.addr, (u8 *)ctx->key_enc, rounds, blocks, first); - err = blkcipher_walk_done(desc, &walk, 0); + err = blkcipher_walk_done(desc, &walk, walk.nbytes % AES_BLOCK_SIZE); } kernel_neon_end(); return err; @@ -128,7 +128,7 @@ static int ecb_decrypt(struct blkcipher_desc *desc, struct scatterlist *dst, for (first = 1; (blocks = (walk.nbytes / AES_BLOCK_SIZE)); first = 0) { aes_ecb_decrypt(walk.dst.virt.addr, walk.src.virt.addr, (u8 *)ctx->key_dec, rounds, blocks, first); - err = blkcipher_walk_done(desc, &walk, 0); + err = blkcipher_walk_done(desc, &walk, walk.nbytes % AES_BLOCK_SIZE); } kernel_neon_end(); return err; @@ -151,7 +151,7 @@ static int cbc_encrypt(struct blkcipher_desc *desc, struct scatterlist *dst, aes_cbc_encrypt(walk.dst.virt.addr, walk.src.virt.addr, (u8 *)ctx->key_enc, rounds, blocks, walk.iv, first); - err = blkcipher_walk_done(desc, &walk, 0); + err = blkcipher_walk_done(desc, &walk, walk.nbytes % AES_BLOCK_SIZE); } kernel_neon_end(); return err; @@ -174,7 +174,7 @@ static int cbc_decrypt(struct blkcipher_desc *desc, struct scatterlist *dst, aes_cbc_decrypt(walk.dst.virt.addr, walk.src.virt.addr, (u8 *)ctx->key_dec, rounds, blocks, walk.iv, first); - err = blkcipher_walk_done(desc, &walk, 0); + err = blkcipher_walk_done(desc, &walk, walk.nbytes % AES_BLOCK_SIZE); } kernel_neon_end(); return err; @@ -243,7 +243,7 @@ static int xts_encrypt(struct blkcipher_desc *desc, struct scatterlist *dst, aes_xts_encrypt(walk.dst.virt.addr, walk.src.virt.addr, (u8 *)ctx->key1.key_enc, rounds, blocks, (u8 *)ctx->key2.key_enc, walk.iv, first); - err = blkcipher_walk_done(desc, &walk, 0); + err = blkcipher_walk_done(desc, &walk, walk.nbytes % AES_BLOCK_SIZE); } kernel_neon_end(); @@ -267,7 +267,7 @@ static int xts_decrypt(struct blkcipher_desc *desc, struct scatterlist *dst, aes_xts_decrypt(walk.dst.virt.addr, walk.src.virt.addr, (u8 *)ctx->key1.key_dec, rounds, blocks, (u8 *)ctx->key2.key_enc, walk.iv, first); - err = blkcipher_walk_done(desc, &walk, 0); + err = blkcipher_walk_done(desc, &walk, walk.nbytes % AES_BLOCK_SIZE); } kernel_neon_end(); From f3c400ef473e00c680ea713a66196b05870b3710 Mon Sep 17 00:00:00 2001 From: Mikulas Patocka Date: Fri, 25 Jul 2014 19:42:30 -0400 Subject: [PATCH 411/455] crypto: arm-aes - fix encryption of unaligned data Fix the same alignment bug as in arm64 - we need to pass residue unprocessed bytes as the last argument to blkcipher_walk_done. Signed-off-by: Mikulas Patocka Cc: stable@vger.kernel.org # 3.13+ Acked-by: Ard Biesheuvel Signed-off-by: Herbert Xu --- arch/arm/crypto/aesbs-glue.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/arch/arm/crypto/aesbs-glue.c b/arch/arm/crypto/aesbs-glue.c index 4522366da759..15468fbbdea3 100644 --- a/arch/arm/crypto/aesbs-glue.c +++ b/arch/arm/crypto/aesbs-glue.c @@ -137,7 +137,7 @@ static int aesbs_cbc_encrypt(struct blkcipher_desc *desc, dst += AES_BLOCK_SIZE; } while (--blocks); } - err = blkcipher_walk_done(desc, &walk, 0); + err = blkcipher_walk_done(desc, &walk, walk.nbytes % AES_BLOCK_SIZE); } return err; } @@ -158,7 +158,7 @@ static int aesbs_cbc_decrypt(struct blkcipher_desc *desc, bsaes_cbc_encrypt(walk.src.virt.addr, walk.dst.virt.addr, walk.nbytes, &ctx->dec, walk.iv); kernel_neon_end(); - err = blkcipher_walk_done(desc, &walk, 0); + err = blkcipher_walk_done(desc, &walk, walk.nbytes % AES_BLOCK_SIZE); } while (walk.nbytes) { u32 blocks = walk.nbytes / AES_BLOCK_SIZE; @@ -182,7 +182,7 @@ static int aesbs_cbc_decrypt(struct blkcipher_desc *desc, dst += AES_BLOCK_SIZE; src += AES_BLOCK_SIZE; } while (--blocks); - err = blkcipher_walk_done(desc, &walk, 0); + err = blkcipher_walk_done(desc, &walk, walk.nbytes % AES_BLOCK_SIZE); } return err; } @@ -268,7 +268,7 @@ static int aesbs_xts_encrypt(struct blkcipher_desc *desc, bsaes_xts_encrypt(walk.src.virt.addr, walk.dst.virt.addr, walk.nbytes, &ctx->enc, walk.iv); kernel_neon_end(); - err = blkcipher_walk_done(desc, &walk, 0); + err = blkcipher_walk_done(desc, &walk, walk.nbytes % AES_BLOCK_SIZE); } return err; } @@ -292,7 +292,7 @@ static int aesbs_xts_decrypt(struct blkcipher_desc *desc, bsaes_xts_decrypt(walk.src.virt.addr, walk.dst.virt.addr, walk.nbytes, &ctx->dec, walk.iv); kernel_neon_end(); - err = blkcipher_walk_done(desc, &walk, 0); + err = blkcipher_walk_done(desc, &walk, walk.nbytes % AES_BLOCK_SIZE); } return err; } From 7209a75d2009dbf7745e2fd354abf25c3deb3ca3 Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Wed, 23 Jul 2014 08:34:11 -0700 Subject: [PATCH 412/455] x86_64/entry/xen: Do not invoke espfix64 on Xen This moves the espfix64 logic into native_iret. To make this work, it gets rid of the native patch for INTERRUPT_RETURN: INTERRUPT_RETURN on native kernels is now 'jmp native_iret'. This changes the 16-bit SS behavior on Xen from OOPSing to leaking some bits of the Xen hypervisor's RSP (I think). [ hpa: this is a nonzero cost on native, but probably not enough to measure. Xen needs to fix this in their own code, probably doing something equivalent to espfix64. ] Signed-off-by: Andy Lutomirski Link: http://lkml.kernel.org/r/7b8f1d8ef6597cb16ae004a43c56980a7de3cf94.1406129132.git.luto@amacapital.net Signed-off-by: H. Peter Anvin Cc: --- arch/x86/include/asm/irqflags.h | 2 +- arch/x86/kernel/entry_64.S | 28 ++++++++++------------------ arch/x86/kernel/paravirt_patch_64.c | 2 -- 3 files changed, 11 insertions(+), 21 deletions(-) diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h index bba3cf88e624..0a8b519226b8 100644 --- a/arch/x86/include/asm/irqflags.h +++ b/arch/x86/include/asm/irqflags.h @@ -129,7 +129,7 @@ static inline notrace unsigned long arch_local_irq_save(void) #define PARAVIRT_ADJUST_EXCEPTION_FRAME /* */ -#define INTERRUPT_RETURN iretq +#define INTERRUPT_RETURN jmp native_iret #define USERGS_SYSRET64 \ swapgs; \ sysretq; diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S index b25ca969edd2..c844f0816ab8 100644 --- a/arch/x86/kernel/entry_64.S +++ b/arch/x86/kernel/entry_64.S @@ -830,27 +830,24 @@ restore_args: RESTORE_ARGS 1,8,1 irq_return: + INTERRUPT_RETURN + +ENTRY(native_iret) /* * Are we returning to a stack segment from the LDT? Note: in * 64-bit mode SS:RSP on the exception stack is always valid. */ #ifdef CONFIG_X86_ESPFIX64 testb $4,(SS-RIP)(%rsp) - jnz irq_return_ldt + jnz native_irq_return_ldt #endif -irq_return_iret: - INTERRUPT_RETURN - _ASM_EXTABLE(irq_return_iret, bad_iret) - -#ifdef CONFIG_PARAVIRT -ENTRY(native_iret) +native_irq_return_iret: iretq - _ASM_EXTABLE(native_iret, bad_iret) -#endif + _ASM_EXTABLE(native_irq_return_iret, bad_iret) #ifdef CONFIG_X86_ESPFIX64 -irq_return_ldt: +native_irq_return_ldt: pushq_cfi %rax pushq_cfi %rdi SWAPGS @@ -872,7 +869,7 @@ irq_return_ldt: SWAPGS movq %rax,%rsp popq_cfi %rax - jmp irq_return_iret + jmp native_irq_return_iret #endif .section .fixup,"ax" @@ -956,13 +953,8 @@ __do_double_fault: cmpl $__KERNEL_CS,CS(%rdi) jne do_double_fault movq RIP(%rdi),%rax - cmpq $irq_return_iret,%rax -#ifdef CONFIG_PARAVIRT - je 1f - cmpq $native_iret,%rax -#endif + cmpq $native_irq_return_iret,%rax jne do_double_fault /* This shouldn't happen... */ -1: movq PER_CPU_VAR(kernel_stack),%rax subq $(6*8-KERNEL_STACK_OFFSET),%rax /* Reset to original stack */ movq %rax,RSP(%rdi) @@ -1428,7 +1420,7 @@ error_sti: */ error_kernelspace: incl %ebx - leaq irq_return_iret(%rip),%rcx + leaq native_irq_return_iret(%rip),%rcx cmpq %rcx,RIP+8(%rsp) je error_swapgs movl %ecx,%eax /* zero extend */ diff --git a/arch/x86/kernel/paravirt_patch_64.c b/arch/x86/kernel/paravirt_patch_64.c index 3f08f34f93eb..a1da6737ba5b 100644 --- a/arch/x86/kernel/paravirt_patch_64.c +++ b/arch/x86/kernel/paravirt_patch_64.c @@ -6,7 +6,6 @@ DEF_NATIVE(pv_irq_ops, irq_disable, "cli"); DEF_NATIVE(pv_irq_ops, irq_enable, "sti"); DEF_NATIVE(pv_irq_ops, restore_fl, "pushq %rdi; popfq"); DEF_NATIVE(pv_irq_ops, save_fl, "pushfq; popq %rax"); -DEF_NATIVE(pv_cpu_ops, iret, "iretq"); DEF_NATIVE(pv_mmu_ops, read_cr2, "movq %cr2, %rax"); DEF_NATIVE(pv_mmu_ops, read_cr3, "movq %cr3, %rax"); DEF_NATIVE(pv_mmu_ops, write_cr3, "movq %rdi, %cr3"); @@ -50,7 +49,6 @@ unsigned native_patch(u8 type, u16 clobbers, void *ibuf, PATCH_SITE(pv_irq_ops, save_fl); PATCH_SITE(pv_irq_ops, irq_enable); PATCH_SITE(pv_irq_ops, irq_disable); - PATCH_SITE(pv_cpu_ops, iret); PATCH_SITE(pv_cpu_ops, irq_enable_sysexit); PATCH_SITE(pv_cpu_ops, usergs_sysret32); PATCH_SITE(pv_cpu_ops, usergs_sysret64); From 8266f5fcf015101fbeb73cbc152c9d208c2baec0 Mon Sep 17 00:00:00 2001 From: David L Stevens Date: Fri, 25 Jul 2014 10:30:11 -0400 Subject: [PATCH 413/455] sunvnet: only use connected ports when sending The sunvnet driver doesn't check whether or not a port is connected when transmitting packets, which results in failures if a port fails to connect (e.g., due to a version mismatch). The original code also assumes unnecessarily that the first port is up and a switch, even though there is a flag for switch ports. This patch only matches a port if it is connected, and otherwise uses the switch_port flag to send the packet to a switch port that is up. Signed-off-by: David L Stevens Signed-off-by: David S. Miller --- drivers/net/ethernet/sun/sunvnet.c | 22 +++++++++++++++++----- 1 file changed, 17 insertions(+), 5 deletions(-) diff --git a/drivers/net/ethernet/sun/sunvnet.c b/drivers/net/ethernet/sun/sunvnet.c index fd411d6e19a2..d813bfb1a847 100644 --- a/drivers/net/ethernet/sun/sunvnet.c +++ b/drivers/net/ethernet/sun/sunvnet.c @@ -610,6 +610,13 @@ static int __vnet_tx_trigger(struct vnet_port *port) return err; } +static inline bool port_is_up(struct vnet_port *vnet) +{ + struct vio_driver_state *vio = &vnet->vio; + + return !!(vio->hs_state & VIO_HS_COMPLETE); +} + struct vnet_port *__tx_port_find(struct vnet *vp, struct sk_buff *skb) { unsigned int hash = vnet_hashfn(skb->data); @@ -617,14 +624,19 @@ struct vnet_port *__tx_port_find(struct vnet *vp, struct sk_buff *skb) struct vnet_port *port; hlist_for_each_entry(port, hp, hash) { + if (!port_is_up(port)) + continue; if (ether_addr_equal(port->raddr, skb->data)) return port; } - port = NULL; - if (!list_empty(&vp->port_list)) - port = list_entry(vp->port_list.next, struct vnet_port, list); - - return port; + list_for_each_entry(port, &vp->port_list, list) { + if (!port->switch_port) + continue; + if (!port_is_up(port)) + continue; + return port; + } + return NULL; } struct vnet_port *tx_port_find(struct vnet *vp, struct sk_buff *skb) From 545469f7a5d7f7b2a17b74da0a1bd0c1aea2f545 Mon Sep 17 00:00:00 2001 From: Jun Zhao Date: Sat, 26 Jul 2014 00:38:59 +0800 Subject: [PATCH 414/455] neighbour : fix ndm_type type error issue ndm_type means L3 address type, in neighbour proxy and vxlan, it's RTN_UNICAST. NDA_DST is for netlink TLV type, hence it's not right value in this context. Signed-off-by: Jun Zhao Acked-by: Hannes Frederic Sowa Signed-off-by: David S. Miller --- drivers/net/vxlan.c | 2 +- net/core/neighbour.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c index ade33ef82823..9f79192c9aa0 100644 --- a/drivers/net/vxlan.c +++ b/drivers/net/vxlan.c @@ -339,7 +339,7 @@ static int vxlan_fdb_info(struct sk_buff *skb, struct vxlan_dev *vxlan, ndm->ndm_state = fdb->state; ndm->ndm_ifindex = vxlan->dev->ifindex; ndm->ndm_flags = fdb->flags; - ndm->ndm_type = NDA_DST; + ndm->ndm_type = RTN_UNICAST; if (send_eth && nla_put(skb, NDA_LLADDR, ETH_ALEN, &fdb->eth_addr)) goto nla_put_failure; diff --git a/net/core/neighbour.c b/net/core/neighbour.c index 559890b0f0a2..ef31fef25e5a 100644 --- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -2249,7 +2249,7 @@ static int pneigh_fill_info(struct sk_buff *skb, struct pneigh_entry *pn, ndm->ndm_pad1 = 0; ndm->ndm_pad2 = 0; ndm->ndm_flags = pn->flags | NTF_PROXY; - ndm->ndm_type = NDA_DST; + ndm->ndm_type = RTN_UNICAST; ndm->ndm_ifindex = pn->dev->ifindex; ndm->ndm_state = NUD_NONE; From 04ca6973f7c1a0d8537f2d9906a0cf8e69886d75 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Sat, 26 Jul 2014 08:58:10 +0200 Subject: [PATCH 415/455] ip: make IP identifiers less predictable In "Counting Packets Sent Between Arbitrary Internet Hosts", Jeffrey and Jedidiah describe ways exploiting linux IP identifier generation to infer whether two machines are exchanging packets. With commit 73f156a6e8c1 ("inetpeer: get rid of ip_id_count"), we changed IP id generation, but this does not really prevent this side-channel technique. This patch adds a random amount of perturbation so that IP identifiers for a given destination [1] are no longer monotonically increasing after an idle period. Note that prandom_u32_max(1) returns 0, so if generator is used at most once per jiffy, this patch inserts no hole in the ID suite and do not increase collision probability. This is jiffies based, so in the worst case (HZ=1000), the id can rollover after ~65 seconds of idle time, which should be fine. We also change the hash used in __ip_select_ident() to not only hash on daddr, but also saddr and protocol, so that ICMP probes can not be used to infer information for other protocols. For IPv6, adds saddr into the hash as well, but not nexthdr. If I ping the patched target, we can see ID are now hard to predict. 21:57:11.008086 IP (...) A > target: ICMP echo request, seq 1, length 64 21:57:11.010752 IP (... id 2081 ...) target > A: ICMP echo reply, seq 1, length 64 21:57:12.013133 IP (...) A > target: ICMP echo request, seq 2, length 64 21:57:12.015737 IP (... id 3039 ...) target > A: ICMP echo reply, seq 2, length 64 21:57:13.016580 IP (...) A > target: ICMP echo request, seq 3, length 64 21:57:13.019251 IP (... id 3437 ...) target > A: ICMP echo reply, seq 3, length 64 [1] TCP sessions uses a per flow ID generator not changed by this patch. Signed-off-by: Eric Dumazet Reported-by: Jeffrey Knockel Reported-by: Jedidiah R. Crandall Cc: Willy Tarreau Cc: Hannes Frederic Sowa Signed-off-by: David S. Miller --- include/net/ip.h | 11 +---------- net/ipv4/route.c | 32 +++++++++++++++++++++++++++++--- net/ipv6/ip6_output.c | 2 ++ 3 files changed, 32 insertions(+), 13 deletions(-) diff --git a/include/net/ip.h b/include/net/ip.h index 0e795df05ec9..7596eb22e1ce 100644 --- a/include/net/ip.h +++ b/include/net/ip.h @@ -309,16 +309,7 @@ static inline unsigned int ip_skb_dst_mtu(const struct sk_buff *skb) } } -#define IP_IDENTS_SZ 2048u -extern atomic_t *ip_idents; - -static inline u32 ip_idents_reserve(u32 hash, int segs) -{ - atomic_t *id_ptr = ip_idents + hash % IP_IDENTS_SZ; - - return atomic_add_return(segs, id_ptr) - segs; -} - +u32 ip_idents_reserve(u32 hash, int segs); void __ip_select_ident(struct iphdr *iph, int segs); static inline void ip_select_ident_segs(struct sk_buff *skb, struct sock *sk, int segs) diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 3162ea923ded..190199851c9a 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -457,8 +457,31 @@ static struct neighbour *ipv4_neigh_lookup(const struct dst_entry *dst, return neigh_create(&arp_tbl, pkey, dev); } -atomic_t *ip_idents __read_mostly; -EXPORT_SYMBOL(ip_idents); +#define IP_IDENTS_SZ 2048u +struct ip_ident_bucket { + atomic_t id; + u32 stamp32; +}; + +static struct ip_ident_bucket *ip_idents __read_mostly; + +/* In order to protect privacy, we add a perturbation to identifiers + * if one generator is seldom used. This makes hard for an attacker + * to infer how many packets were sent between two points in time. + */ +u32 ip_idents_reserve(u32 hash, int segs) +{ + struct ip_ident_bucket *bucket = ip_idents + hash % IP_IDENTS_SZ; + u32 old = ACCESS_ONCE(bucket->stamp32); + u32 now = (u32)jiffies; + u32 delta = 0; + + if (old != now && cmpxchg(&bucket->stamp32, old, now) == old) + delta = prandom_u32_max(now - old); + + return atomic_add_return(segs + delta, &bucket->id) - segs; +} +EXPORT_SYMBOL(ip_idents_reserve); void __ip_select_ident(struct iphdr *iph, int segs) { @@ -467,7 +490,10 @@ void __ip_select_ident(struct iphdr *iph, int segs) net_get_random_once(&ip_idents_hashrnd, sizeof(ip_idents_hashrnd)); - hash = jhash_1word((__force u32)iph->daddr, ip_idents_hashrnd); + hash = jhash_3words((__force u32)iph->daddr, + (__force u32)iph->saddr, + iph->protocol, + ip_idents_hashrnd); id = ip_idents_reserve(hash, segs); iph->id = htons(id); } diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index cb9df0eb4023..45702b8cd141 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -545,6 +545,8 @@ static void ipv6_select_ident(struct frag_hdr *fhdr, struct rt6_info *rt) net_get_random_once(&ip6_idents_hashrnd, sizeof(ip6_idents_hashrnd)); hash = __ipv6_addr_jhash(&rt->rt6i_dst.addr, ip6_idents_hashrnd); + hash = __ipv6_addr_jhash(&rt->rt6i_src.addr, hash); + id = ip_idents_reserve(hash, 1); fhdr->identification = htonl(id); } From d937678ab625b0f63ce54b2b9b3e1cbd1b4a1b15 Mon Sep 17 00:00:00 2001 From: Tony Lindgren Date: Fri, 25 Jul 2014 04:41:25 -0700 Subject: [PATCH 416/455] ARM: dts: Revert enabling of twl configuration for n900 Commit 9188883fd66e9 (ARM: dts: Enable twl4030 off-idle configuration for selected omaps) allowed n900 to cut off core voltages during off-idle. This however caused a regression where twl regulator vaux1 was not getting enabled for the LCD panel as we are not requesting it for the panel. Turns out quite a few devices on n900 are using vaux1, and we need to either stop idling it, or add proper regulator_get calls for all users. But until we have a proper solution implemented and tested, let's just disable the twl off-idle configuration for now for n900. Reported-by: Aaro Koskinen Fixes: 9188883fd66e9 (ARM: dts: Enable twl4030 off-idle configuration for selected omaps) Signed-off-by: Tony Lindgren --- arch/arm/boot/dts/omap3-n900.dts | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm/boot/dts/omap3-n900.dts b/arch/arm/boot/dts/omap3-n900.dts index 1fe45d1f75ec..b15f1a77d684 100644 --- a/arch/arm/boot/dts/omap3-n900.dts +++ b/arch/arm/boot/dts/omap3-n900.dts @@ -353,7 +353,7 @@ }; twl_power: power { - compatible = "ti,twl4030-power-n900", "ti,twl4030-power-idle-osc-off"; + compatible = "ti,twl4030-power-n900"; ti,use_poweroff; }; }; From 823a19cd3b91b0729d7417f1848413846be61712 Mon Sep 17 00:00:00 2001 From: Russell King Date: Tue, 29 Jul 2014 09:24:47 +0100 Subject: [PATCH 417/455] ARM: fix alignment of keystone page table fixup If init_mm.brk is not section aligned, the LPAE fixup code will miss updating the final PMD. Fix this by aligning map_end. Fixes: a77e0c7b2774 ("ARM: mm: Recreate kernel mappings in early_paging_init()") Cc: Signed-off-by: Russell King --- arch/arm/mm/mmu.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c index ab14b79b03f0..6e3ba8d112a2 100644 --- a/arch/arm/mm/mmu.c +++ b/arch/arm/mm/mmu.c @@ -1406,8 +1406,8 @@ void __init early_paging_init(const struct machine_desc *mdesc, return; /* remap kernel code and data */ - map_start = init_mm.start_code; - map_end = init_mm.brk; + map_start = init_mm.start_code & PMD_MASK; + map_end = ALIGN(init_mm.brk, PMD_SIZE); /* get a handle on things... */ pgd0 = pgd_offset_k(0); @@ -1442,7 +1442,7 @@ void __init early_paging_init(const struct machine_desc *mdesc, } /* remap pmds for kernel mapping */ - phys = __pa(map_start) & PMD_MASK; + phys = __pa(map_start); do { *pmdk++ = __pmd(phys | pmdprot); phys += PMD_SIZE; From 811a2407a3cf7bbd027fbe92d73416f17485a3d8 Mon Sep 17 00:00:00 2001 From: Konstantin Khlebnikov Date: Fri, 25 Jul 2014 09:17:12 +0100 Subject: [PATCH 418/455] ARM: 8115/1: LPAE: reduce damage caused by idmap to virtual memory layout On LPAE, each level 1 (pgd) page table entry maps 1GiB, and the level 2 (pmd) entries map 2MiB. When the identity mapping is created on LPAE, the pgd pointers are copied from the swapper_pg_dir. If we find that we need to modify the contents of a pmd, we allocate a new empty pmd table and insert it into the appropriate 1GB slot, before then filling it with the identity mapping. However, if the 1GB slot covers the kernel lowmem mappings, we obliterate those mappings. When replacing a PMD, first copy the old PMD contents to the new PMD, so that we preserve the existing mappings, particularly the mappings of the kernel itself. [rewrote commit message and added code comment -- rmk] Fixes: ae2de101739c ("ARM: LPAE: Add identity mapping support for the 3-level page table format") Signed-off-by: Konstantin Khlebnikov Signed-off-by: Russell King --- arch/arm/mm/idmap.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/arch/arm/mm/idmap.c b/arch/arm/mm/idmap.c index 8e0e52eb76b5..d7a0ee898d24 100644 --- a/arch/arm/mm/idmap.c +++ b/arch/arm/mm/idmap.c @@ -25,6 +25,13 @@ static void idmap_add_pmd(pud_t *pud, unsigned long addr, unsigned long end, pr_warning("Failed to allocate identity pmd.\n"); return; } + /* + * Copy the original PMD to ensure that the PMD entries for + * the kernel image are preserved. + */ + if (!pud_none(*pud)) + memcpy(pmd, pmd_offset(pud, 0), + PTRS_PER_PMD * sizeof(pmd_t)); pud_populate(&init_mm, pud, pmd); pmd += pmd_index(addr); } else From 1aab4d772ece470135bee47529d3f7c6d68609fa Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Sun, 27 Jul 2014 14:15:33 -0700 Subject: [PATCH 419/455] mm: fix page_alloc.c kernel-doc warnings Fix kernel-doc warnings and function name in mm/page_alloc.c: Warning(..//mm/page_alloc.c:6074): No description found for parameter 'pfn' Warning(..//mm/page_alloc.c:6074): No description found for parameter 'mask' Warning(..//mm/page_alloc.c:6074): Excess function parameter 'start_bitidx' description in 'get_pfnblock_flags_mask' Warning(..//mm/page_alloc.c:6102): No description found for parameter 'pfn' Warning(..//mm/page_alloc.c:6102): No description found for parameter 'mask' Warning(..//mm/page_alloc.c:6102): Excess function parameter 'start_bitidx' description in 'set_pfnblock_flags_mask' Signed-off-by: Randy Dunlap Acked-by: Mel Gorman Signed-off-by: Linus Torvalds --- mm/page_alloc.c | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 0ea758b898fd..8bcfe3ae20cb 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -6062,11 +6062,13 @@ static inline int pfn_to_bitidx(struct zone *zone, unsigned long pfn) } /** - * get_pageblock_flags_group - Return the requested group of flags for the pageblock_nr_pages block of pages + * get_pfnblock_flags_mask - Return the requested group of flags for the pageblock_nr_pages block of pages * @page: The page within the block of interest - * @start_bitidx: The first bit of interest to retrieve - * @end_bitidx: The last bit of interest - * returns pageblock_bits flags + * @pfn: The target page frame number + * @end_bitidx: The last bit of interest to retrieve + * @mask: mask of bits that the caller is interested in + * + * Return: pageblock_bits flags */ unsigned long get_pfnblock_flags_mask(struct page *page, unsigned long pfn, unsigned long end_bitidx, @@ -6091,9 +6093,10 @@ unsigned long get_pfnblock_flags_mask(struct page *page, unsigned long pfn, /** * set_pfnblock_flags_mask - Set the requested group of flags for a pageblock_nr_pages block of pages * @page: The page within the block of interest - * @start_bitidx: The first bit of interest - * @end_bitidx: The last bit of interest * @flags: The flags to set + * @pfn: The target page frame number + * @end_bitidx: The last bit of interest + * @mask: mask of bits that the caller is interested in */ void set_pfnblock_flags_mask(struct page *page, unsigned long flags, unsigned long pfn, From 0ef135152353d323daa1fef2db94a23e9ae31fb6 Mon Sep 17 00:00:00 2001 From: David Howells Date: Tue, 29 Jul 2014 17:53:23 +0100 Subject: [PATCH 420/455] AFS: Correctly assemble the client UUID Correctly assemble the client UUID by OR'ing in the flags rather than assigning them over the other components. Reported-by: Himangi Saraogi Signed-off-by: David Howells Signed-off-by: Linus Torvalds --- fs/afs/main.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/afs/main.c b/fs/afs/main.c index 42dd2e499ed8..35de0c04729f 100644 --- a/fs/afs/main.c +++ b/fs/afs/main.c @@ -55,13 +55,13 @@ static int __init afs_get_client_UUID(void) afs_uuid.time_low = uuidtime; afs_uuid.time_mid = uuidtime >> 32; afs_uuid.time_hi_and_version = (uuidtime >> 48) & AFS_UUID_TIMEHI_MASK; - afs_uuid.time_hi_and_version = AFS_UUID_VERSION_TIME; + afs_uuid.time_hi_and_version |= AFS_UUID_VERSION_TIME; get_random_bytes(&clockseq, 2); afs_uuid.clock_seq_low = clockseq; afs_uuid.clock_seq_hi_and_reserved = (clockseq >> 8) & AFS_UUID_CLOCKHI_MASK; - afs_uuid.clock_seq_hi_and_reserved = AFS_UUID_VARIANT_STD; + afs_uuid.clock_seq_hi_and_reserved |= AFS_UUID_VARIANT_STD; _debug("AFS UUID: %08x-%04x-%04x-%02x%02x-%02x%02x%02x%02x%02x%02x", afs_uuid.time_low, From 86b7987dd7a8acbaa54a446a73e2431da88b3ca1 Mon Sep 17 00:00:00 2001 From: Alexey Khoroshilov Date: Sat, 26 Jul 2014 02:34:31 +0400 Subject: [PATCH 421/455] isdn/bas_gigaset: fix a leak on failure path in gigaset_probe() There is a lack of usb_put_dev(udev) on failure path in gigaset_probe(). Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov Acked-by: Tilman Schmidt Signed-off-by: David S. Miller --- drivers/isdn/gigaset/bas-gigaset.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/isdn/gigaset/bas-gigaset.c b/drivers/isdn/gigaset/bas-gigaset.c index c44950d3eb7b..b7ae0a0dd5b6 100644 --- a/drivers/isdn/gigaset/bas-gigaset.c +++ b/drivers/isdn/gigaset/bas-gigaset.c @@ -2400,6 +2400,7 @@ allocerr: error: freeurbs(cs); usb_set_intfdata(interface, NULL); + usb_put_dev(udev); gigaset_freecs(cs); return rc; } From 40eea803c6b2cfaab092f053248cbeab3f368412 Mon Sep 17 00:00:00 2001 From: Andrey Ryabinin Date: Sat, 26 Jul 2014 21:26:58 +0400 Subject: [PATCH 422/455] net: sendmsg: fix NULL pointer dereference Sasha's report: > While fuzzing with trinity inside a KVM tools guest running the latest -next > kernel with the KASAN patchset, I've stumbled on the following spew: > > [ 4448.949424] ================================================================== > [ 4448.951737] AddressSanitizer: user-memory-access on address 0 > [ 4448.952988] Read of size 2 by thread T19638: > [ 4448.954510] CPU: 28 PID: 19638 Comm: trinity-c76 Not tainted 3.16.0-rc4-next-20140711-sasha-00046-g07d3099-dirty #813 > [ 4448.956823] ffff88046d86ca40 0000000000000000 ffff880082f37e78 ffff880082f37a40 > [ 4448.958233] ffffffffb6e47068 ffff880082f37a68 ffff880082f37a58 ffffffffb242708d > [ 4448.959552] 0000000000000000 ffff880082f37a88 ffffffffb24255b1 0000000000000000 > [ 4448.961266] Call Trace: > [ 4448.963158] dump_stack (lib/dump_stack.c:52) > [ 4448.964244] kasan_report_user_access (mm/kasan/report.c:184) > [ 4448.965507] __asan_load2 (mm/kasan/kasan.c:352) > [ 4448.966482] ? netlink_sendmsg (net/netlink/af_netlink.c:2339) > [ 4448.967541] netlink_sendmsg (net/netlink/af_netlink.c:2339) > [ 4448.968537] ? get_parent_ip (kernel/sched/core.c:2555) > [ 4448.970103] sock_sendmsg (net/socket.c:654) > [ 4448.971584] ? might_fault (mm/memory.c:3741) > [ 4448.972526] ? might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3740) > [ 4448.973596] ? verify_iovec (net/core/iovec.c:64) > [ 4448.974522] ___sys_sendmsg (net/socket.c:2096) > [ 4448.975797] ? put_lock_stats.isra.13 (./arch/x86/include/asm/preempt.h:98 kernel/locking/lockdep.c:254) > [ 4448.977030] ? lock_release_holdtime (kernel/locking/lockdep.c:273) > [ 4448.978197] ? lock_release_non_nested (kernel/locking/lockdep.c:3434 (discriminator 1)) > [ 4448.979346] ? check_chain_key (kernel/locking/lockdep.c:2188) > [ 4448.980535] __sys_sendmmsg (net/socket.c:2181) > [ 4448.981592] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2600) > [ 4448.982773] ? trace_hardirqs_on (kernel/locking/lockdep.c:2607) > [ 4448.984458] ? syscall_trace_enter (arch/x86/kernel/ptrace.c:1500 (discriminator 2)) > [ 4448.985621] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2600) > [ 4448.986754] SyS_sendmmsg (net/socket.c:2201) > [ 4448.987708] tracesys (arch/x86/kernel/entry_64.S:542) > [ 4448.988929] ================================================================== This reports means that we've come to netlink_sendmsg() with msg->msg_name == NULL and msg->msg_namelen > 0. After this report there was no usual "Unable to handle kernel NULL pointer dereference" and this gave me a clue that address 0 is mapped and contains valid socket address structure in it. This bug was introduced in f3d3342602f8bcbf37d7c46641cb9bca7618eb1c (net: rework recvmsg handler msg_name and msg_namelen logic). Commit message states that: "Set msg->msg_name = NULL if user specified a NULL in msg_name but had a non-null msg_namelen in verify_iovec/verify_compat_iovec. This doesn't affect sendto as it would bail out earlier while trying to copy-in the address." But in fact this affects sendto when address 0 is mapped and contains socket address structure in it. In such case copy-in address will succeed, verify_iovec() function will successfully exit with msg->msg_namelen > 0 and msg->msg_name == NULL. This patch fixes it by setting msg_namelen to 0 if msg_name == NULL. Cc: Hannes Frederic Sowa Cc: Eric Dumazet Cc: Reported-by: Sasha Levin Signed-off-by: Andrey Ryabinin Acked-by: Hannes Frederic Sowa Signed-off-by: David S. Miller --- net/compat.c | 9 +++++---- net/core/iovec.c | 6 +++--- 2 files changed, 8 insertions(+), 7 deletions(-) diff --git a/net/compat.c b/net/compat.c index 9a76eaf63184..bc8aeefddf3f 100644 --- a/net/compat.c +++ b/net/compat.c @@ -85,7 +85,7 @@ int verify_compat_iovec(struct msghdr *kern_msg, struct iovec *kern_iov, { int tot_len; - if (kern_msg->msg_namelen) { + if (kern_msg->msg_name && kern_msg->msg_namelen) { if (mode == VERIFY_READ) { int err = move_addr_to_kernel(kern_msg->msg_name, kern_msg->msg_namelen, @@ -93,10 +93,11 @@ int verify_compat_iovec(struct msghdr *kern_msg, struct iovec *kern_iov, if (err < 0) return err; } - if (kern_msg->msg_name) - kern_msg->msg_name = kern_address; - } else + kern_msg->msg_name = kern_address; + } else { kern_msg->msg_name = NULL; + kern_msg->msg_namelen = 0; + } tot_len = iov_from_user_compat_to_kern(kern_iov, (struct compat_iovec __user *)kern_msg->msg_iov, diff --git a/net/core/iovec.c b/net/core/iovec.c index 827dd6beb49c..e1ec45ab1e63 100644 --- a/net/core/iovec.c +++ b/net/core/iovec.c @@ -39,7 +39,7 @@ int verify_iovec(struct msghdr *m, struct iovec *iov, struct sockaddr_storage *a { int size, ct, err; - if (m->msg_namelen) { + if (m->msg_name && m->msg_namelen) { if (mode == VERIFY_READ) { void __user *namep; namep = (void __user __force *) m->msg_name; @@ -48,10 +48,10 @@ int verify_iovec(struct msghdr *m, struct iovec *iov, struct sockaddr_storage *a if (err < 0) return err; } - if (m->msg_name) - m->msg_name = address; + m->msg_name = address; } else { m->msg_name = NULL; + m->msg_namelen = 0; } size = m->msg_iovlen * sizeof(struct iovec); From 20fbe3ae990fd54fc7d1f889d61958bc8b38f254 Mon Sep 17 00:00:00 2001 From: Oliver Neukum Date: Mon, 28 Jul 2014 10:12:34 +0200 Subject: [PATCH 423/455] cdc_subset: deal with a device that needs reset for timeout This device needs to be reset to recover from a timeout. Unfortunately this can be handled only at the level of the subdrivers. Signed-off-by: Oliver Neukum Signed-off-by: David S. Miller --- drivers/net/usb/cdc_subset.c | 27 +++++++++++++++++++++++++++ drivers/net/usb/usbnet.c | 8 ++++++-- include/linux/usb/usbnet.h | 3 +++ 3 files changed, 36 insertions(+), 2 deletions(-) diff --git a/drivers/net/usb/cdc_subset.c b/drivers/net/usb/cdc_subset.c index 91f0919fe278..3ef411efd86e 100644 --- a/drivers/net/usb/cdc_subset.c +++ b/drivers/net/usb/cdc_subset.c @@ -85,9 +85,34 @@ static int always_connected (struct usbnet *dev) * *-------------------------------------------------------------------------*/ +static void m5632_recover(struct usbnet *dev) +{ + struct usb_device *udev = dev->udev; + struct usb_interface *intf = dev->intf; + int r; + + r = usb_lock_device_for_reset(udev, intf); + if (r < 0) + return; + + usb_reset_device(udev); + usb_unlock_device(udev); +} + +static int dummy_prereset(struct usb_interface *intf) +{ + return 0; +} + +static int dummy_postreset(struct usb_interface *intf) +{ + return 0; +} + static const struct driver_info ali_m5632_info = { .description = "ALi M5632", .flags = FLAG_POINTTOPOINT, + .recover = m5632_recover, }; #endif @@ -332,6 +357,8 @@ static struct usb_driver cdc_subset_driver = { .probe = usbnet_probe, .suspend = usbnet_suspend, .resume = usbnet_resume, + .pre_reset = dummy_prereset, + .post_reset = dummy_postreset, .disconnect = usbnet_disconnect, .id_table = products, .disable_hub_initiated_lpm = 1, diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c index f9e96c427558..5173821a9575 100644 --- a/drivers/net/usb/usbnet.c +++ b/drivers/net/usb/usbnet.c @@ -1218,8 +1218,12 @@ void usbnet_tx_timeout (struct net_device *net) unlink_urbs (dev, &dev->txq); tasklet_schedule (&dev->bh); - - // FIXME: device recovery -- reset? + /* this needs to be handled individually because the generic layer + * doesn't know what is sufficient and could not restore private + * information if a remedy of an unconditional reset were used. + */ + if (dev->driver_info->recover) + (dev->driver_info->recover)(dev); } EXPORT_SYMBOL_GPL(usbnet_tx_timeout); diff --git a/include/linux/usb/usbnet.h b/include/linux/usb/usbnet.h index 0662e98fef72..26088feb6608 100644 --- a/include/linux/usb/usbnet.h +++ b/include/linux/usb/usbnet.h @@ -148,6 +148,9 @@ struct driver_info { struct sk_buff *(*tx_fixup)(struct usbnet *dev, struct sk_buff *skb, gfp_t flags); + /* recover from timeout */ + void (*recover)(struct usbnet *dev); + /* early initialization code, can sleep. This is for minidrivers * having 'subminidrivers' that need to do extra initialization * right after minidriver have initialized hardware. */ From c472ab68ad67db23c9907a27649b7dc0899b61f9 Mon Sep 17 00:00:00 2001 From: Oliver Neukum Date: Mon, 28 Jul 2014 10:56:36 +0200 Subject: [PATCH 424/455] cdc-ether: clean packet filter upon probe There are devices that don't do reset all the way. So the packet filter should be set to a sane initial value. Failure to do so leads to intermittent failures of DHCP on some systems under some conditions. Signed-off-by: Oliver Neukum Signed-off-by: David S. Miller --- drivers/net/usb/cdc_ether.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/drivers/net/usb/cdc_ether.c b/drivers/net/usb/cdc_ether.c index 9ea4bfe5d318..2a32d9167d3b 100644 --- a/drivers/net/usb/cdc_ether.c +++ b/drivers/net/usb/cdc_ether.c @@ -341,6 +341,22 @@ next_desc: usb_driver_release_interface(driver, info->data); return -ENODEV; } + + /* Some devices don't initialise properly. In particular + * the packet filter is not reset. There are devices that + * don't do reset all the way. So the packet filter should + * be set to a sane initial value. + */ + usb_control_msg(dev->udev, + usb_sndctrlpipe(dev->udev, 0), + USB_CDC_SET_ETHERNET_PACKET_FILTER, + USB_TYPE_CLASS | USB_RECIP_INTERFACE, + USB_CDC_PACKET_TYPE_ALL_MULTICAST | USB_CDC_PACKET_TYPE_DIRECTED | USB_CDC_PACKET_TYPE_BROADCAST, + intf->cur_altsetting->desc.bInterfaceNumber, + NULL, + 0, + USB_CTRL_SET_TIMEOUT + ); return 0; bad_desc: From d92f5dec6325079c550889883af51db1b77d5623 Mon Sep 17 00:00:00 2001 From: Florian Fainelli Date: Mon, 28 Jul 2014 16:28:07 -0700 Subject: [PATCH 425/455] net: phy: re-apply PHY fixups during phy_register_device Commit 87aa9f9c61ad ("net: phy: consolidate PHY reset in phy_init_hw()") moved the call to phy_scan_fixups() in phy_init_hw() after a software reset is performed. By the time phy_init_hw() is called in phy_device_register(), no driver has been bound to this PHY yet, so all the checks in phy_init_hw() against the PHY driver and the PHY driver's config_init function will return 0. We will therefore never call phy_scan_fixups() as we should. Fix this by calling phy_scan_fixups() and check for its return value to restore the intended functionality. This broke PHY drivers which do register an early PHY fixup callback to intercept the PHY probing and do things like changing the 32-bits unique PHY identifier when a pseudo-PHY address has been used, as well as board-specific PHY fixups that need to be applied during driver probe time. Reported-by: Hauke Merthens Reported-by: Jonas Gorski Signed-off-by: Florian Fainelli Signed-off-by: David S. Miller --- drivers/net/phy/phy_device.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c index 74a82328dd49..22c57be4dfa0 100644 --- a/drivers/net/phy/phy_device.c +++ b/drivers/net/phy/phy_device.c @@ -355,7 +355,7 @@ int phy_device_register(struct phy_device *phydev) phydev->bus->phy_map[phydev->addr] = phydev; /* Run all of the fixups for this PHY */ - err = phy_init_hw(phydev); + err = phy_scan_fixups(phydev); if (err) { pr_err("PHY %d failed to initialize\n", phydev->addr); goto out; From b6328a07bd6b3d31b64f85864fe74f3b08c010ca Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Wed, 30 Jul 2014 00:23:09 +0200 Subject: [PATCH 426/455] ACPI / PNP: Fix acpi_pnp_match() The acpi_pnp_match() function is used for finding the ACPI device object that should be associated with the given PNP device. Unfortunately, the check used by that function is not strict enough and may cause success to be returned for a wrong ACPI device object. To fix that, use the observation that the pointer to the ACPI device object in question is already stored in the data field in struct pnp_dev, so acpi_pnp_match() can simply use that field to do its job. This problem was uncovered in 3.14 by commit 202317a573b2 (ACPI / scan: Add acpi_device objects for all device nodes in the namespace). Fixes: 202317a573b2 (ACPI / scan: Add acpi_device objects for all device nodes in the namespace) Reported-and-tested-by: Vinson Lee Cc: 3.14+ # 3.14+ Signed-off-by: Rafael J. Wysocki --- drivers/pnp/pnpacpi/core.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/pnp/pnpacpi/core.c b/drivers/pnp/pnpacpi/core.c index b81448b2c75d..a5c6cb773e5f 100644 --- a/drivers/pnp/pnpacpi/core.c +++ b/drivers/pnp/pnpacpi/core.c @@ -319,8 +319,7 @@ static int __init acpi_pnp_match(struct device *dev, void *_pnp) struct pnp_dev *pnp = _pnp; /* true means it matched */ - return !acpi->physical_node_count - && compare_pnp_id(pnp->id, acpi_device_hid(acpi)); + return pnp->data == acpi; } static struct acpi_device * __init acpi_pnp_find_companion(struct device *dev) From 4972a74b888c6b52ca41fae6076786dbbeb746d5 Mon Sep 17 00:00:00 2001 From: Laura Abbott Date: Tue, 15 Jul 2014 10:03:34 -0700 Subject: [PATCH 427/455] of: Split early_init_dt_scan into two parts MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Currently, early_init_dt_scan validates the header, sets the boot params, and scans for chosen/memory all in one function. Split this up into two separate functions (validation/setting boot params in one, scanning in another) to allow for additional setup between boot params and scanning the memory. Signed-off-by: Laura Abbott Tested-by: Andreas Färber [glikely: s/early_init_dt_scan_all/early_init_dt_scan_nodes/] Signed-off-by: Grant Likely --- drivers/of/fdt.c | 18 +++++++++++++++++- include/linux/of_fdt.h | 2 ++ 2 files changed, 19 insertions(+), 1 deletion(-) diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c index b777d8f46bd5..ecc7a02d868e 100644 --- a/drivers/of/fdt.c +++ b/drivers/of/fdt.c @@ -937,7 +937,7 @@ int __init __weak early_init_dt_reserve_memory_arch(phys_addr_t base, } #endif -bool __init early_init_dt_scan(void *params) +bool __init early_init_dt_verify(void *params) { if (!params) return false; @@ -951,6 +951,12 @@ bool __init early_init_dt_scan(void *params) return false; } + return true; +} + + +void __init early_init_dt_scan_nodes(void) +{ /* Retrieve various information from the /chosen node */ of_scan_flat_dt(early_init_dt_scan_chosen, boot_command_line); @@ -959,7 +965,17 @@ bool __init early_init_dt_scan(void *params) /* Setup memory, calling early_init_dt_add_memory_arch */ of_scan_flat_dt(early_init_dt_scan_memory, NULL); +} +bool __init early_init_dt_scan(void *params) +{ + bool status; + + status = early_init_dt_verify(params); + if (!status) + return false; + + early_init_dt_scan_nodes(); return true; } diff --git a/include/linux/of_fdt.h b/include/linux/of_fdt.h index 05117899fcb4..ebb2449082fb 100644 --- a/include/linux/of_fdt.h +++ b/include/linux/of_fdt.h @@ -73,6 +73,8 @@ extern int early_init_dt_scan_root(unsigned long node, const char *uname, int depth, void *data); extern bool early_init_dt_scan(void *params); +extern bool early_init_dt_verify(void *params); +extern void early_init_dt_scan_nodes(void); extern const char *of_flat_dt_get_machine_name(void); extern const void *of_flat_dt_match_machine(const void *default_match, From 704033cee2e5b3c1c6eaf5bb398e465a9c3667b5 Mon Sep 17 00:00:00 2001 From: Laura Abbott Date: Tue, 15 Jul 2014 10:03:35 -0700 Subject: [PATCH 428/455] of: Add memory limiting function for flattened devicetrees MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Buggy bootloaders may pass bogus memory entries in the devicetree. Add of_fdt_limit_memory to add an upper bound on the number of entries that can be present in the devicetree. Signed-off-by: Laura Abbott Tested-by: Andreas Färber Signed-off-by: Grant Likely --- drivers/of/fdt.c | 48 ++++++++++++++++++++++++++++++++++++++++++ include/linux/of_fdt.h | 1 + 2 files changed, 49 insertions(+) diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c index ecc7a02d868e..9aa012e6ea0a 100644 --- a/drivers/of/fdt.c +++ b/drivers/of/fdt.c @@ -26,6 +26,54 @@ #include /* for COMMAND_LINE_SIZE */ #include +/* + * of_fdt_limit_memory - limit the number of regions in the /memory node + * @limit: maximum entries + * + * Adjust the flattened device tree to have at most 'limit' number of + * memory entries in the /memory node. This function may be called + * any time after initial_boot_param is set. + */ +void of_fdt_limit_memory(int limit) +{ + int memory; + int len; + const void *val; + int nr_address_cells = OF_ROOT_NODE_ADDR_CELLS_DEFAULT; + int nr_size_cells = OF_ROOT_NODE_SIZE_CELLS_DEFAULT; + const uint32_t *addr_prop; + const uint32_t *size_prop; + int root_offset; + int cell_size; + + root_offset = fdt_path_offset(initial_boot_params, "/"); + if (root_offset < 0) + return; + + addr_prop = fdt_getprop(initial_boot_params, root_offset, + "#address-cells", NULL); + if (addr_prop) + nr_address_cells = fdt32_to_cpu(*addr_prop); + + size_prop = fdt_getprop(initial_boot_params, root_offset, + "#size-cells", NULL); + if (size_prop) + nr_size_cells = fdt32_to_cpu(*size_prop); + + cell_size = sizeof(uint32_t)*(nr_address_cells + nr_size_cells); + + memory = fdt_path_offset(initial_boot_params, "/memory"); + if (memory > 0) { + val = fdt_getprop(initial_boot_params, memory, "reg", &len); + if (len > limit*cell_size) { + len = limit*cell_size; + pr_debug("Limiting number of entries to %d\n", limit); + fdt_setprop(initial_boot_params, memory, "reg", val, + len); + } + } +} + /** * of_fdt_is_compatible - Return true if given node from the given blob has * compat in its compatible list diff --git a/include/linux/of_fdt.h b/include/linux/of_fdt.h index ebb2449082fb..0ff360d5b3b3 100644 --- a/include/linux/of_fdt.h +++ b/include/linux/of_fdt.h @@ -86,6 +86,7 @@ extern void unflatten_and_copy_device_tree(void); extern void early_init_devtree(void *); extern void early_get_first_memblock_info(void *, phys_addr_t *); extern u64 fdt_translate_address(const void *blob, int node_offset); +extern void of_fdt_limit_memory(int limit); #else /* CONFIG_OF_FLATTREE */ static inline void early_init_fdt_scan_reserved_mem(void) {} static inline const char *of_flat_dt_get_machine_name(void) { return NULL; } From 5a12a597a8627b91fd9d94365853f9f69a4f399c Mon Sep 17 00:00:00 2001 From: Laura Abbott Date: Tue, 15 Jul 2014 10:03:36 -0700 Subject: [PATCH 429/455] arm: Add devicetree fixup machine function MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Commit 1c2f87c22566cd057bc8cde10c37ae9da1a1bb76 (ARM: 8025/1: Get rid of meminfo) dropped the upper bound on the number of memory banks that can be added as there was no technical need in the kernel. It turns out though, some bootloaders (specifically the arndale-octa exynos boards) may pass invalid memory information and rely on the kernel to not parse this data. This is a bug in the bootloader but we still need to work around this. Work around this by introducing a dt_fixup function. This function gets called before the flattened devicetree is scanned for memory and the like. In this fixup function for exynos, limit the maximum number of memory regions in the devicetree. Signed-off-by: Laura Abbott Tested-by: Andreas Färber [glikely: Added a comment and fixed up function name] Signed-off-by: Grant Likely --- arch/arm/include/asm/mach/arch.h | 1 + arch/arm/kernel/devtree.c | 8 +++++++- arch/arm/mach-exynos/exynos.c | 10 ++++++++++ 3 files changed, 18 insertions(+), 1 deletion(-) diff --git a/arch/arm/include/asm/mach/arch.h b/arch/arm/include/asm/mach/arch.h index 060a75e99263..0406cb3f1af7 100644 --- a/arch/arm/include/asm/mach/arch.h +++ b/arch/arm/include/asm/mach/arch.h @@ -50,6 +50,7 @@ struct machine_desc { struct smp_operations *smp; /* SMP operations */ bool (*smp_init)(void); void (*fixup)(struct tag *, char **); + void (*dt_fixup)(void); void (*init_meminfo)(void); void (*reserve)(void);/* reserve mem blocks */ void (*map_io)(void);/* IO mapping function */ diff --git a/arch/arm/kernel/devtree.c b/arch/arm/kernel/devtree.c index e94a157ddff1..11c54de9f8cf 100644 --- a/arch/arm/kernel/devtree.c +++ b/arch/arm/kernel/devtree.c @@ -212,7 +212,7 @@ const struct machine_desc * __init setup_machine_fdt(unsigned int dt_phys) mdesc_best = &__mach_desc_GENERIC_DT; #endif - if (!dt_phys || !early_init_dt_scan(phys_to_virt(dt_phys))) + if (!dt_phys || !early_init_dt_verify(phys_to_virt(dt_phys))) return NULL; mdesc = of_flat_dt_match_machine(mdesc_best, arch_get_next_mach); @@ -237,6 +237,12 @@ const struct machine_desc * __init setup_machine_fdt(unsigned int dt_phys) dump_machine_table(); /* does not return */ } + /* We really don't want to do this, but sometimes firmware provides buggy data */ + if (mdesc->dt_fixup) + mdesc->dt_fixup(); + + early_init_dt_scan_nodes(); + /* Change machine number to match the mdesc we're using */ __machine_arch_type = mdesc->nr; diff --git a/arch/arm/mach-exynos/exynos.c b/arch/arm/mach-exynos/exynos.c index 46d893fcbe85..66c9b9614f3c 100644 --- a/arch/arm/mach-exynos/exynos.c +++ b/arch/arm/mach-exynos/exynos.c @@ -335,6 +335,15 @@ static void __init exynos_reserve(void) #endif } +static void __init exynos_dt_fixup(void) +{ + /* + * Some versions of uboot pass garbage entries in the memory node, + * use the old CONFIG_ARM_NR_BANKS + */ + of_fdt_limit_memory(8); +} + DT_MACHINE_START(EXYNOS_DT, "SAMSUNG EXYNOS (Flattened Device Tree)") /* Maintainer: Thomas Abraham */ /* Maintainer: Kukjin Kim */ @@ -348,4 +357,5 @@ DT_MACHINE_START(EXYNOS_DT, "SAMSUNG EXYNOS (Flattened Device Tree)") .dt_compat = exynos_dt_compat, .restart = exynos_restart, .reserve = exynos_reserve, + .dt_fixup = exynos_dt_fixup, MACHINE_END From 63afbe7a0ac184ef8485dac4914e87b211b5bfaa Mon Sep 17 00:00:00 2001 From: Will Deacon Date: Fri, 25 Jul 2014 16:29:12 +0100 Subject: [PATCH 430/455] kvm: arm64: vgic: fix hyp panic with 64k pages on juno platform If the physical address of GICV isn't page-aligned, then we end up creating a stage-2 mapping of the page containing it, which causes us to map neighbouring memory locations directly into the guest. As an example, consider a platform with GICV at physical 0x2c02f000 running a 64k-page host kernel. If qemu maps this into the guest at 0x80010000, then guest physical addresses 0x80010000 - 0x8001efff will map host physical region 0x2c020000 - 0x2c02efff. Accesses to these physical regions may cause UNPREDICTABLE behaviour, for example, on the Juno platform this will cause an SError exception to EL3, which brings down the entire physical CPU resulting in RCU stalls / HYP panics / host crashing / wasted weeks of debugging. SBSA recommends that systems alias the 4k GICV across the bounding 64k region, in which case GICV physical could be described as 0x2c020000 in the above scenario. This patch fixes the problem by failing the vgic probe if the physical base address or the size of GICV aren't page-aligned. Note that this generated a warning in dmesg about freeing enabled IRQs, so I had to move the IRQ enabling later in the probe. Cc: Christoffer Dall Cc: Marc Zyngier Cc: Gleb Natapov Cc: Paolo Bonzini Cc: Joel Schopp Cc: Don Dutile Acked-by: Peter Maydell Acked-by: Joel Schopp Acked-by: Marc Zyngier Signed-off-by: Will Deacon Signed-off-by: Christoffer Dall --- virt/kvm/arm/vgic.c | 24 ++++++++++++++++++++---- 1 file changed, 20 insertions(+), 4 deletions(-) diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c index 56ff9bebb577..476d3bf540a8 100644 --- a/virt/kvm/arm/vgic.c +++ b/virt/kvm/arm/vgic.c @@ -1526,17 +1526,33 @@ int kvm_vgic_hyp_init(void) goto out_unmap; } - kvm_info("%s@%llx IRQ%d\n", vgic_node->name, - vctrl_res.start, vgic_maint_irq); - on_each_cpu(vgic_init_maintenance_interrupt, NULL, 1); - if (of_address_to_resource(vgic_node, 3, &vcpu_res)) { kvm_err("Cannot obtain VCPU resource\n"); ret = -ENXIO; goto out_unmap; } + + if (!PAGE_ALIGNED(vcpu_res.start)) { + kvm_err("GICV physical address 0x%llx not page aligned\n", + (unsigned long long)vcpu_res.start); + ret = -ENXIO; + goto out_unmap; + } + + if (!PAGE_ALIGNED(resource_size(&vcpu_res))) { + kvm_err("GICV size 0x%llx not a multiple of page size 0x%lx\n", + (unsigned long long)resource_size(&vcpu_res), + PAGE_SIZE); + ret = -ENXIO; + goto out_unmap; + } + vgic_vcpu_base = vcpu_res.start; + kvm_info("%s@%llx IRQ%d\n", vgic_node->name, + vctrl_res.start, vgic_maint_irq); + on_each_cpu(vgic_init_maintenance_interrupt, NULL, 1); + goto out; out_unmap: From b7dd0e350e0bd4c0fddcc9b8958342700b00b168 Mon Sep 17 00:00:00 2001 From: David Vrabel Date: Fri, 11 Jul 2014 16:42:34 +0100 Subject: [PATCH 431/455] x86/xen: safely map and unmap grant frames when in atomic context arch_gnttab_map_frames() and arch_gnttab_unmap_frames() are called in atomic context but were calling alloc_vm_area() which might sleep. Also, if a driver attempts to allocate a grant ref from an interrupt and the table needs expanding, then the CPU may already by in lazy MMU mode and apply_to_page_range() will BUG when it tries to re-enable lazy MMU mode. These two functions are only used in PV guests. Introduce arch_gnttab_init() to allocates the virtual address space in advance. Avoid the use of apply_to_page_range() by using saving and using the array of PTE addresses from the alloc_vm_area() call (which ensures that the required page tables are pre-allocated). Signed-off-by: David Vrabel Signed-off-by: Konrad Rzeszutek Wilk --- arch/arm/xen/grant-table.c | 5 ++ arch/x86/xen/grant-table.c | 150 +++++++++++++++++++++++-------------- drivers/xen/grant-table.c | 9 ++- include/xen/grant_table.h | 1 + 4 files changed, 106 insertions(+), 59 deletions(-) diff --git a/arch/arm/xen/grant-table.c b/arch/arm/xen/grant-table.c index 859a9bb002d5..91cf08ba1e95 100644 --- a/arch/arm/xen/grant-table.c +++ b/arch/arm/xen/grant-table.c @@ -51,3 +51,8 @@ int arch_gnttab_map_status(uint64_t *frames, unsigned long nr_gframes, { return -ENOSYS; } + +int arch_gnttab_init(unsigned long nr_shared, unsigned long nr_status) +{ + return 0; +} diff --git a/arch/x86/xen/grant-table.c b/arch/x86/xen/grant-table.c index c98583588580..ebfa9b2c871d 100644 --- a/arch/x86/xen/grant-table.c +++ b/arch/x86/xen/grant-table.c @@ -36,99 +36,133 @@ #include #include +#include #include #include #include #include +#include #include -static int map_pte_fn(pte_t *pte, struct page *pmd_page, - unsigned long addr, void *data) -{ - unsigned long **frames = (unsigned long **)data; - - set_pte_at(&init_mm, addr, pte, mfn_pte((*frames)[0], PAGE_KERNEL)); - (*frames)++; - return 0; -} - -/* - * This function is used to map shared frames to store grant status. It is - * different from map_pte_fn above, the frames type here is uint64_t. - */ -static int map_pte_fn_status(pte_t *pte, struct page *pmd_page, - unsigned long addr, void *data) -{ - uint64_t **frames = (uint64_t **)data; - - set_pte_at(&init_mm, addr, pte, mfn_pte((*frames)[0], PAGE_KERNEL)); - (*frames)++; - return 0; -} - -static int unmap_pte_fn(pte_t *pte, struct page *pmd_page, - unsigned long addr, void *data) -{ - - set_pte_at(&init_mm, addr, pte, __pte(0)); - return 0; -} +static struct gnttab_vm_area { + struct vm_struct *area; + pte_t **ptes; +} gnttab_shared_vm_area, gnttab_status_vm_area; int arch_gnttab_map_shared(unsigned long *frames, unsigned long nr_gframes, unsigned long max_nr_gframes, void **__shared) { - int rc; void *shared = *__shared; + unsigned long addr; + unsigned long i; - if (shared == NULL) { - struct vm_struct *area = - alloc_vm_area(PAGE_SIZE * max_nr_gframes, NULL); - BUG_ON(area == NULL); - shared = area->addr; - *__shared = shared; + if (shared == NULL) + *__shared = shared = gnttab_shared_vm_area.area->addr; + + addr = (unsigned long)shared; + + for (i = 0; i < nr_gframes; i++) { + set_pte_at(&init_mm, addr, gnttab_shared_vm_area.ptes[i], + mfn_pte(frames[i], PAGE_KERNEL)); + addr += PAGE_SIZE; } - rc = apply_to_page_range(&init_mm, (unsigned long)shared, - PAGE_SIZE * nr_gframes, - map_pte_fn, &frames); - return rc; + return 0; } int arch_gnttab_map_status(uint64_t *frames, unsigned long nr_gframes, unsigned long max_nr_gframes, grant_status_t **__shared) { - int rc; grant_status_t *shared = *__shared; + unsigned long addr; + unsigned long i; - if (shared == NULL) { - /* No need to pass in PTE as we are going to do it - * in apply_to_page_range anyhow. */ - struct vm_struct *area = - alloc_vm_area(PAGE_SIZE * max_nr_gframes, NULL); - BUG_ON(area == NULL); - shared = area->addr; - *__shared = shared; + if (shared == NULL) + *__shared = shared = gnttab_status_vm_area.area->addr; + + addr = (unsigned long)shared; + + for (i = 0; i < nr_gframes; i++) { + set_pte_at(&init_mm, addr, gnttab_status_vm_area.ptes[i], + mfn_pte(frames[i], PAGE_KERNEL)); + addr += PAGE_SIZE; } - rc = apply_to_page_range(&init_mm, (unsigned long)shared, - PAGE_SIZE * nr_gframes, - map_pte_fn_status, &frames); - return rc; + return 0; } void arch_gnttab_unmap(void *shared, unsigned long nr_gframes) { - apply_to_page_range(&init_mm, (unsigned long)shared, - PAGE_SIZE * nr_gframes, unmap_pte_fn, NULL); + pte_t **ptes; + unsigned long addr; + unsigned long i; + + if (shared == gnttab_status_vm_area.area->addr) + ptes = gnttab_status_vm_area.ptes; + else + ptes = gnttab_shared_vm_area.ptes; + + addr = (unsigned long)shared; + + for (i = 0; i < nr_gframes; i++) { + set_pte_at(&init_mm, addr, ptes[i], __pte(0)); + addr += PAGE_SIZE; + } } + +static int arch_gnttab_valloc(struct gnttab_vm_area *area, unsigned nr_frames) +{ + area->ptes = kmalloc(sizeof(pte_t *) * nr_frames, GFP_KERNEL); + if (area->ptes == NULL) + return -ENOMEM; + + area->area = alloc_vm_area(PAGE_SIZE * nr_frames, area->ptes); + if (area->area == NULL) { + kfree(area->ptes); + return -ENOMEM; + } + + return 0; +} + +static void arch_gnttab_vfree(struct gnttab_vm_area *area) +{ + free_vm_area(area->area); + kfree(area->ptes); +} + +int arch_gnttab_init(unsigned long nr_shared, unsigned long nr_status) +{ + int ret; + + if (!xen_pv_domain()) + return 0; + + ret = arch_gnttab_valloc(&gnttab_shared_vm_area, nr_shared); + if (ret < 0) + return ret; + + /* + * Always allocate the space for the status frames in case + * we're migrated to a host with V2 support. + */ + ret = arch_gnttab_valloc(&gnttab_status_vm_area, nr_status); + if (ret < 0) + goto err; + + return 0; + err: + arch_gnttab_vfree(&gnttab_shared_vm_area); + return -ENOMEM; +} + #ifdef CONFIG_XEN_PVH #include #include -#include #include static int __init xlated_setup_gnttab_pages(void) { diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c index 5d4de88fe5b8..eeba7544f0cd 100644 --- a/drivers/xen/grant-table.c +++ b/drivers/xen/grant-table.c @@ -1195,18 +1195,20 @@ static int gnttab_expand(unsigned int req_entries) int gnttab_init(void) { int i; + unsigned long max_nr_grant_frames; unsigned int max_nr_glist_frames, nr_glist_frames; unsigned int nr_init_grefs; int ret; gnttab_request_version(); + max_nr_grant_frames = gnttab_max_grant_frames(); nr_grant_frames = 1; /* Determine the maximum number of frames required for the * grant reference free list on the current hypervisor. */ BUG_ON(grefs_per_grant_frame == 0); - max_nr_glist_frames = (gnttab_max_grant_frames() * + max_nr_glist_frames = (max_nr_grant_frames * grefs_per_grant_frame / RPP); gnttab_list = kmalloc(max_nr_glist_frames * sizeof(grant_ref_t *), @@ -1223,6 +1225,11 @@ int gnttab_init(void) } } + ret = arch_gnttab_init(max_nr_grant_frames, + nr_status_frames(max_nr_grant_frames)); + if (ret < 0) + goto ini_nomem; + if (gnttab_setup() < 0) { ret = -ENODEV; goto ini_nomem; diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h index a5af2a26d94f..5c1aba154b64 100644 --- a/include/xen/grant_table.h +++ b/include/xen/grant_table.h @@ -170,6 +170,7 @@ gnttab_set_unmap_op(struct gnttab_unmap_grant_ref *unmap, phys_addr_t addr, unmap->dev_bus_addr = 0; } +int arch_gnttab_init(unsigned long nr_shared, unsigned long nr_status); int arch_gnttab_map_shared(xen_pfn_t *frames, unsigned long nr_gframes, unsigned long max_nr_gframes, void **__shared); From 1d8fcba1de632d7a43349788ad534c5a32c5a44c Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Wed, 30 Jul 2014 08:56:23 -0700 Subject: [PATCH 432/455] Revert "cdc_subset: deal with a device that needs reset for timeout" This reverts commit 20fbe3ae990fd54fc7d1f889d61958bc8b38f254. As reported by Stephen Rothwell, it causes compile failures in certain configurations: drivers/net/usb/cdc_subset.c:360:15: error: 'dummy_prereset' undeclared here (not in a function) .pre_reset = dummy_prereset, ^ drivers/net/usb/cdc_subset.c:361:16: error: 'dummy_postreset' undeclared here (not in a function) .post_reset = dummy_postreset, ^ Reported-by: Stephen Rothwell Acked-by: David Miller Cc: Oliver Neukum Signed-off-by: Linus Torvalds --- drivers/net/usb/cdc_subset.c | 27 --------------------------- drivers/net/usb/usbnet.c | 8 ++------ include/linux/usb/usbnet.h | 3 --- 3 files changed, 2 insertions(+), 36 deletions(-) diff --git a/drivers/net/usb/cdc_subset.c b/drivers/net/usb/cdc_subset.c index 3ef411efd86e..91f0919fe278 100644 --- a/drivers/net/usb/cdc_subset.c +++ b/drivers/net/usb/cdc_subset.c @@ -85,34 +85,9 @@ static int always_connected (struct usbnet *dev) * *-------------------------------------------------------------------------*/ -static void m5632_recover(struct usbnet *dev) -{ - struct usb_device *udev = dev->udev; - struct usb_interface *intf = dev->intf; - int r; - - r = usb_lock_device_for_reset(udev, intf); - if (r < 0) - return; - - usb_reset_device(udev); - usb_unlock_device(udev); -} - -static int dummy_prereset(struct usb_interface *intf) -{ - return 0; -} - -static int dummy_postreset(struct usb_interface *intf) -{ - return 0; -} - static const struct driver_info ali_m5632_info = { .description = "ALi M5632", .flags = FLAG_POINTTOPOINT, - .recover = m5632_recover, }; #endif @@ -357,8 +332,6 @@ static struct usb_driver cdc_subset_driver = { .probe = usbnet_probe, .suspend = usbnet_suspend, .resume = usbnet_resume, - .pre_reset = dummy_prereset, - .post_reset = dummy_postreset, .disconnect = usbnet_disconnect, .id_table = products, .disable_hub_initiated_lpm = 1, diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c index 5173821a9575..f9e96c427558 100644 --- a/drivers/net/usb/usbnet.c +++ b/drivers/net/usb/usbnet.c @@ -1218,12 +1218,8 @@ void usbnet_tx_timeout (struct net_device *net) unlink_urbs (dev, &dev->txq); tasklet_schedule (&dev->bh); - /* this needs to be handled individually because the generic layer - * doesn't know what is sufficient and could not restore private - * information if a remedy of an unconditional reset were used. - */ - if (dev->driver_info->recover) - (dev->driver_info->recover)(dev); + + // FIXME: device recovery -- reset? } EXPORT_SYMBOL_GPL(usbnet_tx_timeout); diff --git a/include/linux/usb/usbnet.h b/include/linux/usb/usbnet.h index 26088feb6608..0662e98fef72 100644 --- a/include/linux/usb/usbnet.h +++ b/include/linux/usb/usbnet.h @@ -148,9 +148,6 @@ struct driver_info { struct sk_buff *(*tx_fixup)(struct usbnet *dev, struct sk_buff *skb, gfp_t flags); - /* recover from timeout */ - void (*recover)(struct usbnet *dev); - /* early initialization code, can sleep. This is for minidrivers * having 'subminidrivers' that need to do extra initialization * right after minidriver have initialized hardware. */ From 3181788c3ac3a78d43a41b74d77f348b4855627e Mon Sep 17 00:00:00 2001 From: Linus Walleij Date: Fri, 25 Jul 2014 12:18:42 +0200 Subject: [PATCH 433/455] ARM: nomadik: fix up double inversion in DT The GPIO pin connected to card detect was inverted twice: once by the argument to the GPIO line itself where it was magically marked as active low by the flag GPIO_ACTIVE_LOW (0x01) in the third cell, and also marked active low AGAIN by explicitly stating "cd-inverted" (a deprecated method). After commit 78f87df2b4f8760954d7d80603d0cfcbd4759683 "mmc: mmci: Use the common mmc DT parser" this results in the line being inverted twice so it was effectively uninverted, while the old code would not have this effect, instead disregarding the flag on the GPIO line altogether, which is a bug. I admit the semantics may be unclear but inverting twice is as good a definition as any on how this should work. So fix up the buggy device tree. Use proper #includes so the DTS is clear and readable. Cc: Ulf Hansson Signed-off-by: Linus Walleij Signed-off-by: Olof Johansson --- arch/arm/boot/dts/ste-nomadik-s8815.dts | 2 +- arch/arm/boot/dts/ste-nomadik-stn8815.dtsi | 7 ++++--- 2 files changed, 5 insertions(+), 4 deletions(-) diff --git a/arch/arm/boot/dts/ste-nomadik-s8815.dts b/arch/arm/boot/dts/ste-nomadik-s8815.dts index f557feb997f4..90d8b6c7a205 100644 --- a/arch/arm/boot/dts/ste-nomadik-s8815.dts +++ b/arch/arm/boot/dts/ste-nomadik-s8815.dts @@ -4,7 +4,7 @@ */ /dts-v1/; -/include/ "ste-nomadik-stn8815.dtsi" +#include "ste-nomadik-stn8815.dtsi" / { model = "Calao Systems USB-S8815"; diff --git a/arch/arm/boot/dts/ste-nomadik-stn8815.dtsi b/arch/arm/boot/dts/ste-nomadik-stn8815.dtsi index d316c955bd5f..dbcf521b017f 100644 --- a/arch/arm/boot/dts/ste-nomadik-stn8815.dtsi +++ b/arch/arm/boot/dts/ste-nomadik-stn8815.dtsi @@ -1,7 +1,9 @@ /* * Device Tree for the ST-Ericsson Nomadik 8815 STn8815 SoC */ -/include/ "skeleton.dtsi" + +#include +#include "skeleton.dtsi" / { #address-cells = <1>; @@ -842,8 +844,7 @@ bus-width = <4>; cap-mmc-highspeed; cap-sd-highspeed; - cd-gpios = <&gpio3 15 0x1>; - cd-inverted; + cd-gpios = <&gpio3 15 GPIO_ACTIVE_LOW>; pinctrl-names = "default"; pinctrl-0 = <&mmcsd_default_mux>, <&mmcsd_default_mode>; vmmc-supply = <&vmmc_regulator>; From b779b88df8370feafa6f49d08a2cc88db4834992 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Andreas=20F=C3=A4rber?= Date: Mon, 28 Jul 2014 12:06:26 -0600 Subject: [PATCH 434/455] MAINTAINERS: Update Tegra Git URL MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit swarren/linux-tegra.git is a stale location; it has moved to tegra/linux.git. While the git protocol re-directs to the new location, HTTP does not. Besides, MAINTAINERS should contain the canonical URL. Signed-off-by: Andreas Färber [swarren, updated commit message] Signed-off-by: Stephen Warren Signed-off-by: Olof Johansson --- MAINTAINERS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index 61a8f486306b..82afdc7843c5 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -8906,7 +8906,7 @@ M: Stephen Warren M: Thierry Reding L: linux-tegra@vger.kernel.org Q: http://patchwork.ozlabs.org/project/linux-tegra/list/ -T: git git://git.kernel.org/pub/scm/linux/kernel/git/swarren/linux-tegra.git +T: git git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux.git S: Supported N: [^a-z]tegra From f6789593d5cea42a4ecb1cbeab6a23ade5ebbba7 Mon Sep 17 00:00:00 2001 From: Maxim Patlasov Date: Wed, 30 Jul 2014 16:08:21 -0700 Subject: [PATCH 435/455] mm/page-writeback.c: fix divide by zero in bdi_dirty_limits() Under memory pressure, it is possible for dirty_thresh, calculated by global_dirty_limits() in balance_dirty_pages(), to equal zero. Then, if strictlimit is true, bdi_dirty_limits() tries to resolve the proportion: bdi_bg_thresh : bdi_thresh = background_thresh : dirty_thresh by dividing by zero. Signed-off-by: Maxim Patlasov Acked-by: Rik van Riel Cc: Michal Hocko Cc: KOSAKI Motohiro Cc: Wu Fengguang Cc: Johannes Weiner Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/page-writeback.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/mm/page-writeback.c b/mm/page-writeback.c index 518e2c3f4c75..e0c943014eb7 100644 --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -1306,9 +1306,9 @@ static inline void bdi_dirty_limits(struct backing_dev_info *bdi, *bdi_thresh = bdi_dirty_limit(bdi, dirty_thresh); if (bdi_bg_thresh) - *bdi_bg_thresh = div_u64((u64)*bdi_thresh * - background_thresh, - dirty_thresh); + *bdi_bg_thresh = dirty_thresh ? div_u64((u64)*bdi_thresh * + background_thresh, + dirty_thresh) : 0; /* * In order to avoid the stacked BDI deadlock we need From b104a35d32025ca740539db2808aa3385d0f30eb Mon Sep 17 00:00:00 2001 From: David Rientjes Date: Wed, 30 Jul 2014 16:08:24 -0700 Subject: [PATCH 436/455] mm, thp: do not allow thp faults to avoid cpuset restrictions The page allocator relies on __GFP_WAIT to determine if ALLOC_CPUSET should be set in allocflags. ALLOC_CPUSET controls if a page allocation should be restricted only to the set of allowed cpuset mems. Transparent hugepages clears __GFP_WAIT when defrag is disabled to prevent the fault path from using memory compaction or direct reclaim. Thus, it is unfairly able to allocate outside of its cpuset mems restriction as a side-effect. This patch ensures that ALLOC_CPUSET is only cleared when the gfp mask is truly GFP_ATOMIC by verifying it is also not a thp allocation. Signed-off-by: David Rientjes Reported-by: Alex Thorlton Tested-by: Alex Thorlton Cc: Bob Liu Cc: Dave Hansen Cc: Hedi Berriche Cc: Hugh Dickins Cc: Johannes Weiner Cc: Kirill A. Shutemov Cc: Mel Gorman Cc: Rik van Riel Cc: Srivatsa S. Bhat Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/page_alloc.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 8bcfe3ae20cb..ef44ad736ca1 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2447,7 +2447,7 @@ static inline int gfp_to_alloc_flags(gfp_t gfp_mask) { int alloc_flags = ALLOC_WMARK_MIN | ALLOC_CPUSET; - const gfp_t wait = gfp_mask & __GFP_WAIT; + const bool atomic = !(gfp_mask & (__GFP_WAIT | __GFP_NO_KSWAPD)); /* __GFP_HIGH is assumed to be the same as ALLOC_HIGH to save a branch. */ BUILD_BUG_ON(__GFP_HIGH != (__force gfp_t) ALLOC_HIGH); @@ -2456,20 +2456,20 @@ gfp_to_alloc_flags(gfp_t gfp_mask) * The caller may dip into page reserves a bit more if the caller * cannot run direct reclaim, or if the caller has realtime scheduling * policy or is asking for __GFP_HIGH memory. GFP_ATOMIC requests will - * set both ALLOC_HARDER (!wait) and ALLOC_HIGH (__GFP_HIGH). + * set both ALLOC_HARDER (atomic == true) and ALLOC_HIGH (__GFP_HIGH). */ alloc_flags |= (__force int) (gfp_mask & __GFP_HIGH); - if (!wait) { + if (atomic) { /* - * Not worth trying to allocate harder for - * __GFP_NOMEMALLOC even if it can't schedule. + * Not worth trying to allocate harder for __GFP_NOMEMALLOC even + * if it can't schedule. */ - if (!(gfp_mask & __GFP_NOMEMALLOC)) + if (!(gfp_mask & __GFP_NOMEMALLOC)) alloc_flags |= ALLOC_HARDER; /* - * Ignore cpuset if GFP_ATOMIC (!wait) rather than fail alloc. - * See also cpuset_zone_allowed() comment in kernel/cpuset.c. + * Ignore cpuset mems for GFP_ATOMIC rather than fail, see the + * comment for __cpuset_node_allowed_softwall(). */ alloc_flags &= ~ALLOC_CPUSET; } else if (unlikely(rt_task(current)) && !in_interrupt()) From 0193ed8225e1a79ed64632106ec3cc81798cb13c Mon Sep 17 00:00:00 2001 From: Alexandre Bounine Date: Wed, 30 Jul 2014 16:08:26 -0700 Subject: [PATCH 437/455] rapidio/tsi721_dma: fix failure to obtain transaction descriptor This is a bug fix for the situation when function tsi721_desc_get() fails to obtain a free transaction descriptor. The bug usually results in a memory access crash dump when data transfer scatter-gather list has more entries than size of hardware buffer descriptors ring. This fix ensures that error is properly returned to a caller instead of an invalid entry. This patch is applicable to kernel versions starting from v3.5. Signed-off-by: Alexandre Bounine Cc: Matt Porter Cc: Andre van Herk Cc: Stef van Os Cc: Vinod Koul Cc: Dan Williams Cc: [3.5+] Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/rapidio/devices/tsi721_dma.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/rapidio/devices/tsi721_dma.c b/drivers/rapidio/devices/tsi721_dma.c index 9b60b1f3261c..44341dc5b148 100644 --- a/drivers/rapidio/devices/tsi721_dma.c +++ b/drivers/rapidio/devices/tsi721_dma.c @@ -287,6 +287,12 @@ struct tsi721_tx_desc *tsi721_desc_get(struct tsi721_bdma_chan *bdma_chan) "desc %p not ACKed\n", tx_desc); } + if (ret == NULL) { + dev_dbg(bdma_chan->dchan.device->dev, + "%s: unable to obtain tx descriptor\n", __func__); + goto err_out; + } + i = bdma_chan->wr_count_next % bdma_chan->bd_num; if (i == bdma_chan->bd_num - 1) { i = 0; @@ -297,7 +303,7 @@ struct tsi721_tx_desc *tsi721_desc_get(struct tsi721_bdma_chan *bdma_chan) tx_desc->txd.phys = bdma_chan->bd_phys + i * sizeof(struct tsi721_dma_desc); tx_desc->hw_desc = &((struct tsi721_dma_desc *)bdma_chan->bd_base)[i]; - +err_out: spin_unlock_bh(&bdma_chan->lock); return ret; From 93a9eb39fad1b5fc9077776caa3af207883b254d Mon Sep 17 00:00:00 2001 From: Naoya Horiguchi Date: Wed, 30 Jul 2014 16:08:28 -0700 Subject: [PATCH 438/455] hwpoison: fix hugetlbfs/thp precheck in hwpoison_user_mappings() A recent fix from Chen Yucong, commit 0bc1f8b0682c ("hwpoison: fix the handling path of the victimized page frame that belong to non-LRU") rejects going into unmapping operation for hugetlbfs/thp pages, which results in failing error containing on such pages. This patch fixes it. With this patch, hwpoison functional tests in mce-test testsuite pass. Signed-off-by: Naoya Horiguchi Cc: Andi Kleen Cc: Chen Yucong Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/memory-failure.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 7211a73ba14d..3db261fdee4c 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -895,7 +895,13 @@ static int hwpoison_user_mappings(struct page *p, unsigned long pfn, struct page *hpage = *hpagep; struct page *ppage; - if (PageReserved(p) || PageSlab(p) || !PageLRU(p)) + /* + * Here we are interested only in user-mapped pages, so skip any + * other types of pages. + */ + if (PageReserved(p) || PageSlab(p)) + return SWAP_SUCCESS; + if (!(PageLRU(hpage) || PageHuge(p))) return SWAP_SUCCESS; /* From 52089b14c08f6ee9886c40b031712383fa24ddd9 Mon Sep 17 00:00:00 2001 From: Naoya Horiguchi Date: Wed, 30 Jul 2014 16:08:30 -0700 Subject: [PATCH 439/455] hwpoison: call action_result() in failure path of hwpoison_user_mappings() hwpoison_user_mappings() could fail for various reasons, so printk()s to print out the reasons should be done in each failure check inside hwpoison_user_mappings(). And currently we don't call action_result() when hwpoison_user_mappings() fails, which is not consistent with other exit points of memory error handler. So this patch fixes these messaging problems. Signed-off-by: Naoya Horiguchi Cc: Andi Kleen Cc: Chen Yucong Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/memory-failure.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 3db261fdee4c..a013bc94ebbe 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -911,8 +911,10 @@ static int hwpoison_user_mappings(struct page *p, unsigned long pfn, if (!page_mapped(hpage)) return SWAP_SUCCESS; - if (PageKsm(p)) + if (PageKsm(p)) { + pr_err("MCE %#lx: can't handle KSM pages.\n", pfn); return SWAP_FAIL; + } if (PageSwapCache(p)) { printk(KERN_ERR @@ -1235,7 +1237,7 @@ int memory_failure(unsigned long pfn, int trapno, int flags) */ if (hwpoison_user_mappings(p, pfn, trapno, flags, &hpage) != SWAP_SUCCESS) { - printk(KERN_ERR "MCE %#lx: cannot unmap page, give up\n", pfn); + action_result(pfn, "unmapping failed", IGNORED); res = -EBUSY; goto out; } From 2bcf2e92c3918ce62ab4e934256e47e9a16d19c3 Mon Sep 17 00:00:00 2001 From: Michal Hocko Date: Wed, 30 Jul 2014 16:08:33 -0700 Subject: [PATCH 440/455] memcg: oom_notify use-after-free fix Paul Furtado has reported the following GPF: general protection fault: 0000 [#1] SMP Modules linked in: ipv6 dm_mod xen_netfront coretemp hwmon x86_pkg_temp_thermal crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw gf128mul glue_helper aes_x86_64 microcode pcspkr ext4 jbd2 mbcache raid0 xen_blkfront CPU: 3 PID: 3062 Comm: java Not tainted 3.16.0-rc5 #1 task: ffff8801cfe8f170 ti: ffff8801d2ec4000 task.ti: ffff8801d2ec4000 RIP: e030:mem_cgroup_oom_synchronize+0x140/0x240 RSP: e02b:ffff8801d2ec7d48 EFLAGS: 00010283 RAX: 0000000000000001 RBX: ffff88009d633800 RCX: 000000000000000e RDX: fffffffffffffffe RSI: ffff88009d630200 RDI: ffff88009d630200 RBP: ffff8801d2ec7da8 R08: 0000000000000012 R09: 00000000fffffffe R10: 0000000000000000 R11: 0000000000000000 R12: ffff88009d633800 R13: ffff8801d2ec7d48 R14: dead000000100100 R15: ffff88009d633a30 FS: 00007f1748bb4700(0000) GS:ffff8801def80000(0000) knlGS:0000000000000000 CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00007f4110300308 CR3: 00000000c05f7000 CR4: 0000000000002660 Call Trace: pagefault_out_of_memory+0x18/0x90 mm_fault_error+0xa9/0x1a0 __do_page_fault+0x478/0x4c0 do_page_fault+0x2c/0x40 page_fault+0x28/0x30 Code: 44 00 00 48 89 df e8 40 ca ff ff 48 85 c0 49 89 c4 74 35 4c 8b b0 30 02 00 00 4c 8d b8 30 02 00 00 4d 39 fe 74 1b 0f 1f 44 00 00 <49> 8b 7e 10 be 01 00 00 00 e8 42 d2 04 00 4d 8b 36 4d 39 fe 75 RIP mem_cgroup_oom_synchronize+0x140/0x240 Commit fb2a6fc56be6 ("mm: memcg: rework and document OOM waiting and wakeup") has moved mem_cgroup_oom_notify outside of memcg_oom_lock assuming it is protected by the hierarchical OOM-lock. Although this is true for the notification part the protection doesn't cover unregistration of event which can happen in parallel now so mem_cgroup_oom_notify can see already unlinked and/or freed mem_cgroup_eventfd_list. Fix this by using memcg_oom_lock also in mem_cgroup_oom_notify. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=80881 Fixes: fb2a6fc56be6 (mm: memcg: rework and document OOM waiting and wakeup) Signed-off-by: Michal Hocko Reported-by: Paul Furtado Tested-by: Paul Furtado Acked-by: Johannes Weiner Cc: [3.12+] Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/memcontrol.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index a2c7bcb0e6eb..1f14a430c656 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -5415,8 +5415,12 @@ static int mem_cgroup_oom_notify_cb(struct mem_cgroup *memcg) { struct mem_cgroup_eventfd_list *ev; + spin_lock(&memcg_oom_lock); + list_for_each_entry(ev, &memcg->oom_notify, list) eventfd_signal(ev->eventfd, 1); + + spin_unlock(&memcg_oom_lock); return 0; } From b4903d6e8408e6137ee4666ee67ec566b74a0f05 Mon Sep 17 00:00:00 2001 From: Andrey Ryabinin Date: Wed, 30 Jul 2014 16:08:35 -0700 Subject: [PATCH 441/455] mm: debugfs: move rounddown_pow_of_two() out from do_fault path do_fault_around() expects fault_around_bytes rounded down to nearest page order. Instead of calling rounddown_pow_of_two every time in fault_around_pages()/fault_around_mask() we could do round down when user changes fault_around_bytes via debugfs interface. This also fixes bug when user set fault_around_bytes to 0. Result of rounddown_pow_of_two(0) is not defined, therefore fault_around_bytes == 0 doesn't work without this patch. Let's set fault_around_bytes to PAGE_SIZE if user sets to something less than PAGE_SIZE [akpm@linux-foundation.org: tweak code layout] Fixes: a9b0f861("mm: nominate faultaround area in bytes rather than page order") Signed-off-by: Andrey Ryabinin Reported-by: Sasha Levin Acked-by: Kirill A. Shutemov Cc: [3.15.x] Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/memory.c | 21 ++++++++++++--------- 1 file changed, 12 insertions(+), 9 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index 7e8d8205b610..8b44f765b645 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -2758,23 +2758,18 @@ void do_set_pte(struct vm_area_struct *vma, unsigned long address, update_mmu_cache(vma, address, pte); } -static unsigned long fault_around_bytes = 65536; +static unsigned long fault_around_bytes = rounddown_pow_of_two(65536); -/* - * fault_around_pages() and fault_around_mask() round down fault_around_bytes - * to nearest page order. It's what do_fault_around() expects to see. - */ static inline unsigned long fault_around_pages(void) { - return rounddown_pow_of_two(fault_around_bytes) / PAGE_SIZE; + return fault_around_bytes >> PAGE_SHIFT; } static inline unsigned long fault_around_mask(void) { - return ~(rounddown_pow_of_two(fault_around_bytes) - 1) & PAGE_MASK; + return ~(fault_around_bytes - 1) & PAGE_MASK; } - #ifdef CONFIG_DEBUG_FS static int fault_around_bytes_get(void *data, u64 *val) { @@ -2782,11 +2777,19 @@ static int fault_around_bytes_get(void *data, u64 *val) return 0; } +/* + * fault_around_pages() and fault_around_mask() expects fault_around_bytes + * rounded down to nearest page order. It's what do_fault_around() expects to + * see. + */ static int fault_around_bytes_set(void *data, u64 val) { if (val / PAGE_SIZE > PTRS_PER_PTE) return -EINVAL; - fault_around_bytes = val; + if (val > PAGE_SIZE) + fault_around_bytes = rounddown_pow_of_two(val); + else + fault_around_bytes = PAGE_SIZE; /* rounddown_pow_of_two(0) is undefined */ return 0; } DEFINE_SIMPLE_ATTRIBUTE(fault_around_bytes_fops, From 75325189c9e4df5f67eec5ec5b0d90759084887b Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Wed, 30 Jul 2014 16:08:37 -0700 Subject: [PATCH 442/455] mm: fix filemap.c pagecache_get_page() kernel-doc warnings Fix kernel-doc warnings in mm/filemap.c: pagecache_get_page(): Warning(..//mm/filemap.c:1054): No description found for parameter 'cache_gfp_mask' Warning(..//mm/filemap.c:1054): No description found for parameter 'radix_gfp_mask' Warning(..//mm/filemap.c:1054): Excess function parameter 'gfp_mask' description in 'pagecache_get_page' Fixes: 2457aec63745 ("mm: non-atomically mark page accessed during page cache allocation where possible") [mgorman@suse.de: change everything] [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Randy Dunlap Cc: Mel Gorman Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/filemap.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/mm/filemap.c b/mm/filemap.c index dafb06f70a09..900edfaf6df5 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -1031,18 +1031,21 @@ EXPORT_SYMBOL(find_lock_entry); * @mapping: the address_space to search * @offset: the page index * @fgp_flags: PCG flags - * @gfp_mask: gfp mask to use if a page is to be allocated + * @cache_gfp_mask: gfp mask to use for the page cache data page allocation + * @radix_gfp_mask: gfp mask to use for radix tree node allocation * * Looks up the page cache slot at @mapping & @offset. * - * PCG flags modify how the page is returned + * PCG flags modify how the page is returned. * * FGP_ACCESSED: the page will be marked accessed * FGP_LOCK: Page is return locked * FGP_CREAT: If page is not present then a new page is allocated using - * @gfp_mask and added to the page cache and the VM's LRU - * list. The page is returned locked and with an increased - * refcount. Otherwise, %NULL is returned. + * @cache_gfp_mask and added to the page cache and the VM's LRU + * list. If radix tree nodes are allocated during page cache + * insertion then @radix_gfp_mask is used. The page is returned + * locked and with an increased refcount. Otherwise, %NULL is + * returned. * * If FGP_LOCK or FGP_CREAT are specified then the function may sleep even * if the GFP flags specified for FGP_CREAT are atomic. From 8f1d26d0e59b9676587c54578f976709b625d6e9 Mon Sep 17 00:00:00 2001 From: Atsushi Kumagai Date: Wed, 30 Jul 2014 16:08:39 -0700 Subject: [PATCH 443/455] kexec: export free_huge_page to VMCOREINFO PG_head_mask was added into VMCOREINFO to filter huge pages in b3acc56bfe1 ("kexec: save PG_head_mask in VMCOREINFO"), but makedumpfile still need another symbol to filter *hugetlbfs* pages. If a user hope to filter user pages, makedumpfile tries to exclude them by checking the condition whether the page is anonymous, but hugetlbfs pages aren't anonymous while they also be user pages. We know it's possible to detect them in the same way as PageHuge(), so we need the start address of free_huge_page(): int PageHuge(struct page *page) { if (!PageCompound(page)) return 0; page = compound_head(page); return get_compound_page_dtor(page) == free_huge_page; } For that reason, this patch changes free_huge_page() into public to export it to VMCOREINFO. Signed-off-by: Atsushi Kumagai Acked-by: Baoquan He Cc: Vivek Goyal Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/hugetlb.h | 1 + kernel/kexec.c | 2 ++ mm/hugetlb.c | 2 +- 3 files changed, 4 insertions(+), 1 deletion(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 255cd5cc0754..a23c096b3080 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -80,6 +80,7 @@ int dequeue_hwpoisoned_huge_page(struct page *page); bool isolate_huge_page(struct page *page, struct list_head *list); void putback_active_hugepage(struct page *page); bool is_hugepage_active(struct page *page); +void free_huge_page(struct page *page); #ifdef CONFIG_ARCH_WANT_HUGE_PMD_SHARE pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud); diff --git a/kernel/kexec.c b/kernel/kexec.c index 369f41a94124..23a088fec3c0 100644 --- a/kernel/kexec.c +++ b/kernel/kexec.c @@ -33,6 +33,7 @@ #include #include #include +#include #include #include @@ -1619,6 +1620,7 @@ static int __init crash_save_vmcoreinfo_init(void) #endif VMCOREINFO_NUMBER(PG_head_mask); VMCOREINFO_NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE); + VMCOREINFO_SYMBOL(free_huge_page); arch_crash_save_vmcoreinfo(); update_vmcoreinfo_note(); diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 9221c02ed9e2..7a0a73d2fcff 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -856,7 +856,7 @@ struct hstate *size_to_hstate(unsigned long size) return NULL; } -static void free_huge_page(struct page *page) +void free_huge_page(struct page *page) { /* * Can't pass hstate in here because it is called from the From e0198b290dcd8313bdf313a0d083033d5c01d761 Mon Sep 17 00:00:00 2001 From: Josh Triplett Date: Wed, 30 Jul 2014 16:08:42 -0700 Subject: [PATCH 444/455] Josh has moved My IBM email addresses haven't worked for years; also map some old-but-functional forwarding addresses to my canonical address. Update my GPG key fingerprint; I moved to 4096R a long time ago. Update description. Signed-off-by: Josh Triplett Cc: "Paul E. McKenney" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- .mailmap | 5 +++++ CREDITS | 7 ++++--- MAINTAINERS | 2 +- kernel/rcu/rcutorture.c | 4 ++-- 4 files changed, 12 insertions(+), 6 deletions(-) diff --git a/.mailmap b/.mailmap index df1baba43a64..1ad68731fb47 100644 --- a/.mailmap +++ b/.mailmap @@ -62,6 +62,11 @@ Jeff Garzik Jens Axboe Jens Osterkamp John Stultz + + + + + Juha Yrjola Juha Yrjola Juha Yrjola diff --git a/CREDITS b/CREDITS index 28ee1514b9de..a80b66718f66 100644 --- a/CREDITS +++ b/CREDITS @@ -3511,10 +3511,11 @@ S: MacGregor A.C.T 2615 S: Australia N: Josh Triplett -E: josh@freedesktop.org -P: 1024D/D0FE7AFB B24A 65C9 1D71 2AC2 DE87 CA26 189B 9946 D0FE 7AFB -D: rcutorture maintainer +E: josh@joshtriplett.org +P: 4096R/8AFF873D 758E 5042 E397 4BA3 3A9C 1E67 0ED9 A3DF 8AFF 873D +D: RCU and rcutorture D: lock annotations, finding and fixing lock bugs +D: kernel tinification N: Winfried Trümper E: winni@xpilot.org diff --git a/MAINTAINERS b/MAINTAINERS index 86efa7e213c2..95990dd2678c 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -7424,7 +7424,7 @@ S: Orphan F: drivers/net/wireless/ray* RCUTORTURE MODULE -M: Josh Triplett +M: Josh Triplett M: "Paul E. McKenney" L: linux-kernel@vger.kernel.org S: Supported diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c index 7fa34f86e5ba..948a7693748e 100644 --- a/kernel/rcu/rcutorture.c +++ b/kernel/rcu/rcutorture.c @@ -18,7 +18,7 @@ * Copyright (C) IBM Corporation, 2005, 2006 * * Authors: Paul E. McKenney - * Josh Triplett + * Josh Triplett * * See also: Documentation/RCU/torture.txt */ @@ -51,7 +51,7 @@ #include MODULE_LICENSE("GPL"); -MODULE_AUTHOR("Paul E. McKenney and Josh Triplett "); +MODULE_AUTHOR("Paul E. McKenney and Josh Triplett "); torture_param(int, fqs_duration, 0, From 3a1122d26c62d4e8c61ef9a0eaba6e21c0862c77 Mon Sep 17 00:00:00 2001 From: David Rientjes Date: Wed, 30 Jul 2014 19:05:55 -0700 Subject: [PATCH 445/455] kexec: fix build error when hugetlbfs is disabled free_huge_page() is undefined without CONFIG_HUGETLBFS and there's no need to filter PageHuge() page is such a configuration either, so avoid exporting the symbol to fix a build error: In file included from kernel/kexec.c:14:0: kernel/kexec.c: In function 'crash_save_vmcoreinfo_init': kernel/kexec.c:1623:20: error: 'free_huge_page' undeclared (first use in this function) VMCOREINFO_SYMBOL(free_huge_page); ^ Introduced by commit 8f1d26d0e59b ("kexec: export free_huge_page to VMCOREINFO") Reported-by: kbuild test robot Acked-by: Olof Johansson Cc: Atsushi Kumagai Cc: Baoquan He Cc: Vivek Goyal Cc: Andrew Morton Signed-off-by: David Rientjes Signed-off-by: Linus Torvalds --- kernel/kexec.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/kernel/kexec.c b/kernel/kexec.c index 23a088fec3c0..4b8f0c925884 100644 --- a/kernel/kexec.c +++ b/kernel/kexec.c @@ -1620,7 +1620,9 @@ static int __init crash_save_vmcoreinfo_init(void) #endif VMCOREINFO_NUMBER(PG_head_mask); VMCOREINFO_NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE); +#ifdef CONFIG_HUGETLBFS VMCOREINFO_SYMBOL(free_huge_page); +#endif arch_crash_save_vmcoreinfo(); update_vmcoreinfo_note(); From 4c63f83c2c2e16a13ce274ee678e28246bd33645 Mon Sep 17 00:00:00 2001 From: Milan Broz Date: Tue, 29 Jul 2014 18:41:09 +0000 Subject: [PATCH 446/455] crypto: af_alg - properly label AF_ALG socket Th AF_ALG socket was missing a security label (e.g. SELinux) which means that socket was in "unlabeled" state. This was recently demonstrated in the cryptsetup package (cryptsetup v1.6.5 and later.) See https://bugzilla.redhat.com/show_bug.cgi?id=1115120 This patch clones the sock's label from the parent sock and resolves the issue (similar to AF_BLUETOOTH protocol family). Cc: stable@vger.kernel.org Signed-off-by: Milan Broz Acked-by: Paul Moore Signed-off-by: Herbert Xu --- crypto/af_alg.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/crypto/af_alg.c b/crypto/af_alg.c index 966f893711b3..6a3ad8011585 100644 --- a/crypto/af_alg.c +++ b/crypto/af_alg.c @@ -21,6 +21,7 @@ #include #include #include +#include struct alg_type_list { const struct af_alg_type *type; @@ -243,6 +244,7 @@ int af_alg_accept(struct sock *sk, struct socket *newsock) sock_init_data(newsock, sk2); sock_graft(sk2, newsock); + security_sk_clone(sk, sk2); err = type->accept(ask->private, sk2); if (err) { From a74c52def9ab953c77956a8e93d225621980f54c Mon Sep 17 00:00:00 2001 From: Peter Ujfalusi Date: Wed, 2 Apr 2014 16:48:45 +0300 Subject: [PATCH 447/455] clk: ti: clk-7xx: Correct ABE DPLL configuration ABE DPLL frequency need to be lowered from 361267200 to 180633600 to facilitate the ATL requironments. The dpll_abe_m2x2_ck clock need to be set to double of ABE DPLL rate in order to have correct clocks for audio. Signed-off-by: Peter Ujfalusi Acked-by: Tero Kristo Signed-off-by: Mike Turquette --- drivers/clk/ti/clk-7xx.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/clk/ti/clk-7xx.c b/drivers/clk/ti/clk-7xx.c index e1581335937d..cb8e6f14e880 100644 --- a/drivers/clk/ti/clk-7xx.c +++ b/drivers/clk/ti/clk-7xx.c @@ -16,7 +16,7 @@ #include #include -#define DRA7_DPLL_ABE_DEFFREQ 361267200 +#define DRA7_DPLL_ABE_DEFFREQ 180633600 #define DRA7_DPLL_GMAC_DEFFREQ 1000000000 @@ -322,6 +322,11 @@ int __init dra7xx_dt_clk_init(void) if (rc) pr_err("%s: failed to configure ABE DPLL!\n", __func__); + dpll_ck = clk_get_sys(NULL, "dpll_abe_m2x2_ck"); + rc = clk_set_rate(dpll_ck, DRA7_DPLL_ABE_DEFFREQ * 2); + if (rc) + pr_err("%s: failed to configure ABE DPLL m2x2!\n", __func__); + dpll_ck = clk_get_sys(NULL, "dpll_gmac_ck"); rc = clk_set_rate(dpll_ck, DRA7_DPLL_GMAC_DEFFREQ); if (rc) From af436472772d474acfad724ce789e003d1ca41c3 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Wed, 30 Jul 2014 07:18:48 -0400 Subject: [PATCH 448/455] direct-io: fix AIO regression The direct-io.c rewrite to use the iov_iter infrastructure stopped updating the size field in struct dio_submit, and thus rendered the check for allowing asynchronous completions to always return false. Fix this by comparing it to the count of bytes in the iov_iter instead. Signed-off-by: Christoph Hellwig Reported-by: Tim Chen Tested-by: Tim Chen --- fs/direct-io.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/fs/direct-io.c b/fs/direct-io.c index 194d0d122cae..17e39b047de5 100644 --- a/fs/direct-io.c +++ b/fs/direct-io.c @@ -71,7 +71,6 @@ struct dio_submit { been performed at the start of a write */ int pages_in_io; /* approximate total IO pages */ - size_t size; /* total request size (doesn't change)*/ sector_t block_in_file; /* Current offset into the underlying file in dio_block units. */ unsigned blocks_available; /* At block_in_file. changes */ @@ -1104,7 +1103,8 @@ do_blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode, unsigned blkbits = i_blkbits; unsigned blocksize_mask = (1 << blkbits) - 1; ssize_t retval = -EINVAL; - loff_t end = offset + iov_iter_count(iter); + size_t count = iov_iter_count(iter); + loff_t end = offset + count; struct dio *dio; struct dio_submit sdio = { 0, }; struct buffer_head map_bh = { 0, }; @@ -1287,10 +1287,9 @@ do_blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode, */ BUG_ON(retval == -EIOCBQUEUED); if (dio->is_async && retval == 0 && dio->result && - ((rw == READ) || (dio->result == sdio.size))) + (rw == READ || dio->result == count)) retval = -EIOCBQUEUED; - - if (retval != -EIOCBQUEUED) + else dio_await_completion(dio); if (drop_refcount(dio) == 0) { From 6d2b6170c8914c6c69256b687651fb16d7ec3e18 Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Tue, 24 Jun 2014 23:45:08 -0500 Subject: [PATCH 449/455] vfs: fix check for fallocate on active swapfile Fix the broken check for calling sys_fallocate() on an active swapfile, introduced by commit 0790b31b69374ddadefe ("fs: disallow all fallocate operation on active swapfile"). Signed-off-by: Eric Biggers Signed-off-by: Al Viro --- fs/open.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/fs/open.c b/fs/open.c index 36662d036237..d6fd3acde134 100644 --- a/fs/open.c +++ b/fs/open.c @@ -263,11 +263,10 @@ int do_fallocate(struct file *file, int mode, loff_t offset, loff_t len) return -EPERM; /* - * We can not allow to do any fallocate operation on an active - * swapfile + * We cannot allow any fallocate operation on an active swapfile */ if (IS_SWAPFILE(inode)) - ret = -ETXTBSY; + return -ETXTBSY; /* * Revalidate the write permissions, in case security policy has From 504d58745c9ca28d33572e2d8a9990b43e06075d Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Fri, 1 Aug 2014 12:20:02 +0200 Subject: [PATCH 450/455] timer: Fix lock inversion between hrtimer_bases.lock and scheduler locks clockevents_increase_min_delta() calls printk() from under hrtimer_bases.lock. That causes lock inversion on scheduler locks because printk() can call into the scheduler. Lockdep puts it as: ====================================================== [ INFO: possible circular locking dependency detected ] 3.15.0-rc8-06195-g939f04b #2 Not tainted ------------------------------------------------------- trinity-main/74 is trying to acquire lock: (&port_lock_key){-.....}, at: [<811c60be>] serial8250_console_write+0x8c/0x10c but task is already holding lock: (hrtimer_bases.lock){-.-...}, at: [<8103caeb>] hrtimer_try_to_cancel+0x13/0x66 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #5 (hrtimer_bases.lock){-.-...}: [<8104a942>] lock_acquire+0x92/0x101 [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e [<8103c918>] __hrtimer_start_range_ns+0x1c/0x197 [<8107ec20>] perf_swevent_start_hrtimer.part.41+0x7a/0x85 [<81080792>] task_clock_event_start+0x3a/0x3f [<810807a4>] task_clock_event_add+0xd/0x14 [<8108259a>] event_sched_in+0xb6/0x17a [<810826a2>] group_sched_in+0x44/0x122 [<81082885>] ctx_sched_in.isra.67+0x105/0x11f [<810828e6>] perf_event_sched_in.isra.70+0x47/0x4b [<81082bf6>] __perf_install_in_context+0x8b/0xa3 [<8107eb8e>] remote_function+0x12/0x2a [<8105f5af>] smp_call_function_single+0x2d/0x53 [<8107e17d>] task_function_call+0x30/0x36 [<8107fb82>] perf_install_in_context+0x87/0xbb [<810852c9>] SYSC_perf_event_open+0x5c6/0x701 [<810856f9>] SyS_perf_event_open+0x17/0x19 [<8142f8ee>] syscall_call+0x7/0xb -> #4 (&ctx->lock){......}: [<8104a942>] lock_acquire+0x92/0x101 [<8142f04c>] _raw_spin_lock+0x21/0x30 [<81081df3>] __perf_event_task_sched_out+0x1dc/0x34f [<8142cacc>] __schedule+0x4c6/0x4cb [<8142cae0>] schedule+0xf/0x11 [<8142f9a6>] work_resched+0x5/0x30 -> #3 (&rq->lock){-.-.-.}: [<8104a942>] lock_acquire+0x92/0x101 [<8142f04c>] _raw_spin_lock+0x21/0x30 [<81040873>] __task_rq_lock+0x33/0x3a [<8104184c>] wake_up_new_task+0x25/0xc2 [<8102474b>] do_fork+0x15c/0x2a0 [<810248a9>] kernel_thread+0x1a/0x1f [<814232a2>] rest_init+0x1a/0x10e [<817af949>] start_kernel+0x303/0x308 [<817af2ab>] i386_start_kernel+0x79/0x7d -> #2 (&p->pi_lock){-.-...}: [<8104a942>] lock_acquire+0x92/0x101 [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e [<810413dd>] try_to_wake_up+0x1d/0xd6 [<810414cd>] default_wake_function+0xb/0xd [<810461f3>] __wake_up_common+0x39/0x59 [<81046346>] __wake_up+0x29/0x3b [<811b8733>] tty_wakeup+0x49/0x51 [<811c3568>] uart_write_wakeup+0x17/0x19 [<811c5dc1>] serial8250_tx_chars+0xbc/0xfb [<811c5f28>] serial8250_handle_irq+0x54/0x6a [<811c5f57>] serial8250_default_handle_irq+0x19/0x1c [<811c56d8>] serial8250_interrupt+0x38/0x9e [<810510e7>] handle_irq_event_percpu+0x5f/0x1e2 [<81051296>] handle_irq_event+0x2c/0x43 [<81052cee>] handle_level_irq+0x57/0x80 [<81002a72>] handle_irq+0x46/0x5c [<810027df>] do_IRQ+0x32/0x89 [<8143036e>] common_interrupt+0x2e/0x33 [<8142f23c>] _raw_spin_unlock_irqrestore+0x3f/0x49 [<811c25a4>] uart_start+0x2d/0x32 [<811c2c04>] uart_write+0xc7/0xd6 [<811bc6f6>] n_tty_write+0xb8/0x35e [<811b9beb>] tty_write+0x163/0x1e4 [<811b9cd9>] redirected_tty_write+0x6d/0x75 [<810b6ed6>] vfs_write+0x75/0xb0 [<810b7265>] SyS_write+0x44/0x77 [<8142f8ee>] syscall_call+0x7/0xb -> #1 (&tty->write_wait){-.....}: [<8104a942>] lock_acquire+0x92/0x101 [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e [<81046332>] __wake_up+0x15/0x3b [<811b8733>] tty_wakeup+0x49/0x51 [<811c3568>] uart_write_wakeup+0x17/0x19 [<811c5dc1>] serial8250_tx_chars+0xbc/0xfb [<811c5f28>] serial8250_handle_irq+0x54/0x6a [<811c5f57>] serial8250_default_handle_irq+0x19/0x1c [<811c56d8>] serial8250_interrupt+0x38/0x9e [<810510e7>] handle_irq_event_percpu+0x5f/0x1e2 [<81051296>] handle_irq_event+0x2c/0x43 [<81052cee>] handle_level_irq+0x57/0x80 [<81002a72>] handle_irq+0x46/0x5c [<810027df>] do_IRQ+0x32/0x89 [<8143036e>] common_interrupt+0x2e/0x33 [<8142f23c>] _raw_spin_unlock_irqrestore+0x3f/0x49 [<811c25a4>] uart_start+0x2d/0x32 [<811c2c04>] uart_write+0xc7/0xd6 [<811bc6f6>] n_tty_write+0xb8/0x35e [<811b9beb>] tty_write+0x163/0x1e4 [<811b9cd9>] redirected_tty_write+0x6d/0x75 [<810b6ed6>] vfs_write+0x75/0xb0 [<810b7265>] SyS_write+0x44/0x77 [<8142f8ee>] syscall_call+0x7/0xb -> #0 (&port_lock_key){-.....}: [<8104a62d>] __lock_acquire+0x9ea/0xc6d [<8104a942>] lock_acquire+0x92/0x101 [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e [<811c60be>] serial8250_console_write+0x8c/0x10c [<8104e402>] call_console_drivers.constprop.31+0x87/0x118 [<8104f5d5>] console_unlock+0x1d7/0x398 [<8104fb70>] vprintk_emit+0x3da/0x3e4 [<81425f76>] printk+0x17/0x19 [<8105bfa0>] clockevents_program_min_delta+0x104/0x116 [<8105c548>] clockevents_program_event+0xe7/0xf3 [<8105cc1c>] tick_program_event+0x1e/0x23 [<8103c43c>] hrtimer_force_reprogram+0x88/0x8f [<8103c49e>] __remove_hrtimer+0x5b/0x79 [<8103cb21>] hrtimer_try_to_cancel+0x49/0x66 [<8103cb4b>] hrtimer_cancel+0xd/0x18 [<8107f102>] perf_swevent_cancel_hrtimer.part.60+0x2b/0x30 [<81080705>] task_clock_event_stop+0x20/0x64 [<81080756>] task_clock_event_del+0xd/0xf [<81081350>] event_sched_out+0xab/0x11e [<810813e0>] group_sched_out+0x1d/0x66 [<81081682>] ctx_sched_out+0xaf/0xbf [<81081e04>] __perf_event_task_sched_out+0x1ed/0x34f [<8142cacc>] __schedule+0x4c6/0x4cb [<8142cae0>] schedule+0xf/0x11 [<8142f9a6>] work_resched+0x5/0x30 other info that might help us debug this: Chain exists of: &port_lock_key --> &ctx->lock --> hrtimer_bases.lock Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(hrtimer_bases.lock); lock(&ctx->lock); lock(hrtimer_bases.lock); lock(&port_lock_key); *** DEADLOCK *** 4 locks held by trinity-main/74: #0: (&rq->lock){-.-.-.}, at: [<8142c6f3>] __schedule+0xed/0x4cb #1: (&ctx->lock){......}, at: [<81081df3>] __perf_event_task_sched_out+0x1dc/0x34f #2: (hrtimer_bases.lock){-.-...}, at: [<8103caeb>] hrtimer_try_to_cancel+0x13/0x66 #3: (console_lock){+.+...}, at: [<8104fb5d>] vprintk_emit+0x3c7/0x3e4 stack backtrace: CPU: 0 PID: 74 Comm: trinity-main Not tainted 3.15.0-rc8-06195-g939f04b #2 00000000 81c3a310 8b995c14 81426f69 8b995c44 81425a99 8161f671 8161f570 8161f538 8161f559 8161f538 8b995c78 8b142bb0 00000004 8b142fdc 8b142bb0 8b995ca8 8104a62d 8b142fac 000016f2 81c3a310 00000001 00000001 00000003 Call Trace: [<81426f69>] dump_stack+0x16/0x18 [<81425a99>] print_circular_bug+0x18f/0x19c [<8104a62d>] __lock_acquire+0x9ea/0xc6d [<8104a942>] lock_acquire+0x92/0x101 [<811c60be>] ? serial8250_console_write+0x8c/0x10c [<811c6032>] ? wait_for_xmitr+0x76/0x76 [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e [<811c60be>] ? serial8250_console_write+0x8c/0x10c [<811c60be>] serial8250_console_write+0x8c/0x10c [<8104af87>] ? lock_release+0x191/0x223 [<811c6032>] ? wait_for_xmitr+0x76/0x76 [<8104e402>] call_console_drivers.constprop.31+0x87/0x118 [<8104f5d5>] console_unlock+0x1d7/0x398 [<8104fb70>] vprintk_emit+0x3da/0x3e4 [<81425f76>] printk+0x17/0x19 [<8105bfa0>] clockevents_program_min_delta+0x104/0x116 [<8105cc1c>] tick_program_event+0x1e/0x23 [<8103c43c>] hrtimer_force_reprogram+0x88/0x8f [<8103c49e>] __remove_hrtimer+0x5b/0x79 [<8103cb21>] hrtimer_try_to_cancel+0x49/0x66 [<8103cb4b>] hrtimer_cancel+0xd/0x18 [<8107f102>] perf_swevent_cancel_hrtimer.part.60+0x2b/0x30 [<81080705>] task_clock_event_stop+0x20/0x64 [<81080756>] task_clock_event_del+0xd/0xf [<81081350>] event_sched_out+0xab/0x11e [<810813e0>] group_sched_out+0x1d/0x66 [<81081682>] ctx_sched_out+0xaf/0xbf [<81081e04>] __perf_event_task_sched_out+0x1ed/0x34f [<8104416d>] ? __dequeue_entity+0x23/0x27 [<81044505>] ? pick_next_task_fair+0xb1/0x120 [<8142cacc>] __schedule+0x4c6/0x4cb [<81047574>] ? trace_hardirqs_off_caller+0xd7/0x108 [<810475b0>] ? trace_hardirqs_off+0xb/0xd [<81056346>] ? rcu_irq_exit+0x64/0x77 Fix the problem by using printk_deferred() which does not call into the scheduler. Reported-by: Fengguang Wu Signed-off-by: Jan Kara Cc: stable@vger.kernel.org Signed-off-by: Thomas Gleixner --- kernel/time/clockevents.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c index ad362c260ef4..9c94c19f1305 100644 --- a/kernel/time/clockevents.c +++ b/kernel/time/clockevents.c @@ -146,7 +146,8 @@ static int clockevents_increase_min_delta(struct clock_event_device *dev) { /* Nothing to do if we already reached the limit */ if (dev->min_delta_ns >= MIN_DELTA_LIMIT) { - printk(KERN_WARNING "CE: Reprogramming failure. Giving up\n"); + printk_deferred(KERN_WARNING + "CE: Reprogramming failure. Giving up\n"); dev->next_event.tv64 = KTIME_MAX; return -ETIME; } @@ -159,9 +160,10 @@ static int clockevents_increase_min_delta(struct clock_event_device *dev) if (dev->min_delta_ns > MIN_DELTA_LIMIT) dev->min_delta_ns = MIN_DELTA_LIMIT; - printk(KERN_WARNING "CE: %s increased min_delta_ns to %llu nsec\n", - dev->name ? dev->name : "?", - (unsigned long long) dev->min_delta_ns); + printk_deferred(KERN_WARNING + "CE: %s increased min_delta_ns to %llu nsec\n", + dev->name ? dev->name : "?", + (unsigned long long) dev->min_delta_ns); return 0; } From d8c712ea471ce7a4fd1734ad2211adf8469ddddc Mon Sep 17 00:00:00 2001 From: Greg Thelen Date: Thu, 31 Jul 2014 09:07:19 -0700 Subject: [PATCH 451/455] dm bufio: fully initialize shrinker 1d3d4437eae1 ("vmscan: per-node deferred work") added a flags field to struct shrinker assuming that all shrinkers were zero filled. The dm bufio shrinker is not zero filled, which leaves arbitrary kmalloc() data in flags. So far the only defined flags bit is SHRINKER_NUMA_AWARE. But there are proposed patches which add other bits to shrinker.flags (e.g. memcg awareness). Rather than simply initializing the shrinker, this patch uses kzalloc() when allocating the dm_bufio_client to ensure that the embedded shrinker and any other similar structures are zeroed. This fixes theoretical over aggressive shrinking of dm bufio objects. If the uninitialized dm_bufio_client.shrinker.flags contains SHRINKER_NUMA_AWARE then shrink_slab() would call the dm shrinker for each numa node rather than just once. This has been broken since 3.12. Signed-off-by: Greg Thelen Acked-by: Mikulas Patocka Signed-off-by: Mike Snitzer Cc: stable@vger.kernel.org # v3.12+ --- drivers/md/dm-bufio.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c index 4e84095833db..d724459860d9 100644 --- a/drivers/md/dm-bufio.c +++ b/drivers/md/dm-bufio.c @@ -1541,7 +1541,7 @@ struct dm_bufio_client *dm_bufio_client_create(struct block_device *bdev, unsign BUG_ON(block_size < 1 << SECTOR_SHIFT || (block_size & (block_size - 1))); - c = kmalloc(sizeof(*c), GFP_KERNEL); + c = kzalloc(sizeof(*c), GFP_KERNEL); if (!c) { r = -ENOMEM; goto bad_client; From 44fa816bb778edbab6b6ddaaf24908dd6295937e Mon Sep 17 00:00:00 2001 From: Anssi Hannula Date: Fri, 1 Aug 2014 11:55:47 -0400 Subject: [PATCH 452/455] dm cache: fix race affecting dirty block count nr_dirty is updated without locking, causing it to drift so that it is non-zero (either a small positive integer, or a very large one when an underflow occurs) even when there are no actual dirty blocks. This was due to a race between the workqueue and map function accessing nr_dirty in parallel without proper protection. People were seeing under runs due to a race on increment/decrement of nr_dirty, see: https://lkml.org/lkml/2014/6/3/648 Fix this by using an atomic_t for nr_dirty. Reported-by: roma1390@gmail.com Signed-off-by: Anssi Hannula Signed-off-by: Joe Thornber Signed-off-by: Mike Snitzer Cc: stable@vger.kernel.org --- drivers/md/dm-cache-target.c | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/drivers/md/dm-cache-target.c b/drivers/md/dm-cache-target.c index 5f054c44b485..2c63326638b6 100644 --- a/drivers/md/dm-cache-target.c +++ b/drivers/md/dm-cache-target.c @@ -231,7 +231,7 @@ struct cache { /* * cache_size entries, dirty if set */ - dm_cblock_t nr_dirty; + atomic_t nr_dirty; unsigned long *dirty_bitset; /* @@ -492,7 +492,7 @@ static bool is_dirty(struct cache *cache, dm_cblock_t b) static void set_dirty(struct cache *cache, dm_oblock_t oblock, dm_cblock_t cblock) { if (!test_and_set_bit(from_cblock(cblock), cache->dirty_bitset)) { - cache->nr_dirty = to_cblock(from_cblock(cache->nr_dirty) + 1); + atomic_inc(&cache->nr_dirty); policy_set_dirty(cache->policy, oblock); } } @@ -501,8 +501,7 @@ static void clear_dirty(struct cache *cache, dm_oblock_t oblock, dm_cblock_t cbl { if (test_and_clear_bit(from_cblock(cblock), cache->dirty_bitset)) { policy_clear_dirty(cache->policy, oblock); - cache->nr_dirty = to_cblock(from_cblock(cache->nr_dirty) - 1); - if (!from_cblock(cache->nr_dirty)) + if (atomic_dec_return(&cache->nr_dirty) == 0) dm_table_event(cache->ti->table); } } @@ -2269,7 +2268,7 @@ static int cache_create(struct cache_args *ca, struct cache **result) atomic_set(&cache->quiescing_ack, 0); r = -ENOMEM; - cache->nr_dirty = 0; + atomic_set(&cache->nr_dirty, 0); cache->dirty_bitset = alloc_bitset(from_cblock(cache->cache_size)); if (!cache->dirty_bitset) { *error = "could not allocate dirty bitset"; @@ -2808,7 +2807,7 @@ static void cache_status(struct dm_target *ti, status_type_t type, residency = policy_residency(cache->policy); - DMEMIT("%u %llu/%llu %u %llu/%llu %u %u %u %u %u %u %llu ", + DMEMIT("%u %llu/%llu %u %llu/%llu %u %u %u %u %u %u %lu ", (unsigned)(DM_CACHE_METADATA_BLOCK_SIZE >> SECTOR_SHIFT), (unsigned long long)(nr_blocks_metadata - nr_free_blocks_metadata), (unsigned long long)nr_blocks_metadata, @@ -2821,7 +2820,7 @@ static void cache_status(struct dm_target *ti, status_type_t type, (unsigned) atomic_read(&cache->stats.write_miss), (unsigned) atomic_read(&cache->stats.demotion), (unsigned) atomic_read(&cache->stats.promotion), - (unsigned long long) from_cblock(cache->nr_dirty)); + (unsigned long) atomic_read(&cache->nr_dirty)); if (writethrough_mode(&cache->features)) DMEMIT("1 writethrough "); From c5cc87fa8ddf971682fc5dd112e2dfacb11f30cd Mon Sep 17 00:00:00 2001 From: Russell King Date: Tue, 29 Jul 2014 12:18:34 +0100 Subject: [PATCH 453/455] ARM: idmap: add identity mapping usage note Add a note about the usage of the identity mapping; we do not support accesses outside of the identity map region and kernel image while a CPU is using the identity map. This is because the identity mapping may overwrite vmalloc space, IO mappings, the vectors pages, etc. Acked-by: Will Deacon Signed-off-by: Russell King --- arch/arm/mm/idmap.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/arch/arm/mm/idmap.c b/arch/arm/mm/idmap.c index d7a0ee898d24..c447ec70e868 100644 --- a/arch/arm/mm/idmap.c +++ b/arch/arm/mm/idmap.c @@ -9,6 +9,11 @@ #include #include +/* + * Note: accesses outside of the kernel image and the identity map area + * are not supported on any CPU using the idmap tables as its current + * page tables. + */ pgd_t *idmap_pgd; phys_addr_t (*arch_virt_to_idmap) (unsigned long x); From 6bf755db4d5e7ccea61fb17727a183b9bd8945b1 Mon Sep 17 00:00:00 2001 From: Omar Sandoval Date: Fri, 1 Aug 2014 18:14:06 +0100 Subject: [PATCH 454/455] ARM: 8124/1: don't enter kgdb when userspace executes a kgdb break instruction The kgdb breakpoint hooks (kgdb_brk_fn and kgdb_compiled_brk_fn) should only be entered when a kgdb break instruction is executed from the kernel. Otherwise, if kgdb is enabled, a userspace program can cause the kernel to drop into the debugger by executing either KGDB_BREAKINST or KGDB_COMPILED_BREAK. Acked-by: Will Deacon Signed-off-by: Omar Sandoval Signed-off-by: Russell King --- arch/arm/kernel/kgdb.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/arch/arm/kernel/kgdb.c b/arch/arm/kernel/kgdb.c index 778c2f7024ff..a74b53c1b7df 100644 --- a/arch/arm/kernel/kgdb.c +++ b/arch/arm/kernel/kgdb.c @@ -160,12 +160,16 @@ static int kgdb_compiled_brk_fn(struct pt_regs *regs, unsigned int instr) static struct undef_hook kgdb_brkpt_hook = { .instr_mask = 0xffffffff, .instr_val = KGDB_BREAKINST, + .cpsr_mask = MODE_MASK, + .cpsr_val = SVC_MODE, .fn = kgdb_brk_fn }; static struct undef_hook kgdb_compiled_brkpt_hook = { .instr_mask = 0xffffffff, .instr_val = KGDB_COMPILED_BREAK, + .cpsr_mask = MODE_MASK, + .cpsr_val = SVC_MODE, .fn = kgdb_compiled_brk_fn }; From 19583ca584d6f574384e17fe7613dfaeadcdc4a6 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Sun, 3 Aug 2014 15:25:02 -0700 Subject: [PATCH 455/455] Linux 3.16 --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index f6a7794e4db4..d0901b46b4bf 100644 --- a/Makefile +++ b/Makefile @@ -1,7 +1,7 @@ VERSION = 3 PATCHLEVEL = 16 SUBLEVEL = 0 -EXTRAVERSION = -rc7 +EXTRAVERSION = NAME = Shuffling Zombie Juror # *DOCUMENTATION*