1
0
Fork 0
Commit Graph

1590 Commits (redonkable)

Author SHA1 Message Date
Andrey Zhizhikin 8c8c2d4715 This is the 5.4.86 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl/sW9MACgkQONu9yGCS
 aT5SwBAAo6dgHqwmPfuf98/8oVeVqTxcmE7GpzpVRH2+yI7Zwk2ez29tAflcM7lT
 LKtR2WFGAxoCL4DUKXeO7Ubwpue5NoBIsJ8/dAYBesojps3WDaFGL55PvJLWwFJ7
 5gPtPzynITaqIC1JCFcrJ7OTp7REiCUZRc1CJXJINWAYL1VbEbH8pH904xfFcivy
 XnNyL9UiWp1lSB8oF3CRJOaK5M5gY1+wdCFaLVqQn306XDEM8PvZK4G3at/jXWgH
 jQjArdtC8M8NwjyTwtqW9JAMV+6CD0/HXk0QboTZg6yiaRrtUsfzMqJ1cvhKcQgO
 kLE3rwdnr3/MxuzSnGWbswflG2WCutoah58g0uN8H0nCiui5mKN6x5K+emgDZIoO
 ndDnh+/5OE247EK+3CGn/0N8i/fOymrLAnLL4wCXVdlQLMCalnL37ibdfGbAptXi
 N3GOGZ2iEglvTsEr5w0r86+AzNskm5EqA7mFGFiAyf9viR2xwYk3RrWf2ZyMRos2
 2S7mKcZmw7voDu2TIDIhqydToBKxmYI/mUn3mFFme1h3lwzM3zYG1aovVLfd5NkY
 Gx5E/CA/ut/3n0u/dXJ8SxEitBWkqImp5UdYcElQNxQoXnVU4yKmjf6dDL9Wqh+1
 ujCiaCUJd3PY0uXXIb6RWWGs2VaL4xiEnk+ZBm0VI9WEUWksSx0=
 =jnmv
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEdQaENiSDAlGTDEbB7G51OISzHs0FAl/yR4IACgkQ7G51OISz
 Hs2KfA//e3jp0Ah8krkhVUDaiKJ6cWwt3PyEyEuASV+KB7bf8dW25E98jEriBTBX
 kEgmy8ZWJqcDvxP7JWshmjSu6y7IfzFhrNMt3Fsd3ZsQz7nUprzokufnUSMcoNeu
 vXYtJHslunsxnUVOzy/iCAjb0KU7zS5lYxxITAGth1vgEM5QySXcBZx8yWrGxNt+
 Hk5Rc4hQlogmh4Mi1t9VoHOafy5smitOwVGtcl8oPiDCkoConXtBvNQgFkncBZWf
 0EOXiulRkWeo/KXMVrdVy8J1IzWjQDDM1/JDY/Xx6scLnBBCJ002Yfv/HpL/toAM
 K5/dYJmRktlsCaKFd14uMTAnEqhjnDyPtxntOa0Z4YExfOGwum/SmOMvQyCGLOJg
 eF5HejriqRfK2bRBpYO92aXdwmBSuu2cS3AXroHw3tl3VQ+9uzTNzp9iIonsKjSJ
 5WQPc+0Ebg5NPtHHkimeUTFcxmYfqOFV2u+wVDi9Lcfm7xzJ8zM7w/IyM6sMpdoZ
 xKQ0jto8KN0eQKWmih2GL/pde2iihjOPa6RYuRom7LCLMgjJvBMCIQJwUZBg1PU5
 0eh7WipmOt1xCfKWi3HX3A+2ibkdT/3CkochK26Md/mrCH4aI4JSacr3lXYTALhi
 yy37o4ogWgCwDGOmxCPfG2BxZzLywNIBJb3qwT3maDPMkMT0efs=
 =IdA4
 -----END PGP SIGNATURE-----

Merge tag 'v5.4.86' into 5.4-2.2.x-imx

This is the 5.4.86 stable release

Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
2021-01-03 22:38:54 +00:00
Borislav Petkov 066d115fdd EDAC/amd64: Fix PCI component registration
commit 706657b1fe upstream.

In order to setup its PCI component, the driver needs any node private
instance in order to get a reference to the PCI device and hand that
into edac_pci_create_generic_ctl(). For convenience, it uses the 0th
memory controller descriptor under the assumption that if any, the 0th
will be always present.

However, this assumption goes wrong when the 0th node doesn't have
memory and the driver doesn't initialize an instance for it:

  EDAC amd64: F17h detected (node 0).
  ...
  EDAC amd64: Node 0: No DIMMs detected.

But looking up node instances is not really needed - all one needs is
the pointer to the proper device which gets discovered during instance
init.

So stash that pointer into a variable and use it when setting up the
EDAC PCI component.

Clear that variable when the driver needs to unwind due to some
instances failing init to avoid any registration imbalance.

Cc: <stable@vger.kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201122150815.13808-1-bp@alien8.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-30 11:51:36 +01:00
Qiuxu Zhuo f4ce4a53c4 EDAC/i10nm: Use readl() to access MMIO registers
commit 83ff51c4e3 upstream.

Instead of raw access, use readl() to access MMIO registers of
memory controller to avoid possible compiler re-ordering.

Fixes: d4dc89d069 ("EDAC, i10nm: Add a driver for Intel 10nm server processors")
Cc: <stable@vger.kernel.org>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-30 11:51:36 +01:00
Yazen Ghannam 3e08a61b2f EDAC/mce_amd: Use struct cpuinfo_x86.cpu_die_id for AMD NodeId
[ Upstream commit 8de0c9917c ]

The edac_mce_amd module calls decode_dram_ecc() on AMD Family17h and
later systems. This function is used in amd64_edac_mod to do
system-specific decoding for DRAM ECC errors. The function takes a
"NodeId" as a parameter.

In AMD documentation, NodeId is used to identify a physical die in a
system. This can be used to identify a node in the AMD_NB code and also
it is used with umc_normaddr_to_sysaddr().

However, the input used for decode_dram_ecc() is currently the NUMA node
of a logical CPU. In the default configuration, the NUMA node and
physical die will be equivalent, so this doesn't have an impact.

But the NUMA node configuration can be adjusted with optional memory
interleaving modes. This will cause the NUMA node enumeration to not
match the physical die enumeration. The mismatch will cause the address
translation function to fail or report incorrect results.

Use struct cpuinfo_x86.cpu_die_id for the node_id parameter to ensure the
physical ID is used.

Fixes: fbe63acf62 ("EDAC, mce_amd: Use cpu_to_node() to find the node ID")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201109210659.754018-4-Yazen.Ghannam@amd.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-12-30 11:51:10 +01:00
Andrey Zhizhikin 4c7342a1d4 This is the 5.4.73 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl+ahE8ACgkQONu9yGCS
 aT4j1A/9HzkKKoqZ2vXYQ1/uEnUqZech9ly1KxpNTBrSZYAtx3MaWY7tGDEx2BqD
 y6iw9x4MymhHEbpwLg6YmmdWuMQLNNYJGoyLiPJgWhkE4c7zHadhNz1DcPEI8F7z
 bSlUJ3Oebr8gzv0FvUmeVXw7Z2EuOqM1zGgTAZfnKY3DkYHbLnrzUJ4AiI8TNeba
 pPIhjfIJ1TvhF+s5ggf2m8OtSWLZ0doCWCPmCFe2WyERX2WYCzPgsm0yL7L7oXME
 ZqWpOcClBsiYekBNcZ4kxozhJtArCnv24n9VoXJ/YJIlWKvCA6uC8r527nGN/z08
 dfFelj1nDs7/VrCSP4+109EjxLQnSYGgIWP0g0OsC+9wOmrQsYJ1azP1eNjm+NuC
 hPa8uYVEZxwVyJuEfu4ZB4NMZBlD2qnHoskvBKbyZ8yaVnbvlMp552XMwsmJBpCs
 8wArzabrJEz396LUUIYG829D7NBDuRav1Miu+FTzlbn+xZ/Y/S8OmhoG2stWa4wV
 y5x0M0DWgrqiZ9rMkz9A03UNnCInQVTfIBoMl63xFitW4/0vLsln3+CjzlKm7H46
 rD/tKACUoCDjR5DN+JwQzmTdL9zBb4p1cXwWjWb6rON3BkXmO0JVAxzurxI9PfX0
 ZWDydZ3HNmrm0d3J12zf3kTX56PfPFAGWUsEc4Ntb5zdWXSQJsE=
 =fZ3T
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEdQaENiSDAlGTDEbB7G51OISzHs0FAl+bPfIACgkQ7G51OISz
 Hs1LaA/9GOp34DrKBM/z7eN7gcbU4rJUhnggZ3vCXShRs3vtjEJ7wptzJb7lErXX
 6JCS/OjZpQpZJcHdBX0Kxovf8LVgvDrsbAhRhQkdFr0dESQQ4UY+vT5me9Y9Ot1F
 biG2z2HduLAxBgrYB2uA7VeRqlLiAa7ELS5EWB90xjY49w7gy5kK7AFrBRQdme9G
 r4fY89RD9sJVzo4sxgQwUYXuNJi5OmwbN+wrkwk8HXyL0tAB9SNQJ7A962Gxamao
 AIXT0CvNpNSkR/4JeqDXbJu54fMZxaF4A7a9mgL42fWe45jQs2zYSNx3vdZskzK8
 8z+4FCmShNkGMMLV5k6Ds/lJ8uF1yOkUBJJeiHJxnpZw93xKVWZfOVmgm+WO/mNq
 POmJVfALFFzwvNllyMX8D++0yhORunzfhzQyKgVAthwmScGQ5TK0cerwAa9VEz1T
 40e7AqsNKUxRPnoZYQwM0Y2Vskn6qZ8pOW3rSSQx2YI2lhQHGeUMAugHzYivERkF
 8d5hDPaQgfJXmS+S8Xp45zafeMDjoNQFQhZLAptmoF9+NGXdJduPSPJxQ0R/29GT
 2LEHNsneGotslpXwluk8x2VlXryf/7okEdR7RLq9kjEyM5BGOc2wcZrD6GXvnAf0
 JorbYZPriCaNHrGDdEiRFZmRlKaiR4CbcR3JtYkFJebirmIc1N0=
 =8w9z
 -----END PGP SIGNATURE-----

Merge tag 'v5.4.73' into 5.4-2.2.x-imx

This is the 5.4.73 stable release

Conflicts:
- arch/arm/boot/dts/imx6sl.dtsi:
Commit [a1767c9019] in NXP tree is now covered with commit [5c4c2f437c]
from upstream.

- drivers/gpu/drm/mxsfb/mxsfb_drv.c:
Resolve merge hunk for patch [ed8b90d303] from upstream

- drivers/media/i2c/ov5640.c:
Patch [aa4bb8b883] in NXP tree is now covered by patches [79ec0578c7]
and [b2f8546056] from upstream. Changes from NXP patch [99aa4c8c18] are
covered in upstream version as well.

- drivers/net/ethernet/freescale/fec_main.c:
Fix merge fuzz for patch [9e70485b40] from upstream.

- drivers/usb/cdns3/gadget.c:
Keep NXP version of the file, upstream version is not compatible.

- drivers/usb/dwc3/core.c:
- drivers/usb/dwc3/core.h:
Fix merge fuzz of patch [08045050c6] together wth NXP patch [b30e41dc1e]

- sound/soc/fsl/fsl_sai.c:
- sound/soc/fsl/fsl_sai.h:
Commit [2ea70e51eb72a] in NXP tree is now covered with commit [1ad7f52fe6]
from upstream.

Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
2020-10-29 22:09:24 +00:00
Krzysztof Kozlowski 9aee821655 EDAC/ti: Fix handling of platform_get_irq() error
[ Upstream commit 66077adb70 ]

platform_get_irq() returns a negative error number on error. In such a
case, comparison to 0 would pass the check therefore check the return
value properly, whether it is negative.

 [ bp: Massage commit message. ]

Fixes: 86a18ee21e ("EDAC, ti: Add support for TI keystone and DRA7xx EDAC")
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tero Kristo <t-kristo@ti.com>
Link: https://lkml.kernel.org/r/20200827070743.26628-2-krzk@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-29 09:57:29 +01:00
Krzysztof Kozlowski 64a9f5a30f EDAC/aspeed: Fix handling of platform_get_irq() error
[ Upstream commit afce699694 ]

platform_get_irq() returns a negative error number on error. In such a
case, comparison to 0 would pass the check therefore check the return
value properly, whether it is negative.

 [ bp: Massage commit message. ]

Fixes: 9b7e6242ee ("EDAC, aspeed: Add an Aspeed AST2500 EDAC driver")
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Stefan Schaeckeler <schaecsn@gmx.net>
Link: https://lkml.kernel.org/r/20200827070743.26628-1-krzk@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-29 09:57:29 +01:00
Dinghao Liu 4d86328e42 EDAC/i5100: Fix error handling order in i5100_init_one()
[ Upstream commit 857a3139bd ]

When pci_get_device_func() fails, the driver doesn't need to execute
pci_dev_put(). mci should still be freed, though, to prevent a memory
leak. When pci_enable_device() fails, the error injection PCI device
"einj" doesn't need to be disabled either.

 [ bp: Massage commit message, rename label to "bail_mc_free". ]

Fixes: 52608ba205 ("i5100_edac: probe for device 19 function 0")
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200826121437.31606-1-dinghao.liu@zju.edu.cn
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-29 09:57:29 +01:00
Andrey Zhizhikin e0de7af107 This is the 5.4.69 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl91u0cACgkQONu9yGCS
 aT7KmhAAvuW3edfAfzD/F5h4vHaa9rMRmtvp2/FwefBoE4LEi3F6p2gBrUZMA3ds
 DNQ8Nheafeqd63wFkfE//TXYR0rYTxTxa0jTrhtuJCUZ4+anRyG00fEbHPOxvMnJ
 aPwQQVNOfCaUAvRbFdQ4RbuIm5chhX8Bml0ZtqvsAAFJ9XkCh1UPF0VHtSrS7PRL
 lRMBlamLgZqU72naaJaFY2nMp+pvMFPZrzkR7tpv0Z1bqxuJp6L2n/EmcHpmTOJy
 Ze+Wvt1wKk8Ep5Vql5ekXt5lEiInjacwsJZXbb5HfHO++Y+1b+ABt1kSjJx+R3/q
 2Qdztq+9Eoj0N1A4gXdVFoZHqKihhbD49k8YqX4qO5ujTzqgnNyHGSEXyIKvaU6z
 b3b12IvjbcMhM1zm3qvFfrVbbQI3kJf66zSi9NAwsZHlsvxRzslALR8I7mila4r5
 fVOyfGoZxFs44FNW9JG7I85/isAxgg0ogYraMZbk8gmhTtb1ZaN+r7kJeXuTpzOg
 UBAIDYPclMyZeny6tn1/qFuzNGYQQ0R9kxFcTC21Cf2zNLWHNfwCL1vE3Ob+ROIS
 IHcsce6IqWQKGlD8UPjkZiXTLfqCAVi51PsGTVrnidXfa1IBOuvDsVqlghPsjHSD
 30N4VB++9Gbw7LFEP4e33cOZLBLjDEdYd4VuoQFYywDZ3cy6xXo=
 =OoZD
 -----END PGP SIGNATURE-----

Merge tag 'v5.4.69' into 5.4-2.2.x-imx

This is the 5.4.69 stable release

Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
2020-10-01 16:21:52 +00:00
Borislav Petkov b11f2d6b80 EDAC/ghes: Check whether the driver is on the safe list correctly
[ Upstream commit 251c54ea26 ]

With CONFIG_DEBUG_TEST_DRIVER_REMOVE=y, a system would try to probe,
unregister and probe again a driver.

When ghes_edac is attempted to be loaded on a system which is not on
the safe platforms list, ghes_edac_register() would return early. The
unregister counterpart ghes_edac_unregister() would still attempt to
unregister and exit early at the refcount test, leading to the refcount
underflow below.

In order to not do *anything* on the unregister path too, reuse the
force_load parameter and check it on that path too, before fumbling with
the refcount.

  ghes_edac: ghes_edac_register: entry
  ghes_edac: ghes_edac_register: return -ENODEV
  ------------[ cut here ]------------
  refcount_t: underflow; use-after-free.
  WARNING: CPU: 10 PID: 1 at lib/refcount.c:28 refcount_warn_saturate+0xb9/0x100
  Modules linked in:
  CPU: 10 PID: 1 Comm: swapper/0 Not tainted 5.9.0-rc4+ #12
  Hardware name: GIGABYTE MZ01-CE1-00/MZ01-CE1-00, BIOS F02 08/29/2018
  RIP: 0010:refcount_warn_saturate+0xb9/0x100
  Code: 82 e8 fb 8f 4d 00 90 0f 0b 90 90 c3 80 3d 55 4c f5 00 00 75 88 c6 05 4c 4c f5 00 01 90 48 c7 c7 d0 8a 10 82 e8 d8 8f 4d 00 90 <0f> 0b 90 90 c3 80 3d 30 4c f5 00 00 0f 85 61 ff ff ff c6 05 23 4c
  RSP: 0018:ffffc90000037d58 EFLAGS: 00010292
  RAX: 0000000000000026 RBX: ffff88840b8da000 RCX: 0000000000000000
  RDX: 0000000000000001 RSI: ffffffff8216b24f RDI: 00000000ffffffff
  RBP: ffff88840c662e00 R08: 0000000000000001 R09: 0000000000000001
  R10: 0000000000000001 R11: 0000000000000046 R12: 0000000000000000
  R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
  FS:  0000000000000000(0000) GS:ffff88840ee80000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000000 CR3: 0000800002211000 CR4: 00000000003506e0
  Call Trace:
   ghes_edac_unregister
   ghes_remove
   platform_drv_remove
   really_probe
   driver_probe_device
   device_driver_attach
   __driver_attach
   ? device_driver_attach
   ? device_driver_attach
   bus_for_each_dev
   bus_add_driver
   driver_register
   ? bert_init
   ghes_init
   do_one_initcall
   ? rcu_read_lock_sched_held
   kernel_init_freeable
   ? rest_init
   kernel_init
   ret_from_fork
   ...
  ghes_edac: ghes_edac_unregister: FALSE, refcount: -1073741824

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200911164950.GB19320@zn.tnic
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-01 13:18:14 +02:00
Andrey Zhizhikin ee7b6ad15b This is the 5.4.67 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl9rJlYACgkQONu9yGCS
 aT6WbRAAga6QVKrO6R4NeKk0fPqKQQoQeTK+phBOFA7jAoX/rIRKyob2Si9BwhBA
 F77vZ6HIZ7+e/o35JJQYQbffbHYs0ANuS1oHGqe0vgbh+72Viaan6g7lFOhpx8qf
 y0YS7q+hw4WLZB0gGlBM7nkPXRiis32IrEVabQW+t8hmT2lWyutY8E2yFAU60tvI
 Tvjm2c2pvHEcHz9MrjEd/jIVxMFnIl42FBTx9bGsbDUCDzBwEvPArS4bNioP7EFJ
 O+rrGCNvwtiv0DuKzX1UIZzQ88IROmU3ZjsIlgOwla7xJWv4QDgmPfyAyRI48QhH
 PAZQmSntz+y+MP6B3z3ZBrxc2Fx0kCDtugn2P9+2RVUEpheANJ293vUgYTKN9Roy
 dHdWHFWNTO9IYpIN0cZjc25db4ULdjerWQrKcCr6ZO8+Ep/0mSzx3lkWjfuUP8Hr
 L2RD6rAm259OpPq8xhAcJpJvoQLwGxaBHyr4QYUmRgmNVURoqe9Q0MTZuiyGsXhm
 rtcNky9WvmyyI1lJgXi4A+vmsIThCHEstEMycgTejfJ4itIVA9e1ctJVVomWULCn
 9oNStBJpmHw0myDCohbKNjeO1UX/erdF9NaoGto5bnfIhcSae1YQEjRB8zKmzbg1
 DpgC1f7IZ7q53vfrDGsAjInOcuEwAn/Y5JMLJOL4mdA9j3XlX2o=
 =Ot99
 -----END PGP SIGNATURE-----

Merge tag 'v5.4.67' into 5.4-2.2.x-imx

This is the 5.4.67 stable release

This updates the kernel present in the NXP release imx_5.4.47_2.2.0 to the
latest patchset available from stable korg.

Base stable kernel version present in the NXP BSP release is v5.4.47.

Following conflicts were recorded and resolved:
- arch/arm/mach-imx/pm-imx6.c
NXP version has a different PM vectoring scheme, where the IRAM bottom
half (8k) is used to store IRAM code and pm_info. Keep this version to
be compatible with NXP PM implementation.

- arch/arm64/boot/dts/freescale/imx8mm-evk.dts
- arch/arm64/boot/dts/freescale/imx8mn-ddr4-evk.dts
NXP patches kept to provide proper LDO setup:
imx8mm-evk.dts: 975d8ab07267ded741c4c5d7500e524c85ab40d3
imx8mn-ddr4-evk.dts: e8e35fd0e759965809f3dca5979a908a09286198

- drivers/crypto/caam/caamalg.c
Keep NXP version, as it already covers the functionality for the
upstream patch [d6bbd4eea2]

- drivers/gpu/drm/imx/dw_hdmi-imx.c
- drivers/gpu/drm/imx/imx-ldb.c
- drivers/gpu/drm/imx/ipuv3/ipuv3-crtc.c
Port changes from upstream commit [1a27987101], which extends
component lifetime by moving drm structures allocation/free from
bind() to probe().

- drivers/gpu/drm/imx/imx-ldb.c
Merge patch [1752ab50e8] from upstream to disable both LVDS channels
when Enoder is disabled

- drivers/mmc/host/sdhci-esdhc-imx.c
Fix merge fuzz produced by [6534c897fd].

- drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
Commit d1a00c9bb1 from upstream solves the issue with improper error
reporting when qdisc type support is absent. Upstream version is merged
into NXP implementation.

- drivers/net/ethernet/freescale/enetc/enetc.c
Commit [ce06fcb6a6] from upstream merged,
base NXP version kept

- drivers/net/ethernet/freescale/enetc/enetc_pf.c
Commit [e8b86b4d87] from upstream solves
the kernel panic in case if probing fails. NXP has a clean-up logic
implemented different, where the MDIO remove would be invoked in any
failure case. Keep the NXP logic in place.

- drivers/thermal/imx_thermal.c
Upstream patch [9025a5589c] adds missing
of_node_put call, NXP version has been adapted to accommodate this patch
into the code.

- drivers/usb/cdns3/ep0.c
Manual merge of commit [be8df02707] from
upstream to protect cdns3_check_new_setup

- drivers/xen/swiotlb-xen.c
Port upstream commit cca58a1669 to NXP tree, manual hunk was
resolved during merge.

- sound/soc/fsl/fsl_esai.c
Commit [53057bd4ac] upstream addresses the problem of endless isr in
case if exception interrupt is enabled and tasklet is scheduled. Since
NXP implementation has tasklet removed with commit [2bbe95fe6c],
upstream fix does not match the main implementation, hence we keep the
NXP version here.

- sound/soc/fsl/fsl_sai.c
Apply patch [b8ae2bf5cc] from upstream, which uses FIFO watermark
mask macro.

Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
2020-09-26 20:54:42 +00:00
Tony Luck 6d11320bed EDAC/{i7core,sb,pnd2,skx}: Fix error event severity
[ Upstream commit 45bc6098a3 ]

IA32_MCG_STATUS.RIPV indicates whether the return RIP value pushed onto
the stack as part of machine check delivery is valid or not.

Various drivers copied a code fragment that uses the RIPV bit to
determine the severity of the error as either HW_EVENT_ERR_UNCORRECTED
or HW_EVENT_ERR_FATAL, but this check is reversed (marking errors where
RIPV is set as "FATAL").

Reverse the tests so that the error is marked fatal when RIPV is not set.

Reported-by: Gabriele Paoloni <gabriele.paoloni@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20200707194324.14884-1-tony.luck@intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-09-03 11:26:53 +02:00
Mauro Carvalho Chehab 87cc96bb11 EDAC: skx_common: get rid of unused type var
[ Upstream commit f05390d30e ]

	drivers/edac/skx_common.c: In function ‘skx_mce_output_error’:
	drivers/edac/skx_common.c:478:8: warning: variable ‘type’ set but not used [-Wunused-but-set-variable]
	  478 |  char *type, *optype;
	      |        ^~~~

Acked-by: Borislav Petkov <bp@alien8.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-09-03 11:26:52 +02:00
Mauro Carvalho Chehab 3bf42b2e8d EDAC: sb_edac: get rid of unused vars
[ Upstream commit 323014d85d ]

There are several vars unused on this driver, probably because
it was a modified copy of another driver. Get rid of them.

	drivers/edac/sb_edac.c: In function ‘knl_get_dimm_capacity’:
	drivers/edac/sb_edac.c:1343:16: warning: variable ‘sad_size’ set but not used [-Wunused-but-set-variable]
	 1343 |  u64 sad_base, sad_size, sad_limit = 0;
	      |                ^~~~~~~~
	drivers/edac/sb_edac.c: In function ‘sbridge_mce_output_error’:
	drivers/edac/sb_edac.c:2955:8: warning: variable ‘type’ set but not used [-Wunused-but-set-variable]
	 2955 |  char *type, *optype, msg[256];
	      |        ^~~~
	drivers/edac/sb_edac.c: In function ‘sbridge_unregister_mci’:
	drivers/edac/sb_edac.c:3203:22: warning: variable ‘pvt’ set but not used [-Wunused-but-set-variable]
	 3203 |  struct sbridge_pvt *pvt;
	      |                      ^~~
	At top level:
	drivers/edac/sb_edac.c:266:18: warning: ‘correrrthrsld’ defined but not used [-Wunused-const-variable=]
	  266 | static const u32 correrrthrsld[] = {
	      |                  ^~~~~~~~~~~~~
	drivers/edac/sb_edac.c:257:18: warning: ‘correrrcnt’ defined but not used [-Wunused-const-variable=]
	  257 | static const u32 correrrcnt[] = {
	      |                  ^~~~~~~~~~

Acked-by: Borislav Petkov <bp@alien8.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-09-03 11:26:52 +02:00
Jason Baron c67c6e1f54 EDAC/ie31200: Fallback if host bridge device is already initialized
[ Upstream commit 709ed1bcef ]

The Intel uncore driver may claim some of the pci ids from ie31200 which
means that the ie31200 edac driver will not initialize them as part of
pci_register_driver().

Let's add a fallback for this case to 'pci_get_device()' to get a
reference on the device such that it can still be configured. This is
similar in approach to other edac drivers.

Signed-off-by: Jason Baron <jbaron@akamai.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/1594923911-10885-1-git-send-email-jbaron@akamai.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-09-03 11:26:48 +02:00
Qiushi Wu c73eec04e6 EDAC: Fix reference count leaks
[ Upstream commit 17ed808ad2 ]

When kobject_init_and_add() returns an error, it should be handled
because kobject_init_and_add() takes a reference even when it fails. If
this function returns an error, kobject_put() must be called to properly
clean up the memory associated with the object.

Therefore, replace calling kfree() and call kobject_put() and add a
missing kobject_put() in the edac_device_register_sysfs_main_kobj()
error path.

 [ bp: Massage and merge into a single patch. ]

Fixes: b2ed215a33 ("Kobject: change drivers/edac to use kobject_init_and_add")
Signed-off-by: Qiushi Wu <wu000273@umn.edu>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200528202238.18078-1-wu000273@umn.edu
Link: https://lkml.kernel.org/r/20200528203526.20908-1-wu000273@umn.edu
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-19 08:15:54 +02:00
Borislav Petkov 58ab86e58b EDAC/amd64: Read back the scrub rate PCI register on F15h
[ Upstream commit ee470bb25d ]

Commit:

  da92110dfd ("EDAC, amd64_edac: Extend scrub rate support to F15hM60h")

added support for F15h, model 0x60 CPUs but in doing so, missed to read
back SCRCTRL PCI config register on F15h CPUs which are *not* model
0x60. Add that read so that doing

  $ cat /sys/devices/system/edac/mc/mc0/sdram_scrub_rate

can show the previously set DRAM scrub rate.

Fixes: da92110dfd ("EDAC, amd64_edac: Extend scrub rate support to F15hM60h")
Reported-by: Anders Andersson <pipatron@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org> #v4.4..
Link: https://lkml.kernel.org/r/CAKkunMbNWppx_i6xSdDHLseA2QQmGJqj_crY=NF-GZML5np4Vw@mail.gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-07-09 09:37:49 +02:00
Alexander Monakov 7c71b9aa18 EDAC/amd64: Add AMD family 17h model 60h PCI IDs
commit b6bea24d41 upstream.

Add support for AMD Renoir (4000-series Ryzen CPUs).

Signed-off-by: Alexander Monakov <amonakov@ispras.ru>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Yazen Ghannam <yazen.ghannam@amd.com>
Link: https://lkml.kernel.org/r/20200510204842.2603-4-amonakov@ispras.ru
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-22 09:31:20 +02:00
Jason Liu 5691e22711 Merge tag 'v5.4.47' into imx_5.4.y
* tag 'v5.4.47': (2193 commits)
  Linux 5.4.47
  KVM: arm64: Save the host's PtrAuth keys in non-preemptible context
  KVM: arm64: Synchronize sysreg state on injecting an AArch32 exception
  ...

 Conflicts:
	arch/arm/boot/dts/imx6qdl.dtsi
	arch/arm/mach-imx/Kconfig
	arch/arm/mach-imx/common.h
	arch/arm/mach-imx/suspend-imx6.S
	arch/arm64/boot/dts/freescale/imx8qxp-mek.dts
	arch/powerpc/include/asm/cacheflush.h
	drivers/cpufreq/imx6q-cpufreq.c
	drivers/dma/imx-sdma.c
	drivers/edac/synopsys_edac.c
	drivers/firmware/imx/imx-scu.c
	drivers/net/ethernet/freescale/fec.h
	drivers/net/ethernet/freescale/fec_main.c
	drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
	drivers/net/phy/phy_device.c
	drivers/perf/fsl_imx8_ddr_perf.c
	drivers/usb/cdns3/gadget.c
	drivers/usb/dwc3/gadget.c
	include/uapi/linux/dma-buf.h

Signed-off-by: Jason Liu <jason.hui.liu@nxp.com>
2020-06-19 17:32:49 +08:00
Qiuxu Zhuo 78e6964dce EDAC/skx: Use the mcmtr register to retrieve close_pg/bank_xor_enable
commit 1032095053 upstream.

The skx_edac driver wrongly uses the mtr register to retrieve two fields
close_pg and bank_xor_enable. Fix it by using the correct mcmtr register
to get the two fields.

Cc: <stable@vger.kernel.org>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Reported-by: Matthew Riley <mattdr@google.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20200515210146.1337-1-tony.luck@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-17 16:40:33 +02:00
Sherry Sun 90335d6681 EDAC/synopsys: Do not print an error with back-to-back snprintf() calls
commit dfc6014e3b upstream.

handle_error() currently calls snprintf() a couple of times in
succession to output the message for a CE/UE, therefore overwriting each
part of the message which was formatted with the previous snprintf()
call. As a result, only the part of the message from the last snprintf()
call will be printed.

The simplest and most effective way to fix this problem is to combine
the whole string into one which to supply to a single snprintf() call.

 [ bp: Massage. ]

Fixes: b500b4a029 ("EDAC, synopsys: Add ECC support for ZynqMP DDR controller")
Signed-off-by: Sherry Sun <sherry.sun@nxp.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: James Morse <james.morse@arm.com>
Cc: Manish Narani <manish.narani@xilinx.com>
Link: https://lkml.kernel.org/r/1582792452-32575-1-git-send-email-sherry.sun@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-12 13:00:31 +01:00
Jason Liu 335d2828a9 This is the 5.4.24 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl5hHjgACgkQONu9yGCS
 aT6CSBAA0c16mnDb59jgmW/sBj/p/MrlD/WJzLriqiKN5BUsPt9++I5mNj8mG+d2
 Glm4086e8L826zv8oKiZm23xk93on+78ExhVFVZvZNaEUpiRNYCGSuDq2NrHW0z+
 kpagkAFLfCUZFoKtmWo+bpl0YtF4dd/fg7+EjyL6qT1DBs8NVMwZx7i/v0xXv7Wc
 0vsGCLYoBLzcW1FB2d9cfAUPCBuGEzL/7TdifNOXRgI9owGsZndFJgXgIzoBUt/P
 tqB8RLjIupCiMEPtsEAZ/rgEQLPFkb3yrBvgjd1wDI8bHUIQU0clqThKVNvmNSmv
 UTBSNgPAhkP8nZG7X9xCkyfEsUefejBJy66da9n4XTGGrXf9ga0BL0nNrOGwOesr
 m+tNnBSFsbFCMqFopQnt4zZSnaf67AOk2mzxbEu4E+sStyW943aDO9MoRRFgaYGH
 pfie3qOKtKta2MuNTJA+q6F0W9H+V5MtMpwbyuy1/dp2eVln2wewBBMvXYdL1YOy
 E/Z87nsQgalsDynz9m/niv32J4JAxHptyOyROkktDLBSzL5RawNn+Op8X5EtmZOe
 sPkiYicqp9CLmMj13qWXJhtuyNdD4wk6FyyAy6cX9mF44+EZGOBkyNP+n8g789Kn
 sqFJ7sfTfOnwLBFciMA5PaMTGNWROyWXNkvvUzO+9t0CyFAnT2U=
 =abGA
 -----END PGP SIGNATURE-----

Merge tag 'v5.4.24' into imx_5.4.y

Merge Linux stable release v5.4.24 into imx_5.4.y

* tag 'v5.4.24': (3306 commits)
  Linux 5.4.24
  blktrace: Protect q->blk_trace with RCU
  kvm: nVMX: VMWRITE checks unsupported field before read-only field
  ...

Signed-off-by: Jason Liu <jason.hui.liu@nxp.com>

 Conflicts:
	arch/arm/boot/dts/imx6sll-evk.dts
	arch/arm/boot/dts/imx7ulp.dtsi
	arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
	drivers/clk/imx/clk-composite-8m.c
	drivers/gpio/gpio-mxc.c
	drivers/irqchip/Kconfig
	drivers/mmc/host/sdhci-of-esdhc.c
	drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c
	drivers/net/can/flexcan.c
	drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
	drivers/net/ethernet/mscc/ocelot.c
	drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
	drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
	drivers/net/phy/realtek.c
	drivers/pci/controller/mobiveil/pcie-mobiveil-host.c
	drivers/perf/fsl_imx8_ddr_perf.c
	drivers/tee/optee/shm_pool.c
	drivers/usb/cdns3/gadget.c
	kernel/sched/cpufreq.c
	net/core/xdp.c
	sound/soc/fsl/fsl_esai.c
	sound/soc/fsl/fsl_sai.c
	sound/soc/sof/core.c
	sound/soc/sof/imx/Kconfig
	sound/soc/sof/loader.c
2020-03-08 18:57:18 +08:00
Aristeu Rozanski 728afb955b EDAC: skx_common: downgrade message importance on missing PCI device
[ Upstream commit 854bb48018 ]

Both skx_edac and i10nm_edac drivers are loaded based on the matching CPU being
available which leads the module to be automatically loaded in virtual machines
as well. That will fail due the missing PCI devices. In both drivers the first
function to make use of the PCI devices is skx_get_hi_lo() will simply print

	EDAC skx: Can't get tolm/tohm

for each CPU core, which is noisy. This patch makes it a debug message.

Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20191204212325.c4k47p5hrnn3vpb5@redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-03-05 16:43:31 +01:00
Wei Yongjun d4870a4343 EDAC/sifive: Fix return value check in ecc_register()
[ Upstream commit 6cd18453b6 ]

In case of error, the function edac_device_alloc_ctl_info() returns a
NULL pointer, not ERR_PTR(). Replace the IS_ERR() test in the return
value check with a NULL test.

Fixes: 91abaeaaff ("EDAC/sifive: Add EDAC platform driver for SiFive SoCs")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200115150303.112627-1-weiyongjun1@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-24 08:36:51 +01:00
Robert Richter ce8b9b8032 EDAC/mc: Fix use-after-free and memleaks during device removal
commit 216aa145aa upstream.

A test kernel with the options DEBUG_TEST_DRIVER_REMOVE, KASAN and
DEBUG_KMEMLEAK set, revealed several issues when removing an mci device:

1) Use-after-free:

On 27.11.19 17:07:33, John Garry wrote:
> [   22.104498] BUG: KASAN: use-after-free in
> edac_remove_sysfs_mci_device+0x148/0x180

The use-after-free is caused by the mci_for_each_dimm() macro called in
edac_remove_sysfs_mci_device(). The iterator was introduced with

  c498afaf7d ("EDAC: Introduce an mci_for_each_dimm() iterator").

The iterator loop calls device_unregister(&dimm->dev), which removes
the sysfs entry of the device, but also frees the dimm struct in
dimm_attr_release(). When incrementing the loop in mci_for_each_dimm(),
the dimm struct is accessed again, after having been freed already.

The fix is to free all the mci device's subsequent dimm and csrow
objects at a later point, in _edac_mc_free(), when the mci device itself
is being freed.

This keeps the data structures intact and the mci device can be
fully used until its removal. The change allows the safe usage of
mci_for_each_dimm() to release dimm devices from sysfs.

2) Memory leaks:

Following memory leaks have been detected:

 # grep edac /sys/kernel/debug/kmemleak | sort | uniq -c
       1     [<000000003c0f58f9>] edac_mc_alloc+0x3bc/0x9d0      # mci->csrows
      16     [<00000000bb932dc0>] edac_mc_alloc+0x49c/0x9d0      # csr->channels
      16     [<00000000e2734dba>] edac_mc_alloc+0x518/0x9d0      # csr->channels[chn]
       1     [<00000000eb040168>] edac_mc_alloc+0x5c8/0x9d0      # mci->dimms
      34     [<00000000ef737c29>] ghes_edac_register+0x1c8/0x3f8 # see edac_mc_alloc()

All leaks are from memory allocated by edac_mc_alloc().

Note: The test above shows that edac_mc_alloc() was called here from
ghes_edac_register(), thus both functions show up in the stack trace
but the module causing the leaks is edac_mc. The comments with the data
structures involved were made manually by analyzing the objdump.

The data structures listed above and created by edac_mc_alloc() are
not properly removed during device removal, which is done in
edac_mc_free().

There are two paths implemented to remove the device depending on device
registration, _edac_mc_free() is called if the device is not registered
and edac_unregister_sysfs() otherwise.

The implemenations differ. For the sysfs case, the mci device removal
lacks the removal of subsequent data structures (csrows, channels,
dimms). This causes the memory leaks (see mci_attr_release()).

 [ bp: Massage commit message. ]

Fixes: c498afaf7d ("EDAC: Introduce an mci_for_each_dimm() iterator")
Fixes: faa2ad09c0 ("edac_mc: edac_mc_free() cannot assume mem_ctl_info is registered in sysfs.")
Fixes: 7a623c0390 ("edac: rewrite the sysfs code to use struct device")
Reported-by: John Garry <john.garry@huawei.com>
Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: John Garry <john.garry@huawei.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20200212120340.4764-3-rrichter@marvell.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-19 19:53:03 +01:00
Robert Richter b2e977a973 EDAC/sysfs: Remove csrow objects on errors
commit 4d59588c09 upstream.

All created csrow objects must be removed in the error path of
edac_create_csrow_objects(). The objects have been added as devices.

They need to be removed by doing a device_del() *and* put_device() call
to also free their memory. The missing put_device() leaves a memory
leak. Use device_unregister() instead of device_del() which properly
unregisters the device doing both.

Fixes: 7adc05d2dc ("EDAC/sysfs: Drop device references properly")
Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: John Garry <john.garry@huawei.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20200212120340.4764-4-rrichter@marvell.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-19 19:53:03 +01:00
Sherry Sun 2b75203928 MLK-23333-3 EDAC: synopsys: enable interrupt again for imx8mpevk
Since zynqmp_get_error_info() is called during imx8mpevk CE/UE
interrupt, at the end of zynqmp_get_error_info(), it disables the
interrupt of imx8mpevk, then the interrupt handler will be called only
once, so here enable interrupt again.

Signed-off-by: Sherry Sun <sherry.sun@nxp.com>
2020-02-18 22:18:44 +08:00
Sherry Sun 97b100a74e MLK-23333-2 EDAC: synopsys: Add more useful output information for CE/UE
When CE/UE appears, the wrong data or wrong ECC data is usally useful,
so print these information out, and correct the snprintf format.

Signed-off-by: Sherry Sun <sherry.sun@nxp.com>
2020-02-18 22:18:44 +08:00
Sherry Sun 6e16bd7f7c MLK-23310-3 EDAC: synopsys: Add edac driver support for i.MX8MP
Since i.MX8MP use synopsys ddr controller IP, so add edac support
for i.MX8MP based on synopsys edac driver. The difference between ZynqMP
and i.MX8MP ddr controller is interrupt registers. So add
intr_handler_imx8mp/enable_intr_imx8mp/disable_intr_imx8mp three
functions to distinguish with ZynqMP.

Signed-off-by: Sherry Sun <sherry.sun@nxp.com>
Reviewed-by: Frank Li <frank.li@nxp.com>
2020-02-13 14:34:00 +08:00
Sherry Sun 1191fb5146 MLK-23310-1 defconfig: EDAC: enable synopsys edac driver support for imx8mp
Enable edac driver on imx8mp based on synopsys edac driver.

Signed-off-by: Sherry Sun <sherry.sun@nxp.com>
Reviewed-by: Frank Li <frank.li@nxp.com>
2020-02-13 14:33:27 +08:00
Robert Richter f90edcff1e EDAC/ghes: Fix grain calculation
[ Upstream commit 7088e29e04 ]

The current code to convert a physical address mask to a grain
(defined as granularity in bytes) is:

	e->grain = ~(mem_err->physical_addr_mask & ~PAGE_MASK);

This is broken in several ways:

1) It calculates to wrong grain values. E.g., a physical address mask
of ~0xfff should give a grain of 0x1000. Without considering
PAGE_MASK, there is an off-by-one. Things are worse when also
filtering it with ~PAGE_MASK. This will calculate to a grain with the
upper bits set. In the example it even calculates to ~0.

2) The grain does not depend on and is unrelated to the kernel's
page-size. The page-size only matters when unmapping memory in
memory_failure(). Smaller grains are wrongly rounded up to the
page-size, on architectures with a configurable page-size (e.g. arm64)
this could round up to the even bigger page-size of the hypervisor.

Fix this with:

	e->grain = ~mem_err->physical_addr_mask + 1;

The grain_bits are defined as:

	grain = 1 << grain_bits;

Change also the grain_bits calculation accordingly, it is the same
formula as in edac_mc.c now and the code can be unified.

The value in ->physical_addr_mask coming from firmware is assumed to
be contiguous, but this is not sanity-checked. However, in case the
mask is non-contiguous, a conversion to grain_bits effectively
converts the grain bit mask to a power of 2 by rounding it up.

Suggested-by: James Morse <james.morse@arm.com>
Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20191106093239.25517-11-rrichter@marvell.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-31 16:45:16 +01:00
Yazen Ghannam 36b4080a3f EDAC/amd64: Set grain per DIMM
[ Upstream commit 466503d6b1 ]

The following commit introduced a warning on error reports without a
non-zero grain value.

  3724ace582 ("EDAC/mc: Fix grain_bits calculation")

The amd64_edac_mod module does not provide a value, so the warning will
be given on the first reported memory error.

Set the grain per DIMM to cacheline size (64 bytes). This is the current
recommendation.

Fixes: 3724ace582 ("EDAC/mc: Fix grain_bits calculation")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Robert Richter <rrichter@marvell.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20191022203448.13962-7-Yazen.Ghannam@amd.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-31 16:44:20 +01:00
Robert Richter e240c7d1f1 EDAC/ghes: Do not warn when incrementing refcount on 0
[ Upstream commit 16214bd9e4 ]

The following warning from the refcount framework is seen during ghes
initialization:

  EDAC MC0: Giving out device to module ghes_edac.c controller ghes_edac: DEV ghes (INTERRUPT)
  ------------[ cut here ]------------
  refcount_t: increment on 0; use-after-free.
  WARNING: CPU: 36 PID: 1 at lib/refcount.c:156 refcount_inc_checked
 [...]
  Call trace:
   refcount_inc_checked
   ghes_edac_register
   ghes_probe
   ...

It warns if the refcount is incremented from zero. This warning is
reasonable as a kernel object is typically created with a refcount of
one and freed once the refcount is zero. Afterwards the object would be
"used-after-free".

For GHES, the refcount is initialized with zero, and that is why this
message is seen when initializing the first instance. However, whenever
the refcount is zero, the device will be allocated and registered. Since
the ghes_reg_mutex protects the refcount and serializes allocation and
freeing of ghes devices, a use-after-free cannot happen here.

Instead of using refcount_inc() for the first instance, use
refcount_set(). This can be used here because the refcount is zero at
this point and can not change due to its protection by the mutex.

Fixes: 23f61b9fc5 ("EDAC/ghes: Fix locking and memory barrier issues")
Reported-by: John Garry <john.garry@huawei.com>
Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: John Garry <john.garry@huawei.com>
Cc: <huangming23@huawei.com>
Cc: James Morse <james.morse@arm.com>
Cc: <linuxarm@huawei.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: <tanxiaofei@huawei.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: <wanghuiqiang@huawei.com>
Link: https://lkml.kernel.org/r/20191121213628.21244-1-rrichter@marvell.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-17 19:56:54 +01:00
Meng Li dc69bd2393 EDAC/altera: Use fast register IO for S10 IRQs
commit 56d9e7bd3f upstream.

When an IRQ occurs, regmap_{read,write,...}() is invoked in atomic
context. Regmap must indicate register IO is fast so that a spinlock is
used instead of a mutex to avoid sleeping in atomic context:

  lock_acquire
  __mutex_lock
  mutex_lock_nested
  regmap_lock_mutex
  regmap_write
  a10_eccmgr_irq_unmask
  unmask_irq.part.0
  irq_enable
  __irq_startup
  irq_startup
  __setup_irq
  request_threaded_irq
  devm_request_threaded_irq
  altr_sdram_probe

Mark it so.

 [ bp: Massage. ]

Fixes: 3dab6bd526 ("EDAC, altera: Add support for Stratix10 SDRAM EDAC")
Reported-by: Meng Li <Meng.Li@windriver.com>
Signed-off-by: Meng Li <Meng.Li@windriver.com>
Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: James Morse <james.morse@arm.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Robert Richter <rrichter@marvell.com>
Cc: stable <stable@vger.kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/1574361048-17572-2-git-send-email-thor.thayer@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-17 19:55:51 +01:00
Robert Richter d2136afb51 EDAC/ghes: Fix locking and memory barrier issues
[ Upstream commit 23f61b9fc5 ]

The ghes registration and refcount is broken in several ways:

 * ghes_edac_register() returns with success for a 2nd instance
   even if a first instance's registration is still running. This is
   not correct as the first instance may fail later. A subsequent
   registration may not finish before the first. Parallel registrations
   must be avoided.

 * The refcount was increased even if a registration failed. This
   leads to stale counters preventing the device from being released.

 * The ghes refcount may not be decremented properly on unregistration.
   Always decrement the refcount once ghes_edac_unregister() is called to
   keep the refcount sane.

 * The ghes_pvt pointer is handed to the irq handler before registration
   finished.

 * The mci structure could be freed while the irq handler is running.

Fix this by adding a mutex to ghes_edac_register(). This mutex
serializes instances to register and unregister. The refcount is only
increased if the registration succeeded. This makes sure the refcount is
in a consistent state after registering or unregistering a device.

Note: A spinlock cannot be used here as the code section may sleep.

The ghes_pvt is protected by ghes_lock now. This ensures the pointer is
not updated before registration was finished or while the irq handler is
running. It is unset before unregistering the device including necessary
(implicit) memory barriers making the changes visible to other CPUs.
Thus, the device can not be used anymore by an interrupt.

Also, rename ghes_init to ghes_refcount for better readability and
switch to refcount API.

A refcount is needed because there can be multiple GHES structures being
defined (see ACPI 6.3 specification, 18.3.2.7 Generic Hardware Error
Source, "Some platforms may describe multiple Generic Hardware Error
Source structures with different notification types, ...").

Another approach to use the mci's device refcount (get_device()) and
have a release function does not work here. A release function will be
called only for device_release() with the last put_device() call. The
device must be deleted *before* that with device_del(). This is only
possible by maintaining an own refcount.

 [ bp: touchups. ]

Fixes: 0fe5f281f7 ("EDAC, ghes: Model a single, logical memory controller")
Fixes: 1e72e673b9 ("EDAC/ghes: Fix Use after free in ghes_edac remove path")
Co-developed-by: James Morse <james.morse@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Co-developed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20191105200732.3053-1-rrichter@marvell.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-13 08:43:30 +01:00
James Morse 1e72e673b9 EDAC/ghes: Fix Use after free in ghes_edac remove path
ghes_edac models a single logical memory controller, and uses a global
ghes_init variable to ensure only the first ghes_edac_register() will
do anything.

ghes_edac is registered the first time a GHES entry in the HEST is
probed. There may be multiple entries, so subsequent attempts to
register ghes_edac are silently ignored as the work has already been
done.

When a GHES entry is unregistered, it calls ghes_edac_unregister(),
which free()s the memory behind the global variables in ghes_edac.

But there may be multiple GHES entries, the next call to
ghes_edac_unregister() will dereference the free()d memory, and attempt
to free it a second time.

This may also be triggered on a platform with one GHES entry, if the
driver is unbound/re-bound and unbound. The re-bind step will do
nothing because of ghes_init, the second unbind will then do the same
work as the first.

Doing the unregister work on the first call is unsafe, as another
CPU may be processing a notification in ghes_edac_report_mem_error(),
using the memory we are about to free.

ghes_init is already half of the reference counting. We only need
to do the register work for the first call, and the unregister work
for the last. Add the unregister check.

This means we no longer free ghes_edac's memory while there are
GHES entries that may receive a notification.

This was detected by KASAN and DEBUG_TEST_DRIVER_REMOVE.

 [ bp: merge into a single patch. ]

Fixes: 0fe5f281f7 ("EDAC, ghes: Model a single, logical memory controller")
Reported-by: John Garry <john.garry@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Robert Richter <rrichter@marvell.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20191014171919.85044-2-james.morse@arm.com
Link: https://lkml.kernel.org/r/304df85b-8b56-b77e-1a11-aa23769f2e7c@huawei.com
2019-10-17 11:27:05 +02:00
Linus Torvalds 8808cf8cbc ARM updates for 5.4-rc1:
- fix various clang build and cppcheck issues
 - switch ARM to use new common outgoing-CPU-notification code
 - add some additional explanation about the boot code
 - kbuild "make clean" fixes
 - get rid of another "(____ptrval____)", this time for the VDSO code
 - avoid treating cache maintenance faults as a write
 - add a frame pointer unwinder implementation for clang
 - add EDAC support for Aurora L2 cache
 - improve robustness of adjust_lowmem_bounds() finding the bounds of
   lowmem.
 - add reset control for AMBA primecell devices
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIVAwUAXYZCdvTnkBvkraxkAQK7vQ//UO0XJ1InSLnWzPYuNwJGcCmzHIg6p40A
 VxnvDTVxZH6UKDhBg8xx+gpPOhwZElGyc0H563p5jgmjzbIesESS5Xy3hUUMkQ9y
 A6Ta9Nk+NhL+j9O9VtcOk90oQJsLuVyYtHTfk6Wl9xaVLjM1OALWNzCSDqXIPTjF
 qEhTRahlv9Nc9aisFJAPduf/zQx9ULaZVvDzTo6clXSD7ieSy0MZRiRbcH3MJwiY
 Q5AbImF49NGcNtlknPh8Gnz/4P3q+bxQDmrzki9d4Fcy2brko845q9Ca5PC+iXro
 fZHvs8q2+8xz4PuOddBrYebqPIIv+3W6uPlJAPjO0MQrxPTUxRBxqAkYXxwTZBx/
 A79AQsbnmUSyOV4EI2lk9USmN/GF2QwGOusRoiA/XMbSVfqnVZWH5mE98dr+2vn+
 rUnTq9yvSz2y6QH7+UI+7Q7T8jg4QFBBmPDfCP+yTOWqPb8u070h+VgLBr28g1JL
 t6VtzOeI4wyl7E/WInmoM/d8SXnjv/1yNzLBcCdvgBV94fUQAV5EP+cDGJ0hv1SJ
 TGywm8adf3zAa7ZUAOhBoAK3gkNqjJB28ynsH4QmBUmsKkozxoKwwb4jjbGgcoUY
 rYII4VyoQB/0eX5/i8u69krA+3QNRhehLWC/zM4ZK5lKfFRCnNDvLgiIEM5b59JW
 nBywRtpyw2I=
 =Evmc
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm

Pull ARM updates from Russell King:

 - fix various clang build and cppcheck issues

 - switch ARM to use new common outgoing-CPU-notification code

 - add some additional explanation about the boot code

 - kbuild "make clean" fixes

 - get rid of another "(____ptrval____)", this time for the VDSO code

 - avoid treating cache maintenance faults as a write

 - add a frame pointer unwinder implementation for clang

 - add EDAC support for Aurora L2 cache

 - improve robustness of adjust_lowmem_bounds() finding the bounds of
   lowmem.

 - add reset control for AMBA primecell devices

* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm: (24 commits)
  ARM: 8906/1: drivers/amba: add reset control to amba bus probe
  ARM: 8905/1: Emit __gnu_mcount_nc when using Clang 10.0.0 or newer
  ARM: 8904/1: skip nomap memblocks while finding the lowmem/highmem boundary
  ARM: 8903/1: ensure that usable memory in bank 0 starts from a PMD-aligned address
  ARM: 8891/1: EDAC: armada_xp: Add support for more SoCs
  ARM: 8888/1: EDAC: Add driver for the Marvell Armada XP SDRAM and L2 cache ECC
  ARM: 8892/1: EDAC: Add missing debugfs_create_x32 wrapper
  ARM: 8890/1: l2x0: add marvell,ecc-enable property for aurora
  ARM: 8889/1: dt-bindings: document marvell,ecc-enable binding
  ARM: 8886/1: l2x0: support parity-enable/disable on aurora
  ARM: 8885/1: aurora-l2: add defines for parity and ECC registers
  ARM: 8887/1: aurora-l2: add prefix to MAX_RANGE_SIZE
  ARM: 8902/1: l2c: move cache-aurora-l2.h to asm/hardware
  ARM: 8900/1: UNWINDER_FRAME_POINTER implementation for Clang
  ARM: 8898/1: mm: Don't treat faults reported from cache maintenance as writes
  ARM: 8896/1: VDSO: Don't leak kernel addresses
  ARM: 8895/1: visit mach-* and plat-* directories when cleaning
  ARM: 8894/1: boot: Replace open-coded nop with macro
  ARM: 8893/1: boot: Explain the 8 nops
  ARM: 8876/1: fix O= building with CONFIG_FPE_FASTFPE
  ...
2019-09-22 09:39:09 -07:00
Linus Torvalds 22331f8952 Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu-feature updates from Ingo Molnar:

 - Rework the Intel model names symbols/macros, which were decades of
   ad-hoc extensions and added random noise. It's now a coherent, easy
   to follow nomenclature.

 - Add new Intel CPU model IDs:
    - "Tiger Lake" desktop and mobile models
    - "Elkhart Lake" model ID
    - and the "Lightning Mountain" variant of Airmont, plus support code

 - Add the new AVX512_VP2INTERSECT instruction to cpufeatures

 - Remove Intel MPX user-visible APIs and the self-tests, because the
   toolchain (gcc) is not supporting it going forward. This is the
   first, lowest-risk phase of MPX removal.

 - Remove X86_FEATURE_MFENCE_RDTSC

 - Various smaller cleanups and fixes

* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
  x86/cpu: Update init data for new Airmont CPU model
  x86/cpu: Add new Airmont variant to Intel family
  x86/cpu: Add Elkhart Lake to Intel family
  x86/cpu: Add Tiger Lake to Intel family
  x86: Correct misc typos
  x86/intel: Add common OPTDIFFs
  x86/intel: Aggregate microserver naming
  x86/intel: Aggregate big core graphics naming
  x86/intel: Aggregate big core mobile naming
  x86/intel: Aggregate big core client naming
  x86/cpufeature: Explain the macro duplication
  x86/ftrace: Remove mcount() declaration
  x86/PCI: Remove superfluous returns from void functions
  x86/msr-index: Move AMD MSRs where they belong
  x86/cpu: Use constant definitions for CPU models
  lib: Remove redundant ftrace flag removal
  x86/crash: Remove unnecessary comparison
  x86/bitops: Use __builtin_constant_p() directly instead of IS_IMMEDIATE()
  x86: Remove X86_FEATURE_MFENCE_RDTSC
  x86/mpx: Remove MPX APIs
  ...
2019-09-16 18:47:53 -07:00
Isaac Vaughn 3e443eb353 EDAC/amd64: Add PCI device IDs for family 17h, model 70h
Add the new Family 17h Model 70h PCI IDs (device 18h functions 0 and 6)
to the AMD64 EDAC module.

 [ bp: s/f17_base_addr_to_cs_size/f17_addr_mask_to_cs_size/g ]

Signed-off-by: Isaac Vaughn <isaac.vaughn@knights.ucf.edu>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: James Morse <james.morse@arm.com>
Cc: linux-edac@vger.kernel.org
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Robert Richter <rrichter@marvell.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20190906192131.8ced0ca112146f32d82b6cae@knights.ucf.edu
2019-09-07 07:29:27 +02:00
Robert Richter e701f41203 EDAC/mc_sysfs: Make debug messages consistent
Debug messages are inconsistently used in the error handlers. Some lack
an error message, some are called regardless of the return status,
messages for the same error are at different locations in the code
depending on the error code. This happens esp. near put_device() calls.

Make those debug messages more consistent. Additionally, unify the error
messages to have the same terms for the same operations of the device.

Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20190902123216.9809-5-rrichter@marvell.com
2019-09-04 11:39:19 +02:00
Robert Richter 644110e17d EDAC/mc_sysfs: Remove pointless gotos
Use direct returns instead of gotos. Error handling code becomes
smaller and better readable.

Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20190902123216.9809-4-rrichter@marvell.com
2019-09-03 19:24:10 +02:00
Robert Richter d55c79ac86 EDAC: Prefer 'unsigned int' to bare use of 'unsigned'
Use of 'unsigned int' instead of bare use of 'unsigned'. Fix this for
edac_mc*, ghes and the i5100 driver as reported by checkpatch.pl.

While at it, struct member dev_ch_attribute->channel is always used as
unsigned int. Change type to unsigned int to avoid type casts.

 [ bp: Massage. ]

Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20190902123216.9809-2-rrichter@marvell.com
2019-09-03 19:21:19 +02:00
Chris Packham 23d103ae3e ARM: 8891/1: EDAC: armada_xp: Add support for more SoCs
The Armada 38x and other integrated SoCs use a reduced pin count so the
width of the SDRAM interface is smaller than the Armada XP SoCs. This
means that the definition of "full" and "half" width is reduced from
64/32 to 32/16.

Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2019-08-29 07:58:01 +01:00
Jan Luebbe 7f6998a412 ARM: 8888/1: EDAC: Add driver for the Marvell Armada XP SDRAM and L2 cache ECC
Add support for the ECC functionality as found in the DDR RAM and L2
cache controllers on the MV78230/MV78x60 SoCs. This driver has been
tested on the MV78460 (on a custom board with a DDR3 ECC DIMM).

[cp use SPDX license]

Signed-off-by: Jan Luebbe <jlu@pengutronix.de>
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2019-08-29 07:58:01 +01:00
Jan Luebbe 0ecace04a3 ARM: 8892/1: EDAC: Add missing debugfs_create_x32 wrapper
We already have wrappers for x8 and x16, so add the missing x32 one.

Signed-off-by: Jan Luebbe <jlu@pengutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2019-08-29 07:58:01 +01:00
Peter Zijlstra 5ebb34edbe x86/intel: Aggregate microserver naming
Currently big microservers have _XEON_D while small microservers have
_X, Make it uniformly: _D.

for i in `git grep -l "\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*_\(X\|XEON_D\)"`
do
	sed -i -e 's/\(\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*ATOM.*\)_X/\1_D/g' \
	       -e 's/\(\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*\)_XEON_D/\1_D/g' ${i}
done

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: x86@kernel.org
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Link: https://lkml.kernel.org/r/20190827195122.677152989@infradead.org
2019-08-28 11:29:32 +02:00
Yazen Ghannam 81f5090db8 EDAC/amd64: Support asymmetric dual-rank DIMMs
Future AMD systems will support asymmetric dual-rank DIMMs. These are
DIMMs where the ranks are of different sizes.

The even rank will use the Primary Even Chip Select registers and the
odd rank will use the Secondary Odd Chip Select registers.

Recognize if a Secondary Odd Chip Select is being used. Use the
Secondary Odd Address Mask when calculating the chip select size.

 [ bp: move csrow_sec_enabled() to the header, fix CS_ODD define and
   tone-down the capitalized words spelling. ]

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20190821235938.118710-8-Yazen.Ghannam@amd.com
2019-08-23 16:09:52 +02:00
Yazen Ghannam 7574729e91 EDAC/amd64: Cache secondary Chip Select registers
AMD Family 17h systems have a set of secondary Chip Select Base
Addresses and Address Masks. These do not represent unique Chip
Selects, rather they are used in conjunction with the primary
Chip Select registers in certain cases.

Cache these secondary Chip Select registers for future use.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20190821235938.118710-7-Yazen.Ghannam@amd.com
2019-08-23 12:55:05 +02:00
Yazen Ghannam 8a2eaab7da EDAC/amd64: Decode syndrome before translating address
AMD Family 17h systems currently require address translation in order to
report the system address of a DRAM ECC error. This is currently done
before decoding the syndrome information. The syndrome information does
not depend on the address translation, so the proper EDAC csrow/channel
reporting can function without the address. However, the syndrome
information will not be decoded if the address translation fails.

Decode the syndrome information before doing the address translation.
The syndrome information is architecturally defined in MCA_SYND and can
be considered robust. The address translation is system-specific and may
fail on newer systems without proper updates to the translation
algorithm.

Fixes: 713ad54675 ("EDAC, amd64: Define and register UMC error decode function")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20190821235938.118710-6-Yazen.Ghannam@amd.com
2019-08-23 07:50:21 +02:00
Yazen Ghannam e53a3b267f EDAC/amd64: Find Chip Select memory size using Address Mask
Chip Select memory size reporting on AMD Family 17h was recently fixed
in order to account for interleaving. However, the current method is not
robust.

The Chip Select Address Mask can be used to find the memory size. There
are a couple of cases.

1) For single-rank and dual-rank non-interleaved, use the address mask
plus 1 as the size.

2) For dual-rank interleaved, do #1 but "de-interleave" the address mask
first.

Always "de-interleave" the address mask in order to simplify the code
flow. Bit mask manipulation is necessary to check for interleaving, so
just go ahead and do the de-interleaving. In the non-interleaved case,
the original and de-interleaved address masks will be the same.

To de-interleave the mask, count the number of zero bits in the middle
of the mask and swap them with the most significant bits.

For example,
Original=0xFFFF9FE, De-interleaved=0x3FFFFFE

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20190821235938.118710-5-Yazen.Ghannam@amd.com
2019-08-23 07:23:53 +02:00