1
0
Fork 0
alistair23-linux/drivers
Samuel Holland c950ca8c35 clocksource/drivers/arch_timer: Workaround for Allwinner A64 timer instability
The Allwinner A64 SoC is known[1] to have an unstable architectural
timer, which manifests itself most obviously in the time jumping forward
a multiple of 95 years[2][3]. This coincides with 2^56 cycles at a
timer frequency of 24 MHz, implying that the time went slightly backward
(and this was interpreted by the kernel as it jumping forward and
wrapping around past the epoch).

Investigation revealed instability in the low bits of CNTVCT at the
point a high bit rolls over. This leads to power-of-two cycle forward
and backward jumps. (Testing shows that forward jumps are about twice as
likely as backward jumps.) Since the counter value returns to normal
after an indeterminate read, each "jump" really consists of both a
forward and backward jump from the software perspective.

Unless the kernel is trapping CNTVCT reads, a userspace program is able
to read the register in a loop faster than it changes. A test program
running on all 4 CPU cores that reported jumps larger than 100 ms was
run for 13.6 hours and reported the following:

 Count | Event
-------+---------------------------
  9940 | jumped backward      699ms
   268 | jumped backward     1398ms
     1 | jumped backward     2097ms
 16020 | jumped forward       175ms
  6443 | jumped forward       699ms
  2976 | jumped forward      1398ms
     9 | jumped forward    356516ms
     9 | jumped forward    357215ms
     4 | jumped forward    714430ms
     1 | jumped forward   3578440ms

This works out to a jump larger than 100 ms about every 5.5 seconds on
each CPU core.

The largest jump (almost an hour!) was the following sequence of reads:
    0x0000007fffffffff → 0x00000093feffffff → 0x0000008000000000

Note that the middle bits don't necessarily all read as all zeroes or
all ones during the anomalous behavior; however the low 10 bits checked
by the function in this patch have never been observed with any other
value.

Also note that smaller jumps are much more common, with backward jumps
of 2048 (2^11) cycles observed over 400 times per second on each core.
(Of course, this is partially explained by lower bits rolling over more
frequently.) Any one of these could have caused the 95 year time skip.

Similar anomalies were observed while reading CNTPCT (after patching the
kernel to allow reads from userspace). However, the CNTPCT jumps are
much less frequent, and only small jumps were observed. The same program
as before (except now reading CNTPCT) observed after 72 hours:

 Count | Event
-------+---------------------------
    17 | jumped backward      699ms
    52 | jumped forward       175ms
  2831 | jumped forward       699ms
     5 | jumped forward      1398ms

Further investigation showed that the instability in CNTPCT/CNTVCT also
affected the respective timer's TVAL register. The following values were
observed immediately after writing CNVT_TVAL to 0x10000000:

 CNTVCT             | CNTV_TVAL  | CNTV_CVAL          | CNTV_TVAL Error
--------------------+------------+--------------------+-----------------
 0x000000d4a2d8bfff | 0x10003fff | 0x000000d4b2d8bfff | +0x00004000
 0x000000d4a2d94000 | 0x0fffffff | 0x000000d4b2d97fff | -0x00004000
 0x000000d4a2d97fff | 0x10003fff | 0x000000d4b2d97fff | +0x00004000
 0x000000d4a2d9c000 | 0x0fffffff | 0x000000d4b2d9ffff | -0x00004000

The pattern of errors in CNTV_TVAL seemed to depend on exactly which
value was written to it. For example, after writing 0x10101010:

 CNTVCT             | CNTV_TVAL  | CNTV_CVAL          | CNTV_TVAL Error
--------------------+------------+--------------------+-----------------
 0x000001ac3effffff | 0x1110100f | 0x000001ac4f10100f | +0x1000000
 0x000001ac40000000 | 0x1010100f | 0x000001ac5110100f | -0x1000000
 0x000001ac58ffffff | 0x1110100f | 0x000001ac6910100f | +0x1000000
 0x000001ac66000000 | 0x1010100f | 0x000001ac7710100f | -0x1000000
 0x000001ac6affffff | 0x1110100f | 0x000001ac7b10100f | +0x1000000
 0x000001ac6e000000 | 0x1010100f | 0x000001ac7f10100f | -0x1000000

I was also twice able to reproduce the issue covered by Allwinner's
workaround[4], that writing to TVAL sometimes fails, and both CVAL and
TVAL are left with entirely bogus values. One was the following values:

 CNTVCT             | CNTV_TVAL  | CNTV_CVAL
--------------------+------------+--------------------------------------
 0x000000d4a2d6014c | 0x8fbd5721 | 0x000000d132935fff (615s in the past)
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>

========================================================================

Because the CPU can read the CNTPCT/CNTVCT registers faster than they
change, performing two reads of the register and comparing the high bits
(like other workarounds) is not a workable solution. And because the
timer can jump both forward and backward, no pair of reads can
distinguish a good value from a bad one. The only way to guarantee a
good value from consecutive reads would be to read _three_ times, and
take the middle value only if the three values are 1) each unique and
2) increasing. This takes at minimum 3 counter cycles (125 ns), or more
if an anomaly is detected.

However, since there is a distinct pattern to the bad values, we can
optimize the common case (1022/1024 of the time) to a single read by
simply ignoring values that match the error pattern. This still takes no
more than 3 cycles in the worst case, and requires much less code. As an
additional safety check, we still limit the loop iteration to the number
of max-frequency (1.2 GHz) CPU cycles in three 24 MHz counter periods.

For the TVAL registers, the simple solution is to not use them. Instead,
read or write the CVAL and calculate the TVAL value in software.

Although the manufacturer is aware of at least part of the erratum[4],
there is no official name for it. For now, use the kernel-internal name
"UNKNOWN1".

[1]: https://github.com/armbian/build/commit/a08cd6fe7ae9
[2]: https://forum.armbian.com/topic/3458-a64-datetime-clock-issue/
[3]: https://irclog.whitequark.org/linux-sunxi/2018-01-26
[4]: https://github.com/Allwinner-Homlet/H6-BSP4.9-linux/blob/master/drivers/clocksource/arm_arch_timer.c#L272

Acked-by: Maxime Ripard <maxime.ripard@bootlin.com>
Tested-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Samuel Holland <samuel@sholland.org>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-02-23 12:13:45 +01:00
..
accessibility
acpi arm64 fixes for -rc2 2019-01-11 12:25:40 -08:00
amba
android binder: implement binderfs 2018-12-19 09:40:13 +01:00
ata for-linus-20190112 2019-01-12 13:40:51 -08:00
atm cross-tree: phase out dma_zalloc_coherent() 2019-01-08 07:58:37 -05:00
auxdisplay auxdisplay: charlcd: fix x/y command parsing 2018-12-21 21:27:21 +01:00
base Merge branches 'pm-cpuidle', 'pm-cpufreq' and 'pm-sleep' 2019-01-11 10:09:51 +01:00
bcma
block for-linus-20190112 2019-01-12 13:40:51 -08:00
bluetooth Bluetooth: hci_bcm: Handle specific unknown packets after firmware loading 2018-12-19 13:43:42 +01:00
bus ARM: SoC driver updates 2018-12-31 17:32:35 -08:00
cdrom gdrom: fix a memory leak bug 2018-12-29 08:20:44 -07:00
char Remove 'type' argument from access_ok() function 2019-01-03 18:57:57 -08:00
clk Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2019-01-02 18:56:59 -08:00
clocksource clocksource/drivers/arch_timer: Workaround for Allwinner A64 timer instability 2019-02-23 12:13:45 +01:00
connector
cpufreq Merge branches 'pm-cpuidle', 'pm-cpufreq' and 'pm-sleep' 2019-01-11 10:09:51 +01:00
cpuidle powerpc updates for 4.21 2018-12-27 10:43:24 -08:00
crypto cross-tree: phase out dma_zalloc_coherent() 2019-01-08 07:58:37 -05:00
dax mm, devm_memremap_pages: fix shutdown handling 2018-12-28 12:11:47 -08:00
dca
devfreq
dio
dma cross-tree: phase out dma_zalloc_coherent() 2019-01-08 07:58:37 -05:00
dma-buf drivers/dma-buf/udmabuf.c: convert to use vm_fault_t 2019-01-04 13:13:46 -08:00
edac EDAC, fsl_ddr: Add LS1021A to the list of supported hardware 2018-12-19 11:57:45 +01:00
eisa
extcon
firewire FireWire (IEEE 1394) subsystem patch: 2019-01-05 18:33:21 -08:00
firmware arm64 fixes for -rc1 2019-01-05 11:28:39 -08:00
fmc
fpga Remove 'type' argument from access_ok() function 2019-01-03 18:57:57 -08:00
fsi
gnss
gpio This is the bulk of GPIO changes for the v4.21 kernel series: 2018-12-28 20:00:21 -08:00
gpu remove dma_zalloc_coherent 2019-01-12 10:52:40 -08:00
hid Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid 2019-01-05 17:53:40 -08:00
hsi
hv Char/Misc driver patches for 4.21-rc1 2018-12-28 20:54:57 -08:00
hwmon Kconfig updates for v4.21 2018-12-29 13:03:29 -08:00
hwspinlock hwspinlock: fix return value check in stm32_hwspinlock_probe() 2019-01-03 11:42:10 -08:00
hwtracing intel_th: msu: Fix an off-by-one in attribute store 2018-12-19 20:21:06 +01:00
i2c i2c: tegra: Fix Maximum transfer size 2019-01-11 00:15:04 +01:00
i3c
ide for-4.21/block-20181221 2018-12-28 13:19:59 -08:00
idle
iio Staging/IIO driver patches for 4.21-rc1 2018-12-28 20:39:58 -08:00
infiniband cross-tree: phase out dma_zalloc_coherent() 2019-01-08 07:58:37 -05:00
input cross-tree: phase out dma_zalloc_coherent() 2019-01-08 07:58:37 -05:00
iommu cross-tree: phase out dma_zalloc_coherent() 2019-01-08 07:58:37 -05:00
ipack
irqchip irqchip/csky: fixup handle_irq_perbit break irq 2019-01-09 00:18:46 +08:00
isdn isdn: fix kernel-infoleak in capi_unlocked_ioctl 2019-01-02 10:31:39 -08:00
leds LEDs for 4.21-rc1 2018-12-25 14:52:50 -08:00
lightnvm lightnvm: pblk: fix use-after-free bug 2018-12-22 14:45:35 -07:00
macintosh Remove 'type' argument from access_ok() function 2019-01-03 18:57:57 -08:00
mailbox mailbox: tegra-hsp: Use device-managed registration API 2018-12-21 22:31:26 -06:00
mcb
md Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md into for-linus 2019-01-03 08:21:02 -07:00
media cross-tree: phase out dma_zalloc_coherent() 2019-01-08 07:58:37 -05:00
memory ARM: SoC: late updates 2019-01-05 11:30:37 -08:00
memstick MMC core: 2018-12-28 16:52:18 -08:00
message scsi: flip the default on use_clustering 2018-12-18 23:13:12 -05:00
mfd
misc cross-tree: phase out dma_zalloc_coherent() 2019-01-08 07:58:37 -05:00
mmc cross-tree: phase out dma_zalloc_coherent() 2019-01-08 07:58:37 -05:00
mtd mtd: rawnand: qcom: fix memory corruption that causes panic 2019-01-08 12:33:24 +01:00
mux
net cross-tree: phase out dma_zalloc_coherent() on headers 2019-01-08 07:58:49 -05:00
nfc
ntb cross-tree: phase out dma_zalloc_coherent() 2019-01-08 07:58:37 -05:00
nubus
nvdimm Merge branch 'akpm' (patches from Andrew) 2018-12-28 16:55:46 -08:00
nvme for-linus-20190112 2019-01-12 13:40:51 -08:00
nvmem
of Merge tag 'devicetree-for-4.21' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux 2018-12-28 20:08:34 -08:00
opp cpufreq: scpi/scmi: Fix freeing of dynamic OPPs 2019-01-04 12:19:40 +01:00
oprofile
parisc Kconfig file consolidation for v4.21 2018-12-29 13:40:29 -08:00
parport
pci remove dma_zalloc_coherent 2019-01-12 10:52:40 -08:00
pcmcia Included in this update: 2019-01-05 11:23:17 -08:00
perf drivers/perf: hisi: Fixup one DDRC PMU register offset 2019-01-04 10:13:27 +00:00
phy phy: fix build breakage: add PHY_MODE_SATA 2019-01-12 21:07:14 -08:00
pinctrl Pin control bulk changes for the v4.21 kernel cycle: 2019-01-01 13:19:16 -08:00
platform chrome platform changes for v4.21 2019-01-06 11:40:06 -08:00
pnp Remove 'type' argument from access_ok() function 2019-01-03 18:57:57 -08:00
power power supply and reset changes for the v4.21 series 2018-12-28 20:22:45 -08:00
powercap
pps Kconfig updates for v4.21 2018-12-29 13:03:29 -08:00
ps3
ptp Char/Misc driver patches for 4.21-rc1 2018-12-28 20:54:57 -08:00
pwm pwm: imx: Add ipg clock operation 2018-12-24 12:06:56 +01:00
rapidio cross-tree: phase out dma_zalloc_coherent() 2019-01-08 07:58:37 -05:00
ras treewide: surround Kconfig file paths with double quotes 2018-12-22 00:25:54 +09:00
regulator Merge remote-tracking branch 'regulator/topic/coupled' into regulator-next 2018-12-21 13:43:35 +00:00
remoteproc
reset reset: uniphier-glue: Add AHCI reset control support in glue layer 2019-01-07 16:38:51 +01:00
rpmsg
rtc RTC for 4.21 2019-01-01 13:24:31 -08:00
s390 cross-tree: phase out dma_zalloc_coherent() 2019-01-08 07:58:37 -05:00
sbus Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next 2018-12-26 10:32:18 -08:00
scsi cross-tree: phase out dma_zalloc_coherent() 2019-01-08 07:58:37 -05:00
sfi
sh
siox
slimbus
sn
soc ARM: SoC fixes 2019-01-14 10:34:14 +12:00
soundwire
spi cross-tree: phase out dma_zalloc_coherent() 2019-01-08 07:58:37 -05:00
spmi
ssb
staging Staging driver fixes for 5.0-rc2 2019-01-14 05:49:35 +12:00
target SCSI misc on 20181224 2018-12-28 14:48:06 -08:00
tc
tee OP-TEE dynamic shm log message 2018-12-31 13:06:30 -08:00
thermal Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux 2019-01-05 16:07:28 -08:00
thunderbolt
tty tty/serial fixes for 5.0-rc2 2019-01-14 05:47:48 +12:00
uio Char/Misc driver patches for 4.21-rc1 2018-12-28 20:54:57 -08:00
usb USB fixes for 5.0-rc2 2019-01-14 05:45:28 +12:00
uwb
vfio vfio/type1: Fix unmap overflow off-by-one 2019-01-08 09:31:28 -07:00
vhost Remove 'type' argument from access_ok() function 2019-01-03 18:57:57 -08:00
video cross-tree: phase out dma_zalloc_coherent() 2019-01-08 07:58:37 -05:00
virt
virtio virtio, vhost: features, fixes, cleanups 2019-01-02 18:54:45 -08:00
visorbus
vlynq
vme
w1 treewide: surround Kconfig file paths with double quotes 2018-12-22 00:25:54 +09:00
watchdog linux-watchdog 4.21-rc1 tag 2019-01-01 13:16:45 -08:00
xen Remove 'type' argument from access_ok() function 2019-01-03 18:57:57 -08:00
zorro
Kconfig Kconfig file consolidation for v4.21 2018-12-29 13:40:29 -08:00
Makefile