
nvidia notes

George Hotz 2021-05-26 14:26:48 -07:00
parent 2653d33292
commit 1ae0e88627

docs/nvidia_notes (new file, +19)

@@ -0,0 +1,19 @@
A100 has 432 tensor cores (108 SMs x 4)
8x4x8 FP16 matmul = 256 FMAs per tensor core per cycle
8x4x4 TF32 matmul = 128 FMAs
432 * 256 * 1410 MHz * 2 (multiply and add) = 312 TOPS
Why aren't the other accelerators 3D like this?
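A quick sanity check of that arithmetic, as a hypothetical Python sketch (the constants are the ones from the lines above, not pulled from a spec sheet):

# peak FP16 tensor throughput from the numbers above
tensor_cores = 432           # 108 SMs * 4 tensor cores
fmas_per_cycle = 8 * 4 * 8   # one 8x4x8 FP16 matmul per core per cycle = 256 FMAs
clock_hz = 1410e6            # 1410 MHz boost clock
flops = tensor_cores * fmas_per_cycle * clock_hz * 2  # *2: each FMA = multiply + add
print(f"{flops / 1e12:.0f} TOPS")  # -> 312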
--
SNPE uses a 4x4x4 -> 4x4 matmul (64 FMAs) in the convs,
then accumulates into that 4x4 matrix.
256 ALUs (1 FMA per ALU per cycle)
652 MHz
---
319 GFLOPS
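Same sketch for the SNPE side. Assuming 2 FLOPs per FMA, 256 ALUs at 652 MHz works out to ~334 GFLOPS, a bit above the 319 GFLOPS quoted above, so that figure may be measured or use a lower effective clock:

alus = 256        # 1 FMA per ALU per cycle
clock_hz = 652e6  # 652 MHz
print(f"{alus * clock_hz * 2 / 1e9:.0f} GFLOPS")  # -> 334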