
nvidia notes

George Hotz 2021-05-26 14:26:48 -07:00
parent 2653d33292
commit 1ae0e88627

docs/nvidia_notes (new file, +19)

@@ -0,0 +1,19 @@
A100 has 432 tensor cores (108 SMs x 4)
8x4x8 FP16 matmul = 256 FMAs per tensor core per cycle
8x4x4 TF32 matmul = 128 FMAs
432 * 256 * 1410 MHz * 2 (multiply and add) = 312 TOPS
Why aren't the other accelerators 3D like this?
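A quick sanity check of that arithmetic, as a hypothetical Python sketch (the constants are the ones from the lines above, not pulled from a spec sheet):

# peak FP16 tensor throughput from the numbers above
tensor_cores = 432           # 108 SMs * 4 tensor cores
fmas_per_cycle = 8 * 4 * 8   # one 8x4x8 FP16 matmul per core per cycle = 256 FMAs
clock_hz = 1410e6            # 1410 MHz boost clock
flops = tensor_cores * fmas_per_cycle * clock_hz * 2  # *2: each FMA = multiply + add
print(f"{flops / 1e12:.0f} TOPS")  # -> 312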
--
SNPE uses a 4x4x4 -> 4x4 matmul (64 FMAs) in the convs,
then accumulates into that 4x4 matrix.
256 ALUs (1 FMA per ALU per cycle)
652 MHz
---
319 GFLOPS
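Same sketch for the SNPE side. Assuming 2 FLOPs per FMA, 256 ALUs at 652 MHz works out to ~334 GFLOPS, a bit above the 319 GFLOPS quoted above, so that figure may be measured or use a lower effective clock:

alus = 256        # 1 FMA per ALU per cycle
clock_hz = 652e6  # 652 MHz
print(f"{alus * clock_hz * 2 / 1e9:.0f} GFLOPS")  # -> 334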