1
0
Fork 0
Commit Graph

13 Commits (661dcc5ed0fe979e5201733c06bd9e285f890fea)

Author SHA1 Message Date
Jeff Moe 661dcc5ed0 Reformat, uh, everything, with black 2023-12-04 22:01:04 -07:00
Rory Clear 553688f12a
update metal matmul and matvec for compile api (#2238) 2023-11-08 08:08:35 -08:00
George Hotz 67e34b356a
good stuff from tensor cores branch (#1199) 2023-07-08 16:58:26 -07:00
George Hotz 8b777af571 metal_conv gets over 10.4 TFLOPS... 2023-04-15 03:31:22 -07:00
George Hotz d66e682205 metal matmul from tcores branch 2023-04-14 23:29:29 -07:00
George Hotz 68e45fca18 metal_matmul: bw and torch sync 2023-03-23 08:02:04 -07:00
George Hotz bd6c3c31a9 compare to torch 2023-03-22 23:58:37 -07:00
George Hotz c3a3db75c7 fix metal matmul example 2023-03-22 23:42:51 -07:00
George Hotz 1a039306d2
good changes from llama branch (#671)
* good changes from llama

* transpose behavior changed
2023-03-09 20:51:22 -08:00
George Hotz bfcec234a2
Refactor ASTs (#622)
* ugh worst branch name

* compiler refactor continues

* scc -> cloc

* buf -> _buf

* finish _buf, and program -> runtime

* gpu is still working, clang isn't

* clang in new style

* ops_metal

* something broke it

* improve metal

* clean up tons of cl crap

* hack fix sync

* cleaner gpu

* gpu metal clang

* cleanups

* minor refactor

* GPUCodegen

* fix up LLVM

* blind CUDA refactor

* codegen / runtime

* keep ops naming

* linter passes

* woah, llvm was allocing 4x what it needed to

* bugfixes

* fix openpilot compiler

* fix compile_efficientnet

* method cache should fix tests

* deal with duped functions
2023-03-01 18:57:29 -08:00
calledit 81f7c6800a
Added info on simdgroup availability (#586)
* Add info on simdgroup availability

* "osx" not "os x"

* Update metal_matmul.py

* Update metal_matmul.py
2023-02-23 13:59:02 -08:00
George Hotz bbfec2fde7 8.46 TFLOPS 2023-02-19 13:21:25 -08:00
George Hotz 1ba847963d reshape and retain metal_matmul 2023-02-19 13:07:23 -08:00