Jeff Moe | 661dcc5ed0 | Reformat, uh, everything, with black | 2023-12-04 22:01:04 -07:00
Rory Clear | 553688f12a | update metal matmul and matvec for compile api (#2238) | 2023-11-08 08:08:35 -08:00
George Hotz | 67e34b356a | good stuff from tensor cores branch (#1199) | 2023-07-08 16:58:26 -07:00
George Hotz | 8b777af571 | metal_conv gets over 10.4 TFLOPS... | 2023-04-15 03:31:22 -07:00
George Hotz | d66e682205 | metal matmul from tcores branch | 2023-04-14 23:29:29 -07:00
George Hotz | 68e45fca18 | metal_matmul: bw and torch sync | 2023-03-23 08:02:04 -07:00
George Hotz | bd6c3c31a9 | compare to torch | 2023-03-22 23:58:37 -07:00
George Hotz | c3a3db75c7 | fix metal matmul example | 2023-03-22 23:42:51 -07:00
George Hotz | 1a039306d2 | good changes from llama branch (#671) | 2023-03-09 20:51:22 -08:00
  * good changes from llama
  * transpose behavior changed
George Hotz | bfcec234a2 | Refactor ASTs (#622) | 2023-03-01 18:57:29 -08:00
  * ugh worst branch name
  * compiler refactor continues
  * scc -> cloc
  * buf -> _buf
  * finish _buf, and program -> runtime
  * gpu is still working, clang isn't
  * clang in new style
  * ops_metal
  * something broke it
  * improve metal
  * clean up tons of cl crap
  * hack fix sync
  * cleaner gpu
  * gpu metal clang
  * cleanups
  * minor refactor
  * GPUCodegen
  * fix up LLVM
  * blind CUDA refactor
  * codegen / runtime
  * keep ops naming
  * linter passes
  * woah, llvm was allocing 4x what it needed to
  * bugfixes
  * fix openpilot compiler
  * fix compile_efficientnet
  * method cache should fix tests
  * deal with duped functions
calledit | 81f7c6800a | Added info on simdgroup availability (#586) | 2023-02-23 13:59:02 -08:00
  * Add info on simdgroup availability
  * "osx" not "os x"
  * Update metal_matmul.py
  * Update metal_matmul.py
George Hotz | bbfec2fde7 | 8.46 TFLOPS | 2023-02-19 13:21:25 -08:00
George Hotz | 1ba847963d | reshape and retain metal_matmul | 2023-02-19 13:07:23 -08:00