tinygrab

deepcrayon

Author	SHA1	Message	Date
Jeff Moe	661dcc5ed0	Reformat, uh, everything, with black	2023-12-04 22:01:04 -07:00
Rory Clear	553688f12a	update metal matmul and matvec for compile api (#2238)	2023-11-08 08:08:35 -08:00
George Hotz	67e34b356a	good stuff from tensor cores branch (#1199)	2023-07-08 16:58:26 -07:00
George Hotz	8b777af571	metal_conv gets over 10.4 TFLOPS...	2023-04-15 03:31:22 -07:00
George Hotz	d66e682205	metal matmul from tcores branch	2023-04-14 23:29:29 -07:00
George Hotz	68e45fca18	metal_matmul: bw and torch sync	2023-03-23 08:02:04 -07:00
George Hotz	bd6c3c31a9	compare to torch	2023-03-22 23:58:37 -07:00
George Hotz	c3a3db75c7	fix metal matmul example	2023-03-22 23:42:51 -07:00
George Hotz	1a039306d2	good changes from llama branch (#671) * good changes from llama * transpose behavior changed	2023-03-09 20:51:22 -08:00
George Hotz	bfcec234a2	Refactor ASTs (#622) * ugh worst branch name * compiler refactor continues * scc -> cloc * buf -> _buf * finish _buf, and program -> runtime * gpu is still working, clang isn't * clang in new style * ops_metal * something broke it * improve metal * clean up tons of cl crap * hack fix sync * cleaner gpu * gpu metal clang * cleanups * minor refactor * GPUCodegen * fix up LLVM * blind CUDA refactor * codegen / runtime * keep ops naming * linter passes * woah, llvm was allocing 4x what it needed to * bugfixes * fix openpilot compiler * fix compile_efficientnet * method cache should fix tests * deal with duped functions	2023-03-01 18:57:29 -08:00
calledit	81f7c6800a	Added info on simdgroup availability (#586) * Add info on simdgroup availability * "osx" not "os x" * Update metal_matmul.py * Update metal_matmul.py	2023-02-23 13:59:02 -08:00
George Hotz	bbfec2fde7	8.46 TFLOPS	2023-02-19 13:21:25 -08:00
George Hotz	1ba847963d	reshape and retain metal_matmul	2023-02-19 13:07:23 -08:00

13 Commits (661dcc5ed0fe979e5201733c06bd9e285f890fea)