Commit graph

1401 commits

Author SHA1 Message Date
George Hotz b67f997864 tests pass w/o float4 2023-01-30 15:40:49 -08:00
George Hotz c6f570a2e6 improve progress bar 2023-01-30 14:50:28 -08:00
Kevin Gilpin 4685c9c095
Big changes (#498)
Use make_pair
2023-01-30 14:42:22 -08:00
George Hotz 7118602c97 goat progress bar 2023-01-30 14:37:26 -08:00
George Hotz 7ee0d99c70 CLCACHE 2023-01-30 14:02:06 -08:00
George Hotz 7457f0d755 KOPT=2 2023-01-30 13:28:06 -08:00
George Hotz cccfea4b25 factor out KOPT code 2023-01-30 13:13:55 -08:00
George Hotz de2c419fd4 make_pair and first attempt at hlb_cifar10 2023-01-30 11:07:23 -08:00
AllentDan 7b6b1f32b1
[Fix] fix typo: test_mnist -> datasets (#492)
* test_mnist -> datasets

* fix mnist_gan
2023-01-29 21:30:47 -08:00
George Hotz 2db272c7f7
Kernel Optimizer (#489)
* kernel optimizer

* 10x faster, but wrong. not good deal

* move test -> extra

* print x speedup

* clcache

* fix clcache + DEBUG

* GFLOPS estimate

* i==3
2023-01-29 17:15:00 -08:00
Martin Loretz 43abbd3d00
Use force_create to allocate return buffer (#491) 2023-01-29 17:13:10 -08:00
George Hotz bb0cdc2442 111.51x speedup for reduce 2023-01-29 03:06:00 -08:00
George Hotz 45c0aa6e2d search with SHIFT, REDUCE 2023-01-29 02:42:20 -08:00
George Hotz 87879cf4b6 improve search more 2023-01-29 02:08:57 -08:00
George Hotz f6bbd43cb8 improve search 2023-01-29 01:33:47 -08:00
George Hotz ebdec2b72f fix optimizer 2023-01-29 00:23:06 -08:00
George Hotz a9cabce791 oops, broke mem estimates 2023-01-28 20:21:31 -08:00
George Hotz a500e79bd1 don't OPTWG on OS X, it's way slower 2023-01-28 20:02:33 -08:00
George Hotz b0df4d99a0 os x profiling: this ratio is exact i believe 2023-01-28 19:02:51 -08:00
George Hotz c0963b723e should fix tests 2023-01-28 15:13:03 -08:00
George Hotz b134a4f3d1 don't upcast already upcasted 2023-01-28 14:58:28 -08:00
George Hotz 2f194aadad loop unrolling upcast 2023-01-28 14:51:24 -08:00
George Hotz 381f3e92da fix prints, add third conv 2023-01-28 14:10:27 -08:00
George Hotz 92001a06e1 openpilot/go.sh 2023-01-28 13:57:43 -08:00
George Hotz aea29f8a6e fix CUDA reduce 2023-01-28 13:38:58 -08:00
George Hotz 0f34c24aeb move expr_idxs to shapetracker 2023-01-28 12:25:05 -08:00
George Hotz f2e81f7208 line reduction and cleanups 2023-01-28 12:17:40 -08:00
George Hotz 03dd1201dc local buffer implied 2023-01-28 12:06:28 -08:00
George Hotz b3e4e678e8
Use ShapeTracker for tracking shapes in kernels (#485)
* local is a normal buffer

* remove extra shapes and strides

* fix opt

* fix llvm
2023-01-28 11:56:32 -08:00
George Hotz 259c48f235 discord image is invite link 2023-01-28 11:42:11 -08:00
George Hotz d748000ada tinygrad discord 2023-01-28 11:36:15 -08:00
George Hotz ae810eb558 minor cleanups 2023-01-28 08:59:15 -08:00
George Hotz 713318745d padding size in get_conv_args 2023-01-28 08:47:18 -08:00
George Hotz 299d1cdc9c lil cleanup of load ldr 2023-01-28 00:31:57 -08:00
George Hotz 2b5bc5d4a1 factor out image_idx 2023-01-28 00:22:54 -08:00
George Hotz bd8a5c2ced
Simple CUDA Runtime (#480)
* factor out opencl runtime

* don't use CL outside the runtime

* cuda runtime adds

* final_dimension

* tests pass with CUDA backend

* more cuda

* cuda simpler

* retain old functionality

* linter and typing

* move globalcounters out of runtimes

* oops, GlobalCounters in cuda

* MAX_OUTPUT_SHAPE=3 is fine for CUDA
2023-01-27 16:26:24 -08:00
George Hotz 6d5e1a8029 GEMM kernel search 2023-01-27 10:08:57 -08:00
George Hotz 123993156d refactor group_for_reduce a little 2023-01-27 08:51:23 -08:00
George Hotz 82e58108e3 add flake8 to precommit 2023-01-26 22:31:45 -08:00
George Hotz f4b571039b fix shape types 2023-01-26 22:29:20 -08:00
Jacky Lee 026ba78526
Add commit hooks (#478)
* Add pre-commit hook

* We need ret

* Fix some type definitions
2023-01-26 22:24:31 -08:00
George Hotz c07bc39941 fix mypy, plz add commit hooks 2023-01-26 14:25:42 -08:00
Comma Device f08e740957 factor out hand coded opt 2023-01-26 14:54:06 -06:00
George Hotz 5e8a36a18b real op kernel 2023-01-26 09:51:32 -08:00
George Hotz e0600f537a op kernel in kernel search 2023-01-26 09:47:01 -08:00
George Hotz 60acb2641f ugh, don't use os 2023-01-25 19:41:21 -08:00
George Hotz b1dec64815 new types and fixup ShapeTracker type mismatches 2023-01-25 19:39:36 -08:00
George Hotz 1b624a5051 DeviceBuffer has abstract methods 2023-01-25 19:16:23 -08:00
George Hotz faab6461dd that lambda is required 2023-01-25 18:46:56 -08:00
George Hotz 44e96c58b4 touch up pytorch speed tests 2023-01-25 18:11:26 -08:00