tinygrab

Author	SHA1	Message	Date
George Hotz	37fa7e96fb	Revert "update editorconfig, enforce via CI (#1343 )" (#1380 ) This reverts commit `da2efecbe2`.	2023-07-31 10:35:50 -07:00
Pavol Rusnak	da2efecbe2	update editorconfig, enforce via CI (#1343 ) * update editorconfig to set unix-style newlines and trim whitespace * add editorconfig github action to the CI * fix whitespace	2023-07-30 18:44:30 -07:00
chenyu	ab80ea0d38	use ubuntu for clang ci test (#1368 )	2023-07-28 20:51:25 -04:00
waifairer	d89fb729e5	flake8 (#1323 ) * flake8: Ignore frequent violations, correct infrequent ones * Ignore some rules in test * Reorder test ignores * Lint test + main * EOF indent * Include all E71,E72 errors * Test the failing case in CI * Revert "Test the failing case in CI" This reverts commit `110add0a70`. * Push to test! This reverts commit `f317532779`. * ok back to passing This reverts commit `ba5052685f`. * Prove that CI fails when formatting is incorrect. * Fix formatting * Remove duplicitous E117 rule * Use flake8 config for precommit --------- Co-authored-by: waifairer <waifairer@gmail.com>	2023-07-24 11:19:58 -04:00
cheeetoo	a0965ee198	CI < 5 minutes (#1252 ) * models matrix * fix typo and install gpu deps * install llvm deps if needed * fix * testops with cuda * remove pip cache since not work * cuda env * install cuda deps * maybe it will work now * i can't read * all tests in matrix * trim down more * opencl stuff in matrix * opencl pip cache * test split * change cuda test exclusion * test * fix cuda maybe * add models * add more n=auto * third thing * fix bug * cache pip more * change name * update tests * try again cause why not * balance * try again... * try apt cache for cuda * try on gpu: * try cuda again * update packages step * replace libz-dev with zlib1g-dev * only cache cuda * why error * fix gpuocelot bug * apt cache err * apt cache to slow? * opt and image in single runner * add a couple n=autos * remove test matrix * try cuda apt cache again * libz-dev -> zlib1g-dev * remove -s since not supported by xdist * the cache takes too long and doesn't work * combine webgpu and metal tests * combine imagenet to c and cpu tests * torch tests with linters * torch back by itself * small windows clang test with torch tests * fix a goofy windows bug * im dumb * bro * clang with linters * fix pylint error * linter not work on windows * try with clang again * clang and imagenet? * install deps * fix * fix quote * clang by itself (windows too slow) * env vars for imagenet * cache pip for metal and webgpu tests * try torch with metal and webgpu * doesn't work, too long * remove -v * try -n=logical * don't use logical * revert accidental thing * remove some prints unless CI * fix print unless CI * ignore speed tests for slow tests * clang windows in matrix (ubuntu being tested in imagenet->c test) * try manual pip cache * fix windows pip cache path * all manual pip cache * fix pip cache dir for macos * print_ci function in helpers * CI as variable, no print_ci * missed one * cuda tests with docker image * remove setup-python action for cuda * python->python3? * remove -s -v * try fix pip cache * maybe fix * try to fix pip cache * is this the path? * maybe cache pip * try again * create wheels dir * ? * cuda pip deps in dockerfile * disable pip cache for clang * image from ghcr instead of docker hub * why is clang like this * fast deps * try use different caches * remove the fast thing * try with lighter image * remove setup python for cuda * small docker and cuda fast deps * ignore a few more tests * cool docker thing (maybe) * oops * quotes * fix docker command * fix bug * ignore train efficientnet test * remove dockerfile (docker stuff takes too long) * remove docker stuff and normal cuda * oops * ignore the tests for cuda * does this work * ignore test_train on slow backends * add space * llvm ignore same tests as cuda * nvm * ignore lr scheduler tests * get some stats * fix ignore bug * remove extra ' * remove and * ignore test for llvm * change ignored tests and durationon all backends * fix * and -> or * ignore some more cuda tests * finally? * does this fix it * remove durations=0 * add some more tests to llvm * make last pytest more readable * fix * don't train efficientnet on cpu * try w/out pip cache * pip cache seems to be generally better * pytest file markers * try apt fast for cuda * use quick install for apt-fast * apt-fast not worth * apt-get to apt * fix typo * suppress warnings * register markers * disable debug on fuzz tests * change marker names * apt update and apt install in one command * update marker names in test.yml * webgpu pytest marker	2023-07-23 13:00:56 -07:00
Jacob Pradels	b112edd2c3	Add pylint trailing whitespace rule (#1314 )	2023-07-21 13:37:55 -04:00
chenyu	a5f5330d91	Add Fuzz Test symbolic / shapetracker to CI. (#1278 ) * Fuzz test symbolic and shapetracker This reverts commit d5773ddebff54c1ff608838076f0b4ff126b8aa8. * mess again * no tail * test shapetracker too * Revert mess and enable all tests * removed leftover	2023-07-19 09:05:45 -07:00
chenyu	c96bf395df	Enable JIT tests for supported devices, skip METAL and WEBGPU (#1265 ) * Enable JIT test * really test metal * Skip some device	2023-07-18 11:40:37 -07:00
Diogo	a9a1df785f	Webgpu support (#1077 ) * initial commit * 81 passing * 105 passing tests * 148 passing * CI tests * install dep on ci * try opencl pkgs * try using vulkan * down to only 6 failing * refactor * cleaning up * another test skipped due to buffer limit * linter * segfault * indent fix * another segfault found * small touchups * Fix max and maxpool tests * Add constant folding * Add javascript export script * better asserts in codegen * manual upcasting * reverted token type change * skip safetensor test due to unsupported type * FIx efficientnet and all other model tests * Remove np copy * fixed indent and missing import * manually destroy the buffer * revert back to length * linter errors * removed extra val * skip broken tests * skipping more tests * Make the page pretty * Save model weights as safetensor * Fix imagenet to c test * Fix second imagenet to c bug * Async and paralel kernel compilation * workgroup support * reversed local size * fixed non local bug * correct local groups * ci experiment * removed typo * Fix define local by using shared memory * Refactor * try running on mac * match metal tests * add more workers * scope down tests * trying windows runner * fixed windows env * see how many it can do * merged master * refactor * missed refactor * increase test suite coverage * missing import * whitespace in test_efficientnet.py * getting there * fixed reset * fixed bufs * switched to cstyle * cleanup * min/max rename * one more linter issue * fixed demo * linter * testing ci chrome * add unsafe webgpu arg * add build step * remove WEBGPU from cmd line * use module * try forcing directx * trying forced metal backend * temp disable conv2d for CI * disable conv_trasnpose2d --------- Co-authored-by: 0x4d - Martin Loretz <20306567+martinloretzzz@users.noreply.github.com> Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2023-07-12 12:52:06 -07:00
Roelof van Dijk	d0e21a7398	ci: don't install recommended packages for GPU (#1215 ) Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>	2023-07-11 15:38:49 -07:00
George Hotz	beb4d3ab01	Tensor Cores 2: Local Buffers Edition (#1057 ) * local buffers * work * works * invert_strides * work * non tc * fix shapetracker bug * stride priority * touchups * gate tensor cores * tensor core conv * cleanups * bug fixes * fix metal_matmul * fast tensor cores * more speed * buffer selection bug fix * fix CI maybe * ugh, CI is set to true, not 1 * tc allowed * add_gl_dimension * split out padding conv tests * does padding add fail * test_padded_conv2d_1x1 * skip metal ci stuff * more strict on yellow * float2 * strip parens * fix float2 * touch up * dtype * strip parens * no alias * bugfix * cast float2 and test tensor core ops * oops, don't hardcode 4	2023-07-09 09:06:00 -07:00
George Hotz	7151382364	Refactor load/store before tensor cores (#1193 ) * minor cleanups * render_const * now that's a nice refactor * clean up vload/vstore * clean up render_load * debugs there * dumb * err, this? * const float4 * what's failing * bugfix * statement includes semicolon * bugfix	2023-07-08 15:54:58 -07:00
George Hotz	d9c1d81e99	Revert "feat: cancel previous workflow runs on new commits (#1184 )" (#1194 ) This reverts commit `d66a0c285d`.	2023-07-08 11:26:13 -07:00
George Hotz	52600d532e	add 20 minute timeout	2023-07-07 23:02:28 -07:00
wozeparrot	d66a0c285d	feat: cancel previous workflow runs on new commits (#1184 )	2023-07-07 22:55:35 -07:00
foreign-sub	574cbda979	Quickstart (#1015 ) * fix quickstart md * add quickstart to ci	2023-06-29 13:26:58 -07:00
George Hotz	d16c16ec28	new upcast works (#1066 ) * new upcast works * float4 try * fix unaligned float4 * disallow unaligned access * upcast dim * maybe good now * fix gpu half * vstore_half4 * fix deep image bugs * improve symbolic to fix issues * fix symbolic * cl test * this maybe * gcd of 1 is 1 * real fix for old python * improve fuzzer	2023-06-27 19:34:53 -07:00
George Hotz	70c07dfea5	5k line max (#1064 )	2023-06-27 10:53:18 -07:00
George Hotz	0f281e7b18	touchups	2023-06-25 15:24:26 -07:00
George Hotz	c8fbdeb48e	test speed llama (#1046 ) * test speed llama * oops, put it back * uses the real device codegen * just do it on the mac * pp * is faster? * Revert "is faster?" This reverts commit `42db542010`. * disable docker again for less load on CI	2023-06-25 15:22:56 -07:00
Jacky Lee	5d16cc283f	Docker fix (#1039 ) * Docker test * Remove extra installs * Don't run full test * No need for testing dependencies	2023-06-25 10:38:58 -07:00
cloud11665	264b1e5f48	cache gpuocelot build in cuda CI (#1032 )	2023-06-22 17:42:12 -07:00
cloud11665	2407690d82	add cuda on cpu tests (#1020 )	2023-06-22 14:15:50 -07:00
George Hotz	18892242b0	global -> group (#1007 ) * global -> group * allow None for local_size in custom function * lil local * comment on shape * fix cuda * smart local cast * better local heuristic * fix ptx, and work_dim cleanup * fix metal * fix ops test * fix openpilot jit * no more optlocal * might fix metal tests * try metal now * see generated metal code * test free removal. REVERT THIS * mergable	2023-06-21 11:50:43 -07:00
Diogo	57d3aa76a5	Windows & Ubuntu CLANG CI support (#1011 ) * matrix strategy * push env to GITHUB_ENV * use printf instead of echo * use temp helper function for cross os paths * use path join * switched to using temp helper function * skip test on windows due to memory limit * small fix * removed semi * touchups * clean up * seperate tests * test changes to test_utils on windows * small refactor * more cleanups * undo helpers change * only skip if in CI and WINDOWS	2023-06-19 09:33:24 -07:00
George Hotz	0d4c4f4e9e	metal ci attempt (#1010 ) * metal ci attempt * skip failing ops tests * skip in the ops test * no dtype test	2023-06-19 09:23:55 -07:00
Diogo	6b1280f01c	fixes to Onnx ops LayerNormalization/Prelu and added OptionalHasElement/OptionalGetElement (#956 ) * prelu and where casting * typing for safe_numpy * optional * get rid of tracing in ci * cleanup and resolved layernorm issues * removed debug print	2023-06-08 16:09:19 -07:00
kposborne2	00360da05b	Update broken `docs/abstractions.py` for changed ops, and add to CI (#930 ) * fix and add to ci * still have those * ocd * update other doc	2023-06-04 19:21:20 -07:00
George Hotz	a3feee29c5	make tests faster + add onnx (#815 ) * search one dir, disable slow * onnx tests * fast rnnt test	2023-05-27 08:53:32 -07:00
George Hotz	faf80418b7	pyopencl by default since GPU is default (#802 )	2023-05-25 17:48:18 -07:00
George Hotz	03b38864db	fix batchnorm at training (#753 ) * e2e testing * min failure * no affine on bn, still fails * why did i think i could detach that? * allow more kernels for bn * some test issue i don't understand	2023-04-19 08:01:04 -07:00
George Hotz	dbc99c243b	why did that test break?	2023-04-18 17:08:38 -07:00
George Hotz	b12b60af20	fix binop, other tests failure (#723 ) * fix binop, other tests failure * that was a bad idea * better layernorm * inference kernel count tests * new style reshape pushing * fixup replacement * 199 kernels is okay. fix flops * push reshape through unaryops only * GRAPH=2 draws the phantom ops * found resnet issue * non working test * mul is cheaper than div * OPT inflation * SHUFFLE_PAD_OPS in OPT=2	2023-03-22 18:15:07 -07:00
George Hotz	f5467cfedc	Devicebufferless (#708 ) * runs one metal kernel * conv2d works * ops tests are passing * const folding * all ops work * pre commit always passes * torch works * working still * fix graph test * tests passing * image almost works * image conv works * most images * fix custom * fix assignment * fix compile enet * clean up comments * fix realize return value * include shapetracker in LB repr * copy should make a copy * reenable method cache * fix lna * dtypes in graph * forward only for IMAGE=2 * simple realize * getting close * fixup new api, it's good except the kernel count * back to 197 kernels * tests should pass * go to a real float * no type_on_cpu * fix the docs * put shapetracker back in it's proper place	2023-03-18 14:40:23 -07:00
Cyril Roumégous	3f08613a2a	apply flake8 E203 rule (#684 )	2023-03-11 11:35:16 -08:00
George Hotz	1826ff6b89	dtypes nice and clean (#673 ) * add dtype class * dtypes * buffers are lazy * dtype is tracked by lazybuffer and GenericShape * fix types in llvm * llvm store * dtype tests * fix tests maybe * fix flop counter * fix CI * CI fix and check format * fix dtype and dtype check * fix custom test * fix test graph	2023-03-10 16:56:07 -08:00
George Hotz	5dc227dba6	fix bug in ENABLE_METHOD_CACHE and enable for llvm	2023-03-06 07:43:40 -08:00
George Hotz	50012f679b	move get_contraction to shapetracker	2023-03-06 06:42:57 -08:00
George Hotz	7a1d96fd76	No negative (#632 ) * behavior is correct without VALIDHACKS * simple div and mod * fix tests * no negative variables * alt form is correct * still correct * bug in mulnode * at least validhacks works now * cleanups * test validhacks, and to_image_idx * cache compare key * tests and __neg__	2023-03-03 16:48:14 -08:00
George Hotz	999b44c274	fix external test + speed	2023-03-03 06:46:16 -08:00
George Hotz	459488bba2	fix linter (#630 ) * fix linter * no imports okay * explicit bases * disable in pylintrc	2023-03-02 20:06:20 -08:00
George Hotz	bfcec234a2	Refactor ASTs (#622 ) * ugh worst branch name * compiler refactor continues * scc -> cloc * buf -> _buf * finish _buf, and program -> runtime * gpu is still working, clang isn't * clang in new style * ops_metal * something broke it * improve metal * clean up tons of cl crap * hack fix sync * cleaner gpu * gpu metal clang * cleanups * minor refactor * GPUCodegen * fix up LLVM * blind CUDA refactor * codegen / runtime * keep ops naming * linter passes * woah, llvm was allocing 4x what it needed to * bugfixes * fix openpilot compiler * fix compile_efficientnet * method cache should fix tests * deal with duped functions	2023-03-01 18:57:29 -08:00
George Hotz	3c8da6bd03	add typing	2023-02-28 10:54:46 -08:00
George Hotz	d584bae5c0	fine, openpilot can have 197 kernels	2023-02-27 11:48:36 -08:00
George Hotz	c9252d38b2	mypy cache breaks if you sometimes check untyped defs, no checking tests for now	2023-02-27 09:57:33 -08:00
George Hotz	e74779f19d	typing fixup	2023-02-27 09:52:04 -08:00
George Hotz	edc8fbfff2	woah, why isn't OPT=2	2023-02-27 08:03:31 -08:00
George Hotz	f4ee7d2cad	back to 196 kernels	2023-02-25 18:25:34 -08:00
George Hotz	6e98a172a0	fix broken contiguous	2023-02-25 17:41:49 -08:00
George Hotz	a44e8e4385	discard children on mop shuffle, 200 -> 196 kernels	2023-02-25 10:51:07 -08:00

125 Commits (37fa7e96fb9a823ba745cb39e19c3038cf8a4694)