Commit graph

144 commits

Author SHA1 Message Date
Jacky Lee ad4f6aa2cf
Add test for quick_gelu (#526)
* Add test for quick_gelu

* Bump PyTorch version for approximate
2023-02-03 20:01:39 -08:00
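For context, QuickGELU is the common sigmoid-based GELU approximation, `x * sigmoid(1.702 * x)`; a minimal sketch of the function the commit adds a test for (standalone illustration, not tinygrad's actual implementation):

```python
import math

def quick_gelu(x: float) -> float:
    # QuickGELU approximation of GELU: x * sigmoid(1.702 * x)
    return x * (1.0 / (1.0 + math.exp(-1.702 * x)))

# Near zero the output is ~0; for large positive x it approaches x,
# for large negative x it approaches 0.
```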
George Hotz cd97b036cc
A Triton backend for tinygrad (#470)
* triton can add

* print stuff from triton

* write out file

* ops triton working

* reduce ops

* sort of works

* Triton bugfixes & implementation of remaining ops (#490)

* padding

* support pow, max, relu, gt0

* allocate return buffer

* Fix reduce

* Add tests for power op

* Fix triton illegal memory accesses and memory leak (#512)

* Fix mypy issue

* Add triton to setup.py

* Replace torch with pycuda

* Use one cuda stream for data transfer and kernels

* Remove triton submodule

* Fix memory leak by using weakrefs for caching

* Fix memory access by adding valid as mask for load

* Fix invalid kernel launches by flattening the grid (#515)

---------

Co-authored-by: Martin Loretz <20306567+martinloretzzz@users.noreply.github.com>
2023-02-01 11:53:57 -08:00
Jacky Lee 799b3f185a
Refactor getenv into helpers (#508)
* Refactor getenv into helpers

* Remove unused os

* Fix default value

* Fix more defaults for CI

* Fix bracket

* Revert changes to openpilot/compile.py

* Use getenv from helpers when possible
2023-01-31 15:09:09 -08:00
George Hotz de2c419fd4 make_pair and first attempt at hlb_cifar10 2023-01-30 11:07:23 -08:00
George Hotz bd8a5c2ced
Simple CUDA Runtime (#480)
* factor out opencl runtime

* don't use CL outside the runtime

* cuda runtime adds

* final_dimension

* tests pass with CUDA backend

* more cuda

* cuda simpler

* retain old functionality

* linter and typing

* move globalcounters out of runtimes

* oops, GlobalCounters in cuda

* MAX_OUTPUT_SHAPE=3 is fine for CUDA
2023-01-27 16:26:24 -08:00
George Hotz 9245f4650a indexer changes for master 2023-01-18 18:02:02 -08:00
George Hotz 49c6e6d472
Latest attempt to add image (#462)
* add image

* load + store + boring stuff:

* image tests pass

* thneed print GFLOPS

* op conv test

* more debugging

* hack for multiview image

* shapetracker creates less views

* disable image tests

* working better

* ugh, lkey not key

* print in DEBUG, and allow views

* works

* simple padding conv2d

* use index for image

* that was bad code

* debug print

* fix types

* less lines

* save lines
2023-01-12 17:36:30 -08:00
George Hotz 4885fce56e
shapetracker from newgpu (#456)
* shapetracker from newgpu

* touchup ops

* test

* testst

* thneed deletes unused inputs

* test

* bugfix
2023-01-09 12:40:01 -08:00
George Hotz 2cc1d970c6 updates from the chonker branch 2022-11-07 21:12:08 -08:00
George Hotz db2da22a04 stop blowing up floats 2022-10-30 16:47:16 -07:00
George Hotz 8afc643bb1 fix bug in ops test, it was cheating somehow 2022-10-30 16:43:24 -07:00
George Hotz 2f602a92ff seperate STRIDED and EXPAND 2022-10-30 13:23:58 -07:00
George Hotz 52bfbc31be vectorization 2022-10-29 12:47:52 -07:00
George Hotz e473d35f90 llvm doesn't vectorize 2022-10-29 11:59:48 -07:00
George Hotz b65b70812a
Exec AST (#404)
* working exec ast

* exec_ast is staticmethod

* GenericExecAST

* fold that sometimes

* ExplicitExecAST

* exec_ast for GPU

* gpu working

* get_lazyop_shape

* now gpubuffer is ExplicitExecAST

* dedup

* add a type

* RESHAPE in opencl code

* fix linter

* that too for linter

* cleanups

* remove dead code

* GenericShape is less lines

* add ALLOWED_KERNEL_COUNT to tests

* fix mypy

* that's gotta be recursive

* fix opencl shape processing

* remove unneeded lambda
2022-10-28 08:27:03 -07:00
George Hotz 10921a60c4 more imports from llvm branch 2022-10-26 18:02:36 -07:00
Drew Hintz a4ad1d774a
enable tests in test_ops.py that are disabled but now work. (#396)
remove custom tolerances that don't appear to be needed.
2022-10-13 09:58:53 -07:00
George Hotz b7f748c15a
Fix GPU 2**31 virtual size limit (#392)
* in progress

* big conv test works

* that's unneeded

* fix opencl with reduce

* rewrite contiguous_view_constant_fold

* clean up mids in loop code

* subidx

* print cl kernel before run

* no reduce, no loop

* Revert "no reduce, no loop"

This reverts commit 92777e40e9.
2022-10-05 00:55:20 -04:00
George Hotz 7a61dc7ee9 test_sd_big_conv 2022-10-01 13:26:05 -04:00
George Hotz 271446e3eb
set requires_grad to None (#387)
* set requires_grad to None

* some things need gradients

* hmm, why was get_parameters filtering
2022-09-21 11:16:02 -04:00
George Hotz 29ae21bb0d import tests from CL metal texture fix 2022-09-19 20:01:47 -04:00
George Hotz 57e804a9bf add min support 2022-09-18 20:39:41 -04:00
George Hotz 3c3534736e fix matmul kernel and tests 2022-09-13 08:31:04 -07:00
Comma Device 62e9419206 fix test failure on MATMUL=1 backward pass 2022-09-13 11:18:52 -04:00
Comma Device 3b82afc6a0 simple on device failing test 2022-09-13 10:59:15 -04:00
George Hotz 4efde1ba0a test_matmul 2022-09-13 07:51:33 -07:00
George Hotz 790af99a48 fix slice one multi, and linear can be simpler with new broadcasting 2022-09-06 19:51:33 -07:00
YassineYousfi 5aad460c7a
broadcast from right to left (#375)
* broadcast from right to left

* add another broadcasted add test
2022-09-06 16:36:13 -07:00
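Right-to-left broadcasting is the NumPy convention: shapes are aligned at their trailing dimensions, the shorter shape is padded with 1s on the left, and each dimension pair must match or contain a 1. A minimal sketch of that rule (illustrative helper, not tinygrad's implementation):

```python
def broadcast_shape(a, b):
    # Pad the shorter shape with leading 1s, then compare dims pairwise
    # from the right: they must be equal, or one of them must be 1.
    a, b = tuple(a), tuple(b)
    a = (1,) * (len(b) - len(a)) + a  # negative repeat count yields ()
    b = (1,) * (len(a) - len(b)) + b
    out = []
    for x, y in zip(a, b):
        if x != y and 1 not in (x, y):
            raise ValueError(f"incompatible dims {x} and {y}")
        out.append(max(x, y))
    return tuple(out)
```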
George Hotz bcb867cdd6 better idea for numbers, do the division in python 2022-09-03 16:23:39 -07:00
George Hotz 033a3ecccf found tinygrad bug 2022-09-03 12:32:43 -07:00
George Hotz 5d45c6e516
Fold reduce (#362)
* folding reduce

* fold through movementops

* fixup shapes

* was too aggressive

* i knew we needed that

* don't recompute reduce

* working

* fix openpilot compile

* prunegraph openpilot

* types and reduce_shape

* refactor

* cleanups

* neater

* 1009

* 1004

* clean up reduce for 998
2022-07-19 09:24:02 -07:00
George Hotz f93e297804 fix bug caused by rounding 2022-07-17 12:49:58 -07:00
George Hotz bcf422dfdd
Device2 (#358)
* option for matmul

* fixups

* fast like a nascar

* running

* thneed runner

* no buffer id makes no backing buffer

* move constant folding to the top

* runs on mac

* folded biases

* was v slow

* maybe just that

* elu touchup

* speed and float32

Co-authored-by: Comma Device <device@comma.ai>
2022-07-16 07:26:19 -07:00
George Hotz 5e46561f7e no_grad = NOT backward 2022-07-10 20:54:57 -07:00
George Hotz b34ae7876f lol chr(10) not chr(13) 2022-07-10 20:03:11 -07:00
George Hotz 93c378dffc add test for slice_one 2022-07-03 12:14:20 -07:00
George Hotz dffde3de5a support both asymmetric and negative padding 2022-06-26 17:59:25 -07:00
George Hotz 49c954b389 comments 2022-06-26 17:20:25 -07:00
George Hotz 8c483fbdc9 maxpool lazy fix 2022-06-26 17:07:03 -07:00
George Hotz 6b652dafb2 touchups 2022-06-19 16:57:14 -07:00
George Hotz d5b3e18540
Accelerate with CL (#325)
* accelerated opencl

* it's running, it's just wrong

* bugfix

* model is correct in opencl

* lazy image convert

* add padding support to convolution

* that stuff was all upstreamed

* remove HEAD

* oops

* test_simple_conv2d_4 passes, add dilation support

* put logic in ops_opencl

* fix crash

* hmm, stride seems okay

* padding for batched inputs

* just an issue now with cout%4

* op model still passes

* fix startPackedInputChannel

* pre and post processing ops for graph

* don't break other llops

* shapetrackering

* reshapes are free

* lazy movement ops
2022-06-16 15:40:52 -07:00
George Hotz 2a14befb74 support padding 2022-06-15 14:46:44 -07:00
George Hotz fef6c82491 wow dilation support was simple 2022-06-15 11:38:23 -07:00
George Hotz 0b182029dd support dilated convolution in torch 2022-06-14 18:03:35 -07:00
George Hotz a690ba4588 add test for padding 2022-06-14 17:41:22 -07:00
George Hotz e057ca23bb add flip 2022-06-14 17:28:43 -07:00
George Hotz dcbca4fdf1
Expand Operator (#327)
* replace broadcasting with expand

* Tensor, not self

* remove broadcasting from mlops

* delete useless A operator

* expand, not repeat

* remove A op

* expand on gpu

* binary_op doesn't broadcast anymore

* expand is still total junk, but the tests should pass
2022-06-12 12:31:48 -07:00
George Hotz 33f18c61a1 test_broadcasted_add 2022-06-12 10:19:58 -07:00
George Hotz 85d17a2acd running resnet onnx 2022-06-11 13:17:15 -07:00
George Hotz db5a632e8c multicat + test onnx is generic onnx 2022-06-11 11:50:47 -07:00