tinygrab

deepcrayon

tinygrab

Author	SHA1	Message	Date
Friedrich Carl Eichenroth	740304ef9d	Small Onnx Parser Improvements (#885 ) * wip * rename onnx_version to onnx_model_versioN * add type * add types * small cleanup * revert some changes from before * add todo * dumb fix	2023-06-01 00:01:01 -07:00
Marcello Fuschi	3924aae8ed	Fix ONNX dropout and unify the implementation (#857 ) * Fix ONNX dropout and unify the implementation * Use tensor rand method for dropout * Change approach for RNG in ONNX Dropout * Fix style * Test legacy RNG seeding * Remove the necessity for legacy RNG in Tensor class	2023-05-31 07:40:47 -07:00
skobsman	2e393f7ef2	InstanceNormalization ONNX test fixed. (#870 )	2023-05-30 16:07:44 -07:00
Friedrich Carl Eichenroth	f91f28d9e2	fix a bunch of tests (#856 )	2023-05-29 17:48:26 -07:00
zk-tarts	174c65b7d9	add onnx Binarizer op (#850 ) Co-authored-by: zk-tarts <>	2023-05-29 13:15:50 -07:00
M4tthewDE	4408c25e9a	Add Onnx op Shrink (#851 ) * Add onnx Shrink operation * Fix soft/hard shrink onnx test	2023-05-29 13:15:39 -07:00
Friedrich Carl Eichenroth	6f2b3755ca	set axis default to 0 (#854 )	2023-05-29 13:15:28 -07:00
Friedrich Carl Eichenroth	3b158f7a5f	fix onnx versions greater or equal 10 (#853 )	2023-05-29 13:04:06 -07:00
Diogo	1a5d72f812	Onnx ops And, Or, Xor, Not (#847 ) * onnx and, or, xor, not * added bool type to llvm and clang * removed float conversion * switched where op to use tensor func	2023-05-29 11:09:20 -07:00
SnakeOnex	844e6d0753	conv1d & conv3d onnx tests (#835 ) * conv1d onnx * [Work in progress] conv1d + enforcing full padding tuple length * make ONNX padding reorder not hardcoded, works for 1D and 3D convs now * conv2d interprets padding based on the input tensor dimensions	2023-05-29 10:16:45 -07:00
Marcello Fuschi	6d49925a26	Add max_pool2d dilation (#833 )	2023-05-28 15:16:48 -07:00
cheeetoo	21d27d31a9	Fix a couple pad tests (#827 ) * fix pad bug * float type hint for value * convert pads to list * update Pad type signature * Change \| to Union since not supported in < python 3.10	2023-05-28 12:06:46 -07:00
Mattis Megevand	606b841d3f	LR Schedulers (#755 ) * lr schedulers + test * lr scheduler test moved + integration test * integration test for all lr scheduler * lr scheduler test now deterministic * changed optimizer + parameters for lr sched test	2023-05-27 07:47:49 -07:00
George Hotz	87fa5af70a	ptx example	2023-05-26 19:28:51 -07:00
George Hotz	26014a0fa1	add convtranspose (#809 ) * add convtranspose * onnx convtranspose	2023-05-26 12:35:03 -07:00
wozeparrot	7351eb4b61	feat: put temperary file in the same directory as the destination file (#805 )	2023-05-25 20:46:02 -07:00
Diogo	c19ef0fcce	Add sin/cos/tan (#794 ) * added sin/cos/tan * fix lint * added onnx ops support	2023-05-25 09:04:56 -07:00
George Hotz	0400315078	Revert "ops rdna" This reverts commit `81a11d891d`.	2023-05-21 13:02:18 -07:00
George Hotz	325a3bf2cf	Revert "writing 2" This reverts commit `dddd6c42f0`.	2023-05-21 13:02:17 -07:00
George Hotz	dddd6c42f0	writing 2	2023-05-21 12:52:36 -07:00
George Hotz	81a11d891d	ops rdna	2023-05-21 11:45:38 -07:00
George Hotz	90fff82c8a	Rdna (#776 ) * assembler maybe * custom asm * rdna3 on quiet * trigger crashes * fixed notes * non-fatal rdna2 crash * Crash4 * improve rdna sniffer * comments * improve sniffer * asm * 131 TFLOPS RDNA3 * opt simple matmul * todos	2023-05-16 05:33:57 -07:00
George Hotz	89b8b39d9c	fix mypy	2023-05-13 21:25:36 -07:00
George Hotz	e0b2035023	fast imagenet eval, gets 76.14% across the set	2023-05-13 21:18:31 -07:00
George Hotz	46d419060b	start on mlperf models	2023-05-10 16:30:49 -07:00
George Hotz	cb7c22beeb	fix mypy	2023-05-06 19:18:54 +00:00
George Hotz	5190037cbc	rocm: disassembler for shader	2023-05-06 19:07:52 +00:00
George Hotz	42256c0d9d	rocm sniffer dumps code	2023-05-05 18:36:53 +00:00
George Hotz	f2a964f447	nocopy (#764 )	2023-05-05 09:32:06 -07:00
George Hotz	3a2011ab2d	rocm sniffer	2023-05-04 22:22:39 +00:00
George Hotz	a55c4f5000	better rocm build scripts	2023-05-04 09:14:05 +00:00
George Hotz	987b1aaf96	rocm build scripts	2023-05-04 08:45:23 +00:00
George Hotz	ed33a89d52	no werror in archprobe	2023-05-03 19:34:17 +00:00
George Hotz	7ecf4dff68	multi cl_queue (#762 ) * multi cl_queue * only platforms 1 * gpus first, then cpus * put device on underlying buffer * cl_queue array	2023-05-03 12:15:28 -07:00
George Hotz	3b933b0a2f	rocm setup script	2023-05-03 16:01:17 +00:00
George Hotz	59d0d168cd	FLOAT16 off works	2023-04-19 15:34:56 -07:00
George Hotz	3d15769a8f	50 TFLOPS cuda matmul	2023-04-19 14:38:24 -07:00
George Hotz	0b5a0b9ba4	winograd comment	2023-04-16 03:36:51 -07:00
George Hotz	8b777af571	metal_conv gets over 10.4 TFLOPS...	2023-04-15 03:31:22 -07:00
George Hotz	d66e682205	metal matmul from tcores branch	2023-04-14 23:29:29 -07:00
Sohaib	70b9072663	add Pad onnx operator and rework _padding (#740 )	2023-04-06 17:07:36 +05:30
George Hotz	94e2c49c35	test_cacheline_size that works in both places	2023-03-30 06:47:20 +04:00
George Hotz	b05c2828f7	better cacheline test	2023-03-30 06:08:54 +04:00
George Hotz	76db1af6fc	better archprobe	2023-03-30 05:52:00 +04:00
George Hotz	20894991ed	good changes from the M1 Tensor Core project (#730 ) * good changes * working except llvm * llvm types * nice acc * archprobe * lang.float4 * use self.acc for late acc * fix store bug	2023-03-29 05:11:02 +04:00
George Hotz	68e45fca18	metal_matmul: bw and torch sync	2023-03-23 08:02:04 -07:00
George Hotz	bd6c3c31a9	compare to torch	2023-03-22 23:58:37 -07:00
George Hotz	c3a3db75c7	fix metal matmul example	2023-03-22 23:42:51 -07:00
George Hotz	b12b60af20	fix binop, other tests failure (#723 ) * fix binop, other tests failure * that was a bad idea * better layernorm * inference kernel count tests * new style reshape pushing * fixup replacement * 199 kernels is okay. fix flops * push reshape through unaryops only * GRAPH=2 draws the phantom ops * found resnet issue * non working test * mul is cheaper than div * OPT inflation * SHUFFLE_PAD_OPS in OPT=2	2023-03-22 18:15:07 -07:00
Fernando Vidal	73bd0b217b	add int64 as supported dtype from numpy (#699 ) * add int64 as supported dtype from numpy Without this, examples/transformer.py didn't run. With this change it runs successfully. * Update helpers.py * Update transformer.py * Update training.py	2023-03-18 17:15:04 -07:00
George Hotz	f5467cfedc	Devicebufferless (#708 ) * runs one metal kernel * conv2d works * ops tests are passing * const folding * all ops work * pre commit always passes * torch works * working still * fix graph test * tests passing * image almost works * image conv works * most images * fix custom * fix assignment * fix compile enet * clean up comments * fix realize return value * include shapetracker in LB repr * copy should make a copy * reenable method cache * fix lna * dtypes in graph * forward only for IMAGE=2 * simple realize * getting close * fixup new api, it's good except the kernel count * back to 197 kernels * tests should pass * go to a real float * no type_on_cpu * fix the docs * put shapetracker back in it's proper place	2023-03-18 14:40:23 -07:00
Kirill	0532025b04	Fix llama 13B weights loading (#700 ) * Fix llama 13B weights loading * refactor more * add test * test storage offset * fix spacing * fix strides * llama 13B working? * yolo? * better test for seeks	2023-03-15 08:59:52 -07:00
George Hotz	15e0b56e39	compile works (#688 ) * compile works * runtimes * line count * fix custom, to tg dtype * meh, that's fine with lazy import	2023-03-12 11:01:25 -07:00
Kirill	af7745073f	Add comments to SD (#686 ) * Add explanation for empty lambdas * Fix my_unpickle if pytorch_lightning is installed * oops	2023-03-12 10:56:49 -07:00
George Hotz	6c3675c01c	_mmap loads to gpu fast	2023-03-11 23:00:13 -08:00
George Hotz	803b0aef28	track memory for numpy/torch	2023-03-11 20:39:10 -08:00
Diogo	784afc6c6f	Eq magic function support (#683 ) * add eq magic func * changed from eq to __eq__ * ignore type for linter * mypy doenst like descriptions :(	2023-03-11 10:31:46 -08:00
George Hotz	01f39b19dc	move to shapetracker.py	2023-03-11 07:50:07 -08:00
George Hotz	f3ac52aee8	Mypyc (#680 ) * building shapetracker * default ENABLE_METHOD_CACHE * symbolic compiles * improve types * tensor compiles * oops, that's a bug * best of both worlds * find legit typing bugs * pad2d can take list or tuple * sub 200ms when compiled	2023-03-11 07:33:30 -08:00
George Hotz	d7cb8e3e56	multithreaded fake_torch_load_zipped	2023-03-10 19:16:27 -08:00
George Hotz	b1206bcb18	third try at torch loading (#677 ) * third try at torch loading * numpy fixed * fix enet compile * load_single_weight supports empty weights * oops, CPU wasn't the default * so many bugs	2023-03-10 19:11:29 -08:00
George Hotz	4780f9a6df	llama runs (slowly) in master	2023-03-10 17:36:51 -08:00
George Hotz	1826ff6b89	dtypes nice and clean (#673 ) * add dtype class * dtypes * buffers are lazy * dtype is tracked by lazybuffer and GenericShape * fix types in llvm * llvm store * dtype tests * fix tests maybe * fix flop counter * fix CI * CI fix and check format * fix dtype and dtype check * fix custom test * fix test graph	2023-03-10 16:56:07 -08:00
George Hotz	d26345595d	more llama stuff	2023-03-10 10:48:10 -08:00
George Hotz	1a039306d2	good changes from llama branch (#671 ) * good changes from llama * transpose behavior changed	2023-03-09 20:51:22 -08:00
George Hotz	d8dda2af3a	openpilot fixups	2023-03-06 14:14:44 -08:00
George Hotz	a77d792aff	Codegen gpu cleanups (#640 ) * cleanups * fixups * handle pre upcasted global buffers * early is just required * delete junk from hand coded opt * implicit upcast_in_mid_reduce * speedup * fix exec w validhacks * reorder opt * only need to check the output for that * return total runtime from kernels if debugging	2023-03-04 15:31:51 -08:00
Patrick Geneva	117111825c	Fix windows file permission error (#634 )	2023-03-04 09:23:55 -08:00
George Hotz	528cb3b3b9	fix ast test	2023-03-04 07:49:25 -08:00
George Hotz	893f136fe0	lines from helpers	2023-03-03 23:07:46 -08:00
George Hotz	c53efb3635	optimize for CL (#633 ) * required opt * simplify * works * shift_to_last * required is fine * print shape in colored * better shape * args was wrong * debugs * fix empty shape * colored shape printer	2023-03-03 22:00:09 -08:00
Diogo	52204a7b88	adding comparison operators (#616 ) * Less, LessOrEqual, Greater, GreaterOrEqual, Equal * lint fix * using built in functions * overriding __eq__ breaks things * backwards pass for less - foward only tests * one other spot * removing backwards for comparison ops to match pytorch * raise runtime error * more tests for comparison ops * fixed the lineup * added number upcast tests	2023-03-02 08:10:44 -08:00
George Hotz	d062cc82b8	put restrict back	2023-03-01 21:34:45 -08:00
George Hotz	bfcec234a2	Refactor ASTs (#622 ) * ugh worst branch name * compiler refactor continues * scc -> cloc * buf -> _buf * finish _buf, and program -> runtime * gpu is still working, clang isn't * clang in new style * ops_metal * something broke it * improve metal * clean up tons of cl crap * hack fix sync * cleaner gpu * gpu metal clang * cleanups * minor refactor * GPUCodegen * fix up LLVM * blind CUDA refactor * codegen / runtime * keep ops naming * linter passes * woah, llvm was allocing 4x what it needed to * bugfixes * fix openpilot compiler * fix compile_efficientnet * method cache should fix tests * deal with duped functions	2023-03-01 18:57:29 -08:00
George Hotz	7e6edfbc64	unbreak onnx conv padding	2023-02-28 13:55:03 -08:00
George Hotz	7d556ca7e0	avg/max pool work in N-D	2023-02-28 13:38:27 -08:00
George Hotz	d584bae5c0	fine, openpilot can have 197 kernels	2023-02-27 11:48:36 -08:00
George Hotz	7b999add1d	all onnx model tests pass	2023-02-27 11:22:45 -08:00
George Hotz	652d48ccec	onnx : openpilot expand issue was fixed yesterday. remove hack	2023-02-27 11:04:42 -08:00
George Hotz	9d6b63f043	add ConstantOfShape	2023-02-27 10:57:50 -08:00
George Hotz	082134952b	CastLike works with one type hack	2023-02-27 10:51:26 -08:00
Jacky Lee	1ffe8d68d5	Add more onnx ops (#615 ) * Add Celu * Add thresholded relu * Add softsign	2023-02-27 10:43:41 -08:00
George Hotz	643e8b0388	fix tests, test bn evaluate too	2023-02-27 10:39:47 -08:00
Diogo	07e643431c	added onnx group norm (#614 )	2023-02-27 08:11:01 -08:00
Diogo	e68fa18c9b	layer norm support in onnx (#607 ) * layer norm support * switched to 1e-05	2023-02-26 22:04:02 -08:00
George Hotz	3a2a500e90	prevent race condition, external yolo test for now	2023-02-26 17:08:24 -08:00
Sohaib	71ae6e5605	fix: avgpool without counting padding (#605 )	2023-02-26 07:13:00 -08:00
George Hotz	a8de233e12	only div, no reciprocal (#601 ) * only div, no reciprocal * remove reciprocal * fix pad shuffling	2023-02-25 09:35:03 -08:00
Sohaib	d581a99d90	onnx: lrn (#602 ) Co-authored-by: Sohaib Errabii <errabii.sohaib@gmail.com>	2023-02-25 09:24:53 -08:00
voidz	94bec40110	moved extras/jit.py -> tinygrad/jit.py (#599 ) * moved extras/jit.py to tinygrad/jit.py * fixed indent * removed tinygrad.helpers.DEBUG from jit.py	2023-02-25 08:32:33 -08:00
George Hotz	2c5e13a513	Reluless (#600 ) * replace relu for maximum * fix for other backend * clean up RELU and GT0 * tests for maximum * had to clean that up * why reverse a maximum?	2023-02-25 01:21:16 -08:00
George Hotz	176ad29974	retain support for old onnx	2023-02-24 22:29:54 -08:00
George Hotz	da5643d024	rest of tests shouid be made to pass	2023-02-24 12:52:23 -08:00
George Hotz	85452fbaf3	onnx 58/109/208	2023-02-24 12:19:05 -08:00
George Hotz	e8a153e4e9	onnx : add a whole bunch of ops	2023-02-24 12:00:03 -08:00
George Hotz	f2486a7248	more onnx ops	2023-02-24 10:55:58 -08:00
George Hotz	4d0a3dd653	openpilot expand is bugged	2023-02-24 10:25:59 -08:00
George Hotz	2e56a4793e	rename log_softmax, support dim, fix onnx Softmax	2023-02-24 10:11:24 -08:00
George Hotz	5cdfeffe2c	fix shape test	2023-02-24 09:36:32 -08:00
George Hotz	3becefa218	fix onnx tests	2023-02-24 09:27:18 -08:00
George Hotz	e263c0c628	onnx : another model test is passing	2023-02-24 09:22:58 -08:00
George Hotz	d3feea302d	much cleaner way to write onnx ops	2023-02-24 08:46:28 -08:00
George Hotz	f6d946853c	more bugfixes	2023-02-24 00:21:29 -08:00
George Hotz	b1b2d8f440	onnx : some op tests working	2023-02-23 23:58:13 -08:00
George Hotz	b287b1d529	fix yolov8 to get to ConvTranspose	2023-02-23 22:46:48 -08:00
George Hotz	2d59b25ead	onnx backend test : enable only the model tests	2023-02-23 22:36:26 -08:00
George Hotz	d8b6f241f1	external_test_onnx_backend	2023-02-23 21:55:07 -08:00
Sohaib	8835df7a5c	upgrade onnx to 1.13.0 (#588 ) - remove protobuf from direct dependencies - replace deprecated mapping.TENSOR_TYPE_TO_NP_TYPE Co-authored-by: Sohaib Errabii <sohaib.errabii@ipops.io>	2023-02-23 13:59:23 -08:00
calledit	81f7c6800a	Added info on simdgroup availability (#586 ) * Add info on simdgroup availability * "osx" not "os x" * Update metal_matmul.py * Update metal_matmul.py	2023-02-23 13:59:02 -08:00
George Hotz	d22e19536b	onnx: support low quality Resize. stuck on ConvTranspose will have to wait for convless	2023-02-23 09:05:23 -08:00
George Hotz	ab3a2ae9a2	fix test_resnet in onnx now that maxpool works	2023-02-23 08:41:47 -08:00
George Hotz	fd6082dcef	support all _pool2d. conv will eventually be an hlop	2023-02-23 08:19:47 -08:00
George Hotz	76b4d0577d	yolov8 works up to the MaxPool	2023-02-22 19:32:13 -08:00
George Hotz	c4c2c28738	a sustainable approach to float4 (#582 ) * a sustainable approach to float4 * can_float4 * fix tests * fix float4 * delete dead code * types and minor cleanup	2023-02-22 09:45:08 -08:00
George Hotz	c5e2126d49	move DEBUG to helpers	2023-02-22 06:52:11 -08:00
George Hotz	4d232c7c95	optional networkx + DEBUGCL=2	2023-02-20 09:50:46 -08:00
George Hotz	bbfec2fde7	8.46 TFLOPS	2023-02-19 13:21:25 -08:00
George Hotz	1ba847963d	reshape and retain metal_matmul	2023-02-19 13:07:23 -08:00
Kirill	7944cfdadc	Remove Tensor.data (#565 )	2023-02-18 16:36:12 -08:00
Jacky Lee	9fd41632c6	Import get_parameters from tinygrad.nn (#559 ) * get_parameter is in optim * Update all imports for get_parameters * Clean up * use optim.get_paramters	2023-02-17 15:22:26 -08:00
George Hotz	82c257e8f5	more kernel search	2023-02-12 10:34:56 -08:00
George Hotz	de71c13934	test speed v torch uses jit	2023-02-12 07:43:17 -08:00
George Hotz	ba3bf5bdf7	cifar stops learning	2023-02-11 17:21:42 -08:00
George Hotz	40f3949742	fancier KOPT	2023-02-11 16:40:25 -08:00
George Hotz	446442dbb3	fix tests symbolic	2023-02-11 15:16:47 -08:00
George Hotz	20a351a3c6	hand optim CONVW	2023-02-11 14:41:08 -08:00
George Hotz	031edd01e6	switch openpilot compile to TinyJit	2023-02-11 09:51:44 -08:00
George Hotz	608fd730d3	put the JIT in extra	2023-02-11 00:35:18 -06:00
George Hotz	fed95119dc	CL.mem_used -> GlobalCounters.mem_used	2023-02-10 23:13:29 -06:00
Kirill	27154db99a	Downloads weights in examples/stable_diffusion.py (#537 ) * Downloads weights in examples/stable_diffusion.py * use download_file_if_not_exists in fetch * make consistent with previous NOCACHE behavior	2023-02-10 14:37:04 -06:00
George Hotz	5ed3622965	add dump to kernel_search	2023-02-10 12:13:30 -06:00
George Hotz	d9555bc478	that turned out to be dumb	2023-02-08 16:52:29 -06:00
George Hotz	3d63934995	refactor to keep cl in the runtime (#545 ) * refactor to keep cl in the runtime * fix thneed, rename cl to _cl * bugfix + _cuda * fix tests * thneed more correct	2023-02-08 16:46:09 -06:00
George Hotz	2844482a60	Mypy fun (#541 ) * mypy fun * things are just faster * running fast * mypy is fast * compile.sh * no gpu hack * refactor ops_cpu and ops_torch to not subclass * make weak buffer work * tensor works * fix test failing * cpu/torch cleanups * no or operator on dict in python 3.8 * that was junk * fix warnings * comment and touchup	2023-02-08 09:56:51 -06:00
George Hotz	185d2e3678	fix map_buffer and add some __slots__	2023-02-07 15:32:48 -06:00
George Hotz	d93563f39f	fix KOPT	2023-02-07 06:56:33 -06:00
George Hotz	f7291f6ca3	fixes big KOPT, breaks opencl (#505 ) * fixes big KOPT, breaks opencl * fix optimizer * KernelCache * oops, broke batchnorm * hack to fix it * fix llvm, less hacky gpu * disable the cache * cache just breaks things	2023-02-05 10:46:17 -08:00
George Hotz	cd97b036cc	A Triton backend for tinygrad (#470 ) * triton can add * print stuff from triton * write out file * ops triton working * reduce ops * sort of works * Triton bugfixes & implementation of remaining ops (#490) * padding * support pow, max, relu, gt0 * allocate return buffer * Fix reduce * Add tests for power op * Fix triton illegal memory accesses and memory leak (#512) * Fix mypy issue * Add triton to setup.py * Replace torch with pycuda * Use one cuda stream for data transfer and kernels * Remove triton submodule * Fix memory leak by using weakrefs for caching * Fix memory access by adding valid as mask for load * Fix invalid kernel launches by flattening the grid (#515) --------- Co-authored-by: Martin Loretz <20306567+martinloretzzz@users.noreply.github.com>	2023-02-01 11:53:57 -08:00
Jacky Lee	799b3f185a	Refactor getenv into helpers (#508 ) * Refactor getenv into helpers * Remove unused os * Fix default value * Fix more defaults for CI * Fix bracket * Revert changes to openpilot/compile.py * Use getenv from helpers when possible	2023-01-31 15:09:09 -08:00
George Hotz	60ccddb58b	reenable SWAP	2023-01-30 17:32:02 -08:00
George Hotz	aea55eb196	found failing upcast	2023-01-30 16:12:56 -08:00
George Hotz	b67f997864	tests pass w/o float4	2023-01-30 15:40:49 -08:00
George Hotz	c6f570a2e6	improve progress bar	2023-01-30 14:50:28 -08:00
George Hotz	7118602c97	goat progress bar	2023-01-30 14:37:26 -08:00
George Hotz	cccfea4b25	factor out KOPT code	2023-01-30 13:13:55 -08:00
George Hotz	de2c419fd4	make_pair and first attempt at hlb_cifar10	2023-01-30 11:07:23 -08:00
AllentDan	7b6b1f32b1	[Fix] fix typo: test_mnist -> datasets (#492 ) * test_mnist -> datasets * fix mnist_gan	2023-01-29 21:30:47 -08:00
George Hotz	2db272c7f7	Kernel Optimizer (#489 ) * kernel optimizer * 10x faster, but wrong. not good deal * move test -> extra * print x speedup * clcache * fix clcache + DEBUG * GFLOPS estimate * i==3	2023-01-29 17:15:00 -08:00
George Hotz	bb0cdc2442	111.51x speedup for reduce	2023-01-29 03:06:00 -08:00
George Hotz	45c0aa6e2d	search with SHIFT, REDUCE	2023-01-29 02:42:20 -08:00
George Hotz	87879cf4b6	improve search more	2023-01-29 02:08:57 -08:00
George Hotz	f6bbd43cb8	improve search	2023-01-29 01:33:47 -08:00
George Hotz	ebdec2b72f	fix optimizer	2023-01-29 00:23:06 -08:00
George Hotz	a9cabce791	oops, broke mem estimates	2023-01-28 20:21:31 -08:00
George Hotz	a500e79bd1	don't OPTWG on OS X, it's way slower	2023-01-28 20:02:33 -08:00
George Hotz	b0df4d99a0	os x profiling: this ratio is exact i believe	2023-01-28 19:02:51 -08:00
George Hotz	ae810eb558	minor cleanups	2023-01-28 08:59:15 -08:00
George Hotz	6d5e1a8029	GEMM kernel search	2023-01-27 10:08:57 -08:00
Comma Device	f08e740957	factor out hand coded opt	2023-01-26 14:54:06 -06:00
George Hotz	5e8a36a18b	real op kernel	2023-01-26 09:51:32 -08:00
George Hotz	e0600f537a	op kernel in kernel search	2023-01-26 09:47:01 -08:00
George Hotz	aafc29484a	cleanups	2023-01-25 12:37:10 -08:00
George Hotz	919e943867	decent search	2023-01-25 12:20:53 -08:00
George Hotz	7f3da91f8b	kernel_search	2023-01-25 12:05:09 -08:00
George Hotz	e37424424f	first little attempt at search	2023-01-25 11:49:29 -08:00
Comma Device	9e2af0a972	too far with the OPTWG	2023-01-24 13:14:59 -06:00
Comma Device	3590848b93	a little more local workgroup options	2023-01-24 12:50:27 -06:00
Comma Device	4b74752c42	fix hotspots by improving the workgroup optimizer	2023-01-24 12:46:28 -06:00
George Hotz	fd760a390a	fix incremental time	2023-01-24 10:19:04 -08:00
George Hotz	a949de873b	reduce 2.0 (#469 ) * reduce 2.0 * works * hacks * DEBUG=3 for shapes * fix types * 0s weren't being folded * cleaner * last_reduce is no longer needed * comments and cleanup	2023-01-23 15:11:13 -08:00
George Hotz	f1196984e6	harmless to intertwine the math and the stores	2023-01-21 09:31:56 -08:00
George Hotz	708215d06b	Typing (#468 ) * we typing * types look good in theory * most tests pass * gpu tests pass * TEST_AST * delete comments * i must have written that bug so many times * bugfix * don't merge the small ones * add f to constants * commits from reduce * don't GCD the mod nodes * broken and a hack IMAGE=3 * group for reduce * fix linter + mypy * move out test ast * insource TENSOR_TYPE_TO_NP_TYPE * does this fix it? * move imports out	2023-01-21 09:09:22 -08:00
George Hotz	0881d504c1	move shapetracker (#466 ) * move shapetracker * shapetracker test * move ast * move a few things * fix print kernel * fix test * symbolic fixups	2023-01-19 09:56:31 -08:00
George Hotz	9245f4650a	indexer changes for master	2023-01-18 18:02:02 -08:00
George Hotz	49c6e6d472	Latest attempt to add image (#462 ) * add image * load + store + boring stuff: * image tests pass * thneed print GFLOPS * op conv test * more debugging * hack for multiview image * shapetracker creates less views * disable image tests * working better * ugh, lkey not key * print in DEBUG, and allow views * works * simple padding conv2d * use index for image * that was bad code * debug print * fix types * less lines * save lines	2023-01-12 17:36:30 -08:00
George Hotz	281b0db773	three from image	2023-01-12 12:26:58 -08:00
George Hotz	9ff6c532eb	Prereqs for IMAGE=1 (#461 ) * contig * move ast, debug prog * add Token * cleanup reduce * exec_ast	2023-01-11 20:18:42 -08:00
George Hotz	fff1f046b0	Simple version of the new GPU backend (#458 ) * newgpu * more to delete * hmm, tests pass with constant folding * fix lint/type * fix constant folding * comment and rerun tests * lazy touchups * fix graph_batchnorm test * smaller transformer to fix OOM * Revert "smaller transformer to fix OOM" This reverts commit `a44ef8edc2`. * no func cache * introspect * touchups * CLASTKernel * ugh, it was lru_cache * codegen * spacing * old gpu still in opencl * typing fix	2023-01-10 19:16:02 -08:00
George Hotz	fad7cba590	move batchnorm to Tensor	2023-01-09 18:00:16 -08:00
George Hotz	4885fce56e	shapetracker from newgpu (#456 ) * shapetracker from newgpu * touchup ops * test * testst * thneed deletes unused inputs * test * bugfix	2023-01-09 12:40:01 -08:00
George Hotz	b8c94a67c9	Simple chonker (#431 ) * chonker will make llvm fast * work * better speed tests, we will make them fast * with the cache add is the same speed * relu and neg are fast * fix sum speed * maximum maxnum? * hack for gemm opt * gemm very slow * zeros like * test_permute * shapetracker returns self * fix shapetracker factorization * err, int strides * permutes are faster now in tinygrad than pytorch * support -1 in expand * gemm unrolled * improve final test case * WIP GEMM * why isn't GEMM fast? * revert cache dim * ffp contract works on clang, not llvm? * ignore llvm ir * this makes fma work at least, but no faster * USE_4x4 * 63 GFLOPS * 87 GFLOPS * that wasn't matmul, 44 GFLOPS now * 82 GFLOPS permuted * this permute too * a little speed for the convs * 45 GFLOPS * speed tests pass again * clean up prints * fix FMA WHAT A WASTE OF TIME * colors * moar fair * GPU * useless on chonker * cleanups * improve factorized shapetracker * better threshold * label conv * work * ops test pass again * hot load the index * run the last view, no need to create * ZeroView needs a repr for the key to work * fix segfault on out of bounds * one more test * start amx, and llvm.initialize_native_asmparser * amx works * nice AMX class * nicer AMX class * refactor get_idxs * amx working * is slower... * useless flip * cache * SZ_X * AMX_SZ_X/Y work alone * Contiguous mlop * test gemm packed * PREPARE in packed * use_amx factor * prefetch isn't faster * loop * same 3ms * 2.24 ms * allow double on store in TG * amx reduce is the same speed as non amx reduce * include memory bandwidth * clean up shapetracker * flip returns stride * prepare for upstream * Update ops_llvm.py (#426) * permutes are yellow and green now * faster conv * llvm cleanups * Show optimised IR under debug 4 (#428) * ASTKernel class * Make tinygrad work with older python version (#427) * Make tinygrad work with older python version * Use partialmethod instead of partial * smiple chonker is chonking * remove junk from test speed vs torch * fix linker and types * AMX is only here now * add LLVM tests, it's a valid backend now * oops, run llvm test * contiguous_op * fix loadops compare * dedup reduceops Co-authored-by: calledit <1573053+calledit@users.noreply.github.com>	2022-11-10 23:17:09 -08:00
George Hotz	2cc1d970c6	updates from the chonker branch	2022-11-07 21:12:08 -08:00
George Hotz	d878065ece	Gemm (#416 ) * gemm * off by factor of 5 * 50 GFLOPS * works * 91 gflops * working at 50G * works * iy * 150 GFLOPS * 150 GFLOPS * N=2048 is still fast * threading soon * multithread * pinning * throttling is sad * Align matrices to cacheline width (#361) Co-authored-by: cloud <Cloud11665@gmail.com>	2022-11-06 10:07:28 -08:00
George Hotz	6a8fb53304	move ops.py into lazy.py (#402 ) * move ops.py into lazy.py * fix graph and linter * ugh, didn't add	2022-10-25 13:58:03 -07:00
George Hotz	8e22d5ee67	replace networkx with defaultdict	2022-10-20 19:36:43 -07:00
George Hotz	63f9c55156	really dumb bug	2022-10-20 17:07:47 -07:00
George Hotz	1bec4651b3	fix nonstatic weights	2022-10-20 17:04:14 -07:00
George Hotz	bb288e6938	safe_numpy and warning for broken matmul	2022-10-20 15:40:22 -07:00
George Hotz	50c95c7d9a	add assert to catch issue in attention	2022-10-20 15:13:00 -07:00
George Hotz	26c78ccf7d	remove useless buffer	2022-10-20 14:07:28 -07:00
George Hotz	a18c1f3178	zero out the inputs	2022-10-20 13:46:52 -07:00
George Hotz	ace8db29f8	ReduceSum	2022-10-20 12:48:14 -07:00
George Hotz	c400ee0beb	refactoring thneed (#400 ) * refactoring thneed * continue * minor update * looks like it's working * big refactor * confirm thneed got the right output * code is there but it's broken * works now * always OPTWG, input -> dat * fix type issue	2022-10-20 12:35:59 -07:00
YassineYousfi	ae0f9b17df	openpilot: new models and onnx ops (#401 ) * ngrl stuff * fngrl * fix typo in compile script * workflow dispatch * new models in tests * dont need to up this threshold Co-authored-by: HaraldSchafer <harald.the.engineer@gmail.com>	2022-10-20 11:49:19 -07:00
George Hotz	ff11c4316b	move get_parameters to optim.py	2022-09-25 13:16:58 -04:00
Jacky Lee	2c01a66265	Reshape dataset from fetch_mnist (#390 )	2022-09-24 21:16:29 -04:00
George Hotz	271446e3eb	set requires_grad to None (#387 ) * set requires_grad to None * some things need gradients * hmm, why was get_parameters filtering	2022-09-21 11:16:02 -04:00
YassineYousfi	2f0f91ba3d	support float16 onnx weights (#384 )	2022-09-15 09:12:18 -04:00
YassineYousfi	1a7bdc51f8	support more onnx ops (#376 ) * broadcast from right to left * add another broadcasted add test * more onnx ops * use float32 range in clip	2022-09-07 15:15:24 -07:00
George Hotz	0516359af8	fix stupid OPENCL=1 OOM	2022-09-06 14:29:23 -07:00
George Hotz	4dadd95e3c	fix tests hopefully, more stable diffusion	2022-09-03 10:38:31 -07:00
George Hotz	c01a8c5c2d	stable diffusion start	2022-09-03 10:08:42 -07:00
George Hotz	a3fc64a585	fix batchnorm folding in openpilot compile	2022-08-31 13:04:49 -07:00
George Hotz	dc7af8c3ac	thneed run float32	2022-08-28 11:03:35 -07:00
George Hotz	b132de677d	tinygrad.nn (#367 ) * tinygrad.nn * flake8 * working on pylint * more pylint * more pylint * pylint passes * networkx * mypy can't infer that type * junk	2022-08-18 07:41:00 -07:00
George Hotz	f76d41812b	prune graph	2022-07-17 15:38:43 -07:00
George Hotz	eda6f071b2	default opt level 2	2022-07-17 14:54:40 -07:00
George Hotz	73b0471b25	join expands	2022-07-17 13:42:05 -07:00
George Hotz	d04b274cd2	noop removal can replace with reshape	2022-07-16 08:32:42 -07:00
George Hotz	2720ef49ca	extra and test and tuple	2022-07-07 10:01:33 -07:00
George Hotz	81b73f97a3	Optiimzation (#355 ) * constant folding into kernels * that opt worth it? * fix mypy * ast one kernel * save 2 lines in conv kernel * debug print kernel count * cl debugging * early realize inputs * refactor Device	2022-07-04 08:58:57 -07:00
George Hotz	7276f8d6bf	improve constant folding, detach before moving tensor	2022-07-02 15:29:40 -07:00
George Hotz	8cf1aed0f4	don't track_running_stats, parameters must require_grad	2022-07-02 14:38:45 -07:00
George Hotz	49c954b389	comments	2022-06-26 17:20:25 -07:00
George Hotz	83d50e2687	move to extra.onnx	2022-06-21 19:43:44 -07:00
George Hotz	9b27ba650b	load new torch files	2022-06-07 10:06:48 -07:00
George Hotz	233c71a7ba	support requires_grad	2022-06-06 07:47:31 -07:00
George Hotz	d8d19ed468	wikimedia wasn't returning 200	2022-01-15 19:09:29 -08:00
George Hotz	e28cdfb0cf	clean up resnet	2021-11-30 16:14:54 -05:00
George Hotz	58ed46963e	fix broadcastdot	2021-11-29 18:54:57 -05:00
George Hotz	dca076dbf1	remove dumb nn ops	2021-11-29 18:05:31 -05:00
George Hotz	30eb3afbe1	add bias term to transformer	2021-11-29 12:45:27 -05:00
George Hotz	e2a8961a18	less lines, fix bug	2021-11-17 12:52:17 -08:00
George Hotz	ba28761894	move yolo into examples/yolo	2021-10-30 19:46:00 -07:00
George Hotz	63f50cff45	move back again	2021-10-30 16:13:29 -07:00
Evan Mays	285621aeda	Cherry backprop for conv2d (#281 ) * quick math: 0 + x = x. * gradient w.r.t. x using cherry for conv * gradient w.r.t. w for conv on cherry but doing vector dot products * small optimization * [cherry] optimize conv backpass for large channel count * get rid of numpy einsum	2021-10-30 16:12:19 -07:00
George Hotz	3d646272d6	move back	2021-10-30 16:12:12 -07:00
George Hotz	ac8afd24fa	refactor accel	2021-10-30 16:10:59 -07:00
Guglielmo Camporese	2b7589db64	Added ResNet-{18, 34, 50, 101, 152} (#271 ) * added resnets * fix minor * fix minor * resnet in models * added resnet test * added resnet train test * added linear, conv2d nn tests * fix minor in extra/training * resnet in models * fix minor * fix tolerance for linear in nn test * fix eval, this causes cpu and gpu UT failing * revert transformer test * fix minor for CPU test * improved model get_params for sequential layer * fix minor for params counting * commented broken ops tests * improved train for resnet	2021-06-21 09:37:24 -07:00
George Hotz	89798d2f43	some flags	2021-06-19 11:46:31 -07:00
George Hotz	d81eae8288	debug cherry crash	2021-06-19 11:41:20 -07:00
George Hotz	d3f169b267	move good models to models, add a training step test	2021-06-19 11:24:15 -07:00
George Hotz	b48d4bad2e	clean up print spam	2021-06-19 10:31:04 -07:00
George Hotz	027535d0b5	microcoded matmul	2021-06-17 21:03:08 -07:00
George Hotz	026e2ae6a7	three registers and a zero command	2021-06-17 17:09:18 -07:00
George Hotz	2e71ae33f6	max op works	2021-06-17 17:01:21 -07:00
George Hotz	9e12c1bbba	cherry binop	2021-06-17 16:50:40 -07:00
George Hotz	fcdabea880	training mnist with cherry ops	2021-06-17 16:45:35 -07:00
George Hotz	2affd226b3	speed up sum	2021-06-17 16:38:34 -07:00
George Hotz	e8eb7d1b7e	max op	2021-06-17 16:20:56 -07:00
George Hotz	c1d469d440	sum op	2021-06-17 16:19:35 -07:00
George Hotz	b1000d866e	readme, plus reduce ops	2021-06-16 11:21:06 -07:00
George Hotz	ff3fdc58e5	risk -> cherry	2021-06-16 09:59:48 -07:00
George Hotz	2f91c012eb	build note	2021-06-15 22:41:41 -07:00
George Hotz	4850d6eb43	update todo	2021-06-15 10:22:39 -07:00
George Hotz	4e1edb3692	have tinygrad log the loads	2021-06-14 18:35:14 -07:00
George Hotz	93f2e9769d	little note	2021-06-14 15:49:41 -07:00
George Hotz	a89d12d735	wow, way faster	2021-06-10 17:11:39 -07:00
George Hotz	10b1306525	binops	2021-06-10 16:52:37 -07:00
George Hotz	4535d39baa	comments and pow	2021-06-10 09:03:40 -07:00

... 3 4 5 6 7 ...

478 Commits (deepcrayon)