tinygrab

Author	SHA1	Message	Date
James Roberts	0d405fd5bc	Parallelize CI tests (#535)	2023-02-06 15:27:44 -06:00
George Hotz	90529d3750	tests are 20% faster (#529) * pytorch CPU * no cache, it's slower * pytorch cpu for real * remove double onnx	2023-02-06 09:56:14 -06:00
George Hotz	039de1b332	oops, pytest is for testing	2023-02-06 09:30:12 -06:00
George Hotz	6eb0e6a650	shuffle deps: always tqdm, make linting category	2023-02-06 09:27:01 -06:00
Jacky Lee	ad4f6aa2cf	Add test for quick_gelu (#526) * Add test for quick_gelu * Bump PyTorch version for approximate	2023-02-03 20:01:39 -08:00
George Hotz	cd97b036cc	A Triton backend for tinygrad (#470) * triton can add * print stuff from triton * write out file * ops triton working * reduce ops * sort of works * Triton bugfixes & implementation of remaining ops (#490) * padding * support pow, max, relu, gt0 * allocate return buffer * Fix reduce * Add tests for power op * Fix triton illegal memory accesses and memory leak (#512) * Fix mypy issue * Add triton to setup.py * Replace torch with pycuda * Use one cuda stream for data transfer and kernels * Remove triton submodule * Fix memory leak by using weakrefs for caching * Fix memory access by adding valid as mask for load * Fix invalid kernel launches by flattening the grid (#515) --------- Co-authored-by: Martin Loretz <20306567+martinloretzzz@users.noreply.github.com>	2023-02-01 11:53:57 -08:00
George Hotz	bd8a5c2ced	Simple CUDA Runtime (#480) * factor out opencl runtime * don't use CL outside the runtime * cuda runtime adds * final_dimension * tests pass with CUDA backend * more cuda * cuda simpler * retain old functionality * linter and typing * move globalcounters out of runtimes * oops, GlobalCounters in cuda * MAX_OUTPUT_SHAPE=3 is fine for CUDA	2023-01-27 16:26:24 -08:00
Jacky Lee	026ba78526	Add commit hooks (#478) * Add pre-commit hook * We need ret * Fix some type definitions	2023-01-26 22:24:31 -08:00
George Hotz	bfd4f4e35c	testdocker	2023-01-09 12:41:52 -08:00
George Hotz	b8c94a67c9	Simple chonker (#431) * chonker will make llvm fast * work * better speed tests, we will make them fast * with the cache add is the same speed * relu and neg are fast * fix sum speed * maximum maxnum? * hack for gemm opt * gemm very slow * zeros like * test_permute * shapetracker returns self * fix shapetracker factorization * err, int strides * permutes are faster now in tinygrad than pytorch * support -1 in expand * gemm unrolled * improve final test case * WIP GEMM * why isn't GEMM fast? * revert cache dim * ffp contract works on clang, not llvm? * ignore llvm ir * this makes fma work at least, but no faster * USE_4x4 * 63 GFLOPS * 87 GFLOPS * that wasn't matmul, 44 GFLOPS now * 82 GFLOPS permuted * this permute too * a little speed for the convs * 45 GFLOPS * speed tests pass again * clean up prints * fix FMA WHAT A WASTE OF TIME * colors * moar fair * GPU * useless on chonker * cleanups * improve factorized shapetracker * better threshold * label conv * work * ops test pass again * hot load the index * run the last view, no need to create * ZeroView needs a repr for the key to work * fix segfault on out of bounds * one more test * start amx, and llvm.initialize_native_asmparser * amx works * nice AMX class * nicer AMX class * refactor get_idxs * amx working * is slower... * useless flip * cache * SZ_X * AMX_SZ_X/Y work alone * Contiguous mlop * test gemm packed * PREPARE in packed * use_amx factor * prefetch isn't faster * loop * same 3ms * 2.24 ms * allow double on store in TG * amx reduce is the same speed as non amx reduce * include memory bandwidth * clean up shapetracker * flip returns stride * prepare for upstream * Update ops_llvm.py (#426) * permutes are yellow and green now * faster conv * llvm cleanups * Show optimised IR under debug 4 (#428) * ASTKernel class * Make tinygrad work with older python version (#427) * Make tinygrad work with older python version * Use partialmethod instead of partial * smiple chonker is chonking * remove junk from test speed vs torch * fix linker and types * AMX is only here now * add LLVM tests, it's a valid backend now * oops, run llvm test * contiguous_op * fix loadops compare * dedup reduceops Co-authored-by: calledit <1573053+calledit@users.noreply.github.com>	2022-11-10 23:17:09 -08:00
George Hotz	92ed87b0a5	bump version to 0.4.0	2022-11-08 08:44:42 -08:00
George Hotz	b132de677d	tinygrad.nn (#367) * tinygrad.nn * flake8 * working on pylint * more pylint * more pylint * pylint passes * networkx * mypy can't infer that type * junk	2022-08-18 07:41:00 -07:00
Nicklas Boman	64d986bc8b	add mypy to ci testing (#353)	2022-07-03 15:11:35 -07:00
George Hotz	0d82cfd587	huh, torch 1.12 broke it. remove unused requirements.txt and pin torch 1.11	2022-07-02 23:07:59 -07:00
George Hotz	a710b3a210	it's a real test now	2022-06-11 11:33:33 -07:00
George Hotz	8440dbfa5d	support inputs	2022-06-11 11:21:45 -07:00
George Hotz	082089d1c7	install requires pillow	2021-10-30 16:00:33 -07:00
Liam	bcf1518309	All devices are equal! (#196) * Update all devices to be tested ANE, CPU and OCL all now support all tests. However tests are not currently passing on GPU and I cannot test on CPU. Failing GPU test are not an issue caused by this update. Tests have not been passing due to a missing "six" required installation. OpenCL Tests have not been run since commit: `1a1c63a08b` devices have 3 types and are handle by a new DeviceTypes enum. (The goal is to revert to Tensor.<type>, but this current setup allows for keyword argument defaults: `device=DeviceType.CPU`) All references to Tensor.GPU/CPU/ANE as been converted to the corresponding `DeviceTypes` enum. Refactor of the conversion code to allow for any device to any device conversion. * Add six dependency in requirements.txt * Resolve failure to run tests Move six into gpu required installs. Remove six from standard installation. * Remove repeated data conversion * Refactor method names Also reduce code with .to and .to_ * Dynamic device handlers * Refactor DeviceTypes -> Device * Add mem copy profiling back * test_backward_pass_diamond_model passing * Resolve Sum issue on GPU * Revert batchnorm2d tests * Update README with upadated API * ANE testing with * Last minute line gains	2020-12-15 23:44:08 -08:00
Liam	34b38dd4d0	Extra install requirements. (#164) * Testing install requirements * GPU install requirements	2020-12-09 02:22:47 -08:00
George Hotz	06504a5824	bump version	2020-11-08 09:34:07 -08:00
Marcel Bischoff	d24363f421	Update setup.py (#49) I think `:=` in tinygrad/test/test_mnist.py actually needs 3.8	2020-11-02 18:09:31 -08:00
George Hotz	0b68c08de0	literally just bump version for picture on pypi	2020-10-27 08:14:22 -07:00
George Hotz	6b5982b6b3	push pypi	2020-10-27 08:13:15 -07:00
George Hotz	43591a1e71	make the example simpler	2020-10-26 09:19:20 -07:00
George Hotz	64bd4f7936	lol, it's not 1.0	2020-10-26 09:11:32 -07:00
Göktuğ Karakaşlı	8d80726207	two spaces	2020-10-26 18:54:55 +03:00
Göktuğ Karakaşlı	cc9bd45b44	add setup.py and change imports to relative	2020-10-26 18:19:50 +03:00

77 Commits (deepcrayon)