tinygrab

deepcrayon

Author	SHA1	Message	Date
George Hotz	453e437598	move stuff in the linearizer (#1726 ) * move stuff in linearizer * move stuff in linearizer * minor * fix opts import	2023-08-31 14:42:09 -07:00
Karan Handa	a8aa13dc91	[ready] Replacing os with pathlib (#1708 ) * replace os.path with pathlib * safe convert dirnames to pathlib * replace all os.path.join * fix cuda error * change main chunk * Reviewer fixes * fix vgg * Fixed everything * Final fixes * ensure consistency * Change all parent.parent... to parents	2023-08-30 10:41:08 -07:00
chenyu	f00325e77d	ops_metal newCommandQueueWithMaxCommandBufferCount_(1024) (#1664 )	2023-08-24 15:42:00 -07:00
nimlgen	bd111411bf	init allocator for compiled backends (#1467 ) * init allocator for compiled backends * Update ops_webgpu.py --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2023-08-17 10:33:32 -07:00
chenyu	11dd9b1741	symbolic codegen and exec (#1552 ) * symbolic codegen and exec * fix and add test * no sketchy * merge_dicts type * dtypes._arg_int32	2023-08-16 14:43:41 -07:00
Diogo	d17ecccd78	Torch/LLVM/arm F64 support (#1551 )	2023-08-15 21:21:08 -04:00
Steven Anderson	93a36c3659	Arm (#1421 ) * testing new memops * better debugging * testing padded conv * branching with load * refactoring a bit * first try * fixing bugs * fixing some * eq * eq2 * do not use x's * working * fixing imm * getting things working * refactor * pow not working * working except one * refactor: one store mem * refactor: global load * refactor: imm * refactor: cleaning * fixing big offsets * refactor with ci * try ci * typo * another typo * ubuntu default * forgot git * do i need git? * missing packages * adding python-dev * with cache? * buildx action * buildx name issue? * maybe now? * python3 * newline warning * maybe now * i actually need this * ci should work now * improved caching * fixing cache * maybe now it will cache * this * testing cache * trying again * load * missing platform * caching gha * testing cache * full testing * typo * now? * why * adding checkout back * bad formatting * fixing convention issues * supporting python * adding CI flag * testing all * better comments * adding debugging * takes 12x longer * does it output progress now? * ignore models for speed * fixing merge * excluding conv_transpose2d * only 2 test cuz is to slow * another approach * let's see * faster duh * my bad * T_T * typo * sup * with output? * comment test * comment test * comment test * :? * no comment * with cache * back to normal * testing that ci works * back to passing * trying again * does it create another entry * does it create another entry? * build local * hey * Revert "excluding conv_transpose2d" This reverts commit `cc7348de03`. * does it cache if done before? * does it cache? * done * adding test ops * bad formatting * no need for this * working static mem * sum 1d * add ndim * better reg import * fix stack * back to np * working except for softmax * 5 failing * no pogress * remove keystone * remove keystone * testops passing * cleanups * more cleanup * typo * ci * ci2 * cond import * ci3 * ci4 * ci4 * ci5 * ci5 * ci6 * aligment * test all * correct test * err read_unmapped * passing test * ignore for speed * ignore for speed * ci7 * cleanup * remove docker * fixing merge * fixing bugs * add skipload for const ops * comments * First merge to master: Renderer * fix emulation * passing all tests arm64 * cleaning * fix handcoded binary * cleaning * fix errs * fix runtime arg binary * clean git diff * fix and clean * fixing metal test * cleaning * fix metal test * ci ~8 min * fix pylint and clang * cache the files in ops_clang --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2023-08-14 19:29:30 -07:00
George Hotz	84c430355e	fix backends for new style (#1443 ) * fix backends for new style * fix method cache * fix fakeless * llvm blacklist * fix kernel optimizer	2023-08-05 11:07:04 -07:00
George Hotz	f4218b709f	Revert "Improve Metal runtime command buffer handling (#1335 )" (#1397 ) This reverts commit `bd54105b6b`.	2023-08-01 12:10:20 -07:00
George Hotz	37fa7e96fb	Revert "update editorconfig, enforce via CI (#1343 )" (#1380 ) This reverts commit `da2efecbe2`.	2023-07-31 10:35:50 -07:00
Pavol Rusnak	da2efecbe2	update editorconfig, enforce via CI (#1343 ) * update editorconfig to set unix-style newlines and trim whitespace * add editorconfig github action to the CI * fix whitespace	2023-07-30 18:44:30 -07:00
Anthony Zboralski	bd54105b6b	Improve Metal runtime command buffer handling (#1335 ) * Improve Metal runtime command buffer handling * Remove obsolete mtl_buffers_in_flight list from _METAL class * remove unused import in ops_metal.py * Refactor: Use `self.dispatch_group` over `METAL.dispatch_group` Changes `libdispatch.dispatch_group_enter(METAL.dispatch_group)` to `libdispatch.dispatch_group_enter(self.dispatch_group)`	2023-07-26 15:45:40 -07:00
George Hotz	9dffc9ba23	Use nevergrad to optimize kernels (try 2) (#1301 ) * nevergrad try 2 * touchups * no ones * opt fixup * cleanups * touchup * make new optimizer file	2023-07-20 16:46:45 -07:00
Roelof van Dijk	8f2e2f5ee2	style: else-after-return (#1216 ) Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>	2023-07-12 10:26:38 -07:00
Reza Rezvan	535224ac20	Remove float64 (#1101 ) * Refactor: Remove float64 * Refactor: Remove unused imports * Refactor: Remove float64 * Refactor: Remove float64 * Refactor: Exclude float64 onnx backend * Add: Skip jacobian and gradcheck tests;	2023-07-04 08:40:51 -07:00
George Hotz	18892242b0	global -> group (#1007 ) * global -> group * allow None for local_size in custom function * lil local * comment on shape * fix cuda * smart local cast * better local heuristic * fix ptx, and work_dim cleanup * fix metal * fix ops test * fix openpilot jit * no more optlocal * might fix metal tests * try metal now * see generated metal code * test free removal. REVERT THIS * mergable	2023-06-21 11:50:43 -07:00
George Hotz	ba56ee6020	RDNA assembly backend ($1000 bounty) (#787 ) * Revert "Revert "ops rdna"" This reverts commit `0400315078`. * Revert "Revert "writing 2"" This reverts commit `325a3bf2cf`. * no dump * 2x 2 * simple asm * local size * sub * lil work * support args != 3 * assembler work * generate that * ptx assembler * begin index renderer * max * ptx loops * gemms work * valid works * asm working a bit more * close * passing all ops tests * ptx is a codegen only, not a backend * ptx * float16 support * rdna goes here * install types * make amd disassemble * ansilen for pretty print * fix ptx log2/exp2 * assemblyinstruction * new asm * working gemm * fix cmp * more passing * mod * ptx works again * rdan3 add works * log exp * sin is sin 2pi * fix types * progress * loops work * rdna xyz * better addressing * cleanups * handle exception in early process * div support * rdna float4 * locals work * fix neg index * cast * smaller diff * yaml * import only if selected * fromimport * types * this all needs rewriting * a few more	2023-06-16 09:33:18 -07:00
Diogo	0629791cbd	F64 support (#976 ) * initial commit * added osx check for opencl * added llvm f64 conversions * typo in llvmir * more tests and modified unsupported error * fixed linting error * added pragma fp64 * simplified exclusion for OSX * fixed device check and also added it to cast func * added ifdef check for fp16 in ops_gpu * Revert "added ifdef check for fp16 in ops_gpu" This reverts commit `92de754d48`. * f64 prekernel signature match f16 * moved condition to buffer init	2023-06-13 21:31:31 -07:00
Diogo	0dab8edc97	support Int64 type in cstyle gen (#860 ) * added metal int64 and some simple tests * removed bool return type def * typo in test * also missing in clang and gpu runtimes * switched order for opencl * increased atol and removed new line in kernel prefix	2023-05-30 16:04:46 -07:00
George Hotz	23f88fb026	synchronize for honest speed compare	2023-03-24 10:24:27 -07:00
George Hotz	5495c7d64e	linearizer! (#714 ) * linearizer outputs something * working ish * cstyle codegen * clang mostly works * fix load valid * fix numberless loop * fancy gen * working * fix enet compiler * cleanups * float4 upcasting * less lines * supports_float4 * constant folding * mulacc * internet tests flaky in CI * 90% image support * fix image generic * bugs exposed with shapetracker and single view * new llvm * use vload, remove OLD * that's really poorly done * ending up being more lines	2023-03-19 23:43:49 -07:00
George Hotz	f5467cfedc	Devicebufferless (#708 ) * runs one metal kernel * conv2d works * ops tests are passing * const folding * all ops work * pre commit always passes * torch works * working still * fix graph test * tests passing * image almost works * image conv works * most images * fix custom * fix assignment * fix compile enet * clean up comments * fix realize return value * include shapetracker in LB repr * copy should make a copy * reenable method cache * fix lna * dtypes in graph * forward only for IMAGE=2 * simple realize * getting close * fixup new api, it's good except the kernel count * back to 197 kernels * tests should pass * go to a real float * no type_on_cpu * fix the docs * put shapetracker back in it's proper place	2023-03-18 14:40:23 -07:00
George Hotz	54f499b623	Move rawbuffer (#697 ) * move GlobalCounters to helpers * that's not part of the public api * move InterpretedBuffer * remove fromCPU from devicebuffer	2023-03-13 22:30:36 -07:00
George Hotz	15e0b56e39	compile works (#688 ) * compile works * runtimes * line count * fix custom, to tg dtype * meh, that's fine with lazy import	2023-03-12 11:01:25 -07:00
Cyril Roumégous	3f08613a2a	apply flake8 E203 rule (#684 )	2023-03-11 11:35:16 -08:00
George Hotz	f3ac52aee8	Mypyc (#680 ) * building shapetracker * default ENABLE_METHOD_CACHE * symbolic compiles * improve types * tensor compiles * oops, that's a bug * best of both worlds * find legit typing bugs * pad2d can take list or tuple * sub 200ms when compiled	2023-03-11 07:33:30 -08:00
George Hotz	0b03216cc3	losing lines (#678 ) * losing lines * FLIP -> STRIDE * shapetracker refactor	2023-03-10 21:57:05 -08:00
George Hotz	1826ff6b89	dtypes nice and clean (#673 ) * add dtype class * dtypes * buffers are lazy * dtype is tracked by lazybuffer and GenericShape * fix types in llvm * llvm store * dtype tests * fix tests maybe * fix flop counter * fix CI * CI fix and check format * fix dtype and dtype check * fix custom test * fix test graph	2023-03-10 16:56:07 -08:00
George Hotz	1a039306d2	good changes from llama branch (#671 ) * good changes from llama * transpose behavior changed	2023-03-09 20:51:22 -08:00
Cyril Roumégous	c10131ddf5	reduce number of lines (#645 )	2023-03-05 15:42:32 -08:00
George Hotz	b1ba78ac38	move applegpu disassembler	2023-03-05 11:21:12 -08:00
George Hotz	85f69b5489	metal needs the Cocoa	2023-03-03 23:22:15 -08:00
George Hotz	28a6ada4ce	line reduction in metal	2023-03-03 23:14:40 -08:00
George Hotz	bfcec234a2	Refactor ASTs (#622 ) * ugh worst branch name * compiler refactor continues * scc -> cloc * buf -> _buf * finish _buf, and program -> runtime * gpu is still working, clang isn't * clang in new style * ops_metal * something broke it * improve metal * clean up tons of cl crap * hack fix sync * cleaner gpu * gpu metal clang * cleanups * minor refactor * GPUCodegen * fix up LLVM * blind CUDA refactor * codegen / runtime * keep ops naming * linter passes * woah, llvm was allocing 4x what it needed to * bugfixes * fix openpilot compiler * fix compile_efficientnet * method cache should fix tests * deal with duped functions	2023-03-01 18:57:29 -08:00

34 Commits (5b15a972b574d23052536b0be8e959a1786c210f)