tinygrab

Author	SHA1	Message	Date
Jeff Moe	661dcc5ed0	Reformat, uh, everything, with black	2023-12-04 22:01:04 -07:00
qazal	ab2d4d8d29	Fix cl import in the copy_speed test and cifar example (#2586 ) * fix CL import * update test to only run on GPU * update hlb_cifar too	2023-12-03 09:22:07 -08:00
Oleg Rybalko	5e87083783	Whisper + LLAMA + VITS (#2332 ) * feat: working voice 2 text using whisper * feat: added llama generation * feat: vits init * feat: more accurate voice conversion * feat: support for tts and working pipeline for the first pass * fix: linter checks * refactored vits initialization and inference, added mmts-tts support * fixed process sync and now we can have an infinite conversation * reuse output stream to remove overhead of creating a new one each time * added pre-prompt configuration with yaml files * adjusted code to merge PR which changed whisper * optimized whisper, now it's blazing fast and also reduced number of lines * added better debug printing * use jitted encode function for whisper, added timings and removed response delim to save speed on generating those tokens * fixed hf convert and now it's working with tinyllama * added tinyllama config * refactored code and made it work with all llama models * prettier order * prettier order * fixed suffix for tinyllama and refactored convert_from_hf * added missing parameters * fixed stream release and added missing params * jitted dp and encoder * jitted flow forward * removed re-init of espeak on each call to save up time * jitted generator forward for blazing fast tts * added contextmanager for displaying a chat log * removed whitespace for pylint * updated code to support latest fetch func * wait for llama eos token and pass params from cli to llama * listen for not fixed amount of time * refactored code a bit * removed thresholding and now the output streams directly to whisper * tokenize llama output for vits batch size to work and stream each sentence to a speaker * changed speaker * whisper is now printing on the same line * don't trigger llama on whisper output in parens * added tinyllama chat model * adjusted code to work with tinyllama chat model * removed unused cli arg * autofetch tokenizer and tinyllama model. add 3 chat tokens to the tokenizer * fixed issue with long sentences by chunking them * support for multiline llama output * prettified log output * adjusted sentence length * remove quote from response to avoid funny tts * fixed prompts * added missing parameter	2023-12-02 15:03:46 -08:00
chenyu	05a5357dd9	fix handcode_resnet50_opt.py (#2558 )	2023-12-01 20:51:21 -05:00
George Hotz	2c363b5f0b	new style device (#2530 ) * cpu tests pass * torch works * works * metal works * fix ops_disk * metal jit works * fix openpilot * llvm and clang work * fix webgpu * docs are rly broken * LRU works on metal * delete comment * revert name to ._buf. LRU only on Compiled * changes * allocator * allocator, getting closer * lru alloc * LRUAllocator * all pass * metal * cuda * test examples * linearizer * test fixes * fix custom + clean realize * fix hip * skip tests * fix tests * fix size=0 * fix MOCKHIP * fix thneed * copy better * simple * old style metal copy * fix thneed * np reshape * give cuda a device	2023-11-30 17:07:16 -08:00
Davi Silva	ddeec24fa8	Cleanup & fix llama.py (#2524 ) * docs, cleanup crap * comma AI * fix 70B * this is why lexical scope exists	2023-11-30 16:00:17 -05:00
George Hotz	d87a246439	move to new cached fetch (#2493 ) * move to new cached fetch * extra.utils is over * loads * bump download cache * bump timeout	2023-11-28 17:36:55 -08:00
chenyu	a739c6646e	fp16 in gpt2 attention (#2491 ) * fp16 in gpt2 attention * HALF	2023-11-28 19:27:03 -05:00
chenyu	7f9a4c1285	fp16 and noshow flags for gpt2 (#2470 )	2023-11-27 16:23:03 -05:00
George Hotz	9e07824542	move device to device.py (#2466 ) * move device to device.py * pylint test --disable R,C,W,E --enable E0611 * fix tests	2023-11-27 11:34:37 -08:00
Akshay Kashyap	a031afb2f6	Update display_name in resnet50 example (#2454 )	2023-11-26 16:07:36 -08:00
George Hotz	7170a9a057	coder.py can write and run code (#2439 ) * wip mistral * coder * touchups * cleanups * mistral cleanups * clean up cache create * download the weights, fix tests * fix llama loading * global fixup * clean up all * move llama model * cleanups * Revert "cleanups" This reverts commit `a71c5d59eb`. * fine, leave it	2023-11-25 12:27:54 -08:00
Davi Silva	df41a57e09	Fix: missing n_kv_heads for smaller models from huggingface (#2438 ) * fix: missing n_kv_heads for smaller models from huggingface * a lil golfing	2023-11-25 10:29:04 -08:00
George Hotz	96c12fdeab	multibatch gpt2 (#2432 ) * support multibatch gpt-2 * multi output * no default JIT in CI	2023-11-24 18:10:10 -08:00
Francis Lata	7169de57e2	Update VITS to use fetch helper (#2422 ) * use fetch helper on vits * remove duplicate weight loading	2023-11-24 08:50:03 -08:00
George Hotz	8f89e21fca	torch and numpy don't share ops anymore (#2412 ) * torch and numpy don't share ops anymore * that should be filtered out elsewhere * still const * graph + enet example cleanup * hmm, we do still need it because of symbolic	2023-11-23 16:58:10 -08:00
George Hotz	5bb720a777	Cocoa is no longer used	2023-11-23 14:31:21 -08:00
George Hotz	095e2ced61	add name support to fetch (#2407 ) * add name support * use fetch in gpt2 * remove requests from main lib, networkx also optional * umm, keep that assert * updates to fetch * i love the walrus so much * stop bundling mnist with tinygrad * err, https * download cache names * add DOWNLOAD_CACHE_VERSION * need env. * ugh, wrong path * replace get_child	2023-11-23 14:16:17 -08:00
Francis Lata	6d672785db	Update Whisper to use fetch helper (#2401 ) * update whisper to use new fetch helper * simplify file opening * update name * update key name to "downloads-cache"	2023-11-23 12:59:59 -08:00
George Hotz	2dec86970a	hotfix: default remains gen 1 llama	2023-11-21 14:43:02 -08:00
mmmkkaaayy	7f0cc4a4e8	whisper: support audio >30s (#2378 ) * whisper: support audio >30s * make prompt indexing consistent with reference repo * fix online	2023-11-21 14:37:51 -08:00
Oleg Rybalko	7220f5c9fc	fixed hf convert and now it's working with tinyllama (#2374 ) * fixed hf convert and now it's working with tinyllama * added tinyllama config * refactored code and made it work with all llama models * prettier order * prettier order * fixed suffix for tinyllama and refactored convert_from_hf * dynamically update help if MODEL_PARAMS changes and default size is the 1st	2023-11-21 14:36:52 -08:00
chenyu	e9847be790	remove whisper +1-1 hack (#2360 ) * remove whisper +1-1 hack * Revert "remove whisper +1-1 hack" This reverts commit `5db3800f09`. * update whisper tests * comment context	2023-11-19 17:56:36 -05:00
George Hotz	c8c5212dce	a lil more beautiful_mnist	2023-11-17 19:53:06 -08:00
George Hotz	c7b38b324b	A beautiful MNIST training example (#2272 ) * beautiful mnist * beautiful mnist example * from tinygrad import Tensor * more beautiful * the jit is super core tinygrad * globalcounters reset on jit run * symlinks and exclude * beautiful_cartpole * evaluate is it's own function * no symlinks * more beautiful * jit reset for double speed * type hinting for JIT * beautiful_mnist gets 98% * beautiful_mnist < 4s with BEAM=2 * better cartpole * use actor critic * zero_grad got lost * delete double relu * stable cartpole with PPO * beautiful_cartpole is more beautiful * REPLAY_BUFFER * beautiful stuff typechecks * None support in shape * hp tuning	2023-11-17 19:42:43 -08:00
Friedrich Carl Eichenroth	75676ab8e1	Profiling-helper (#2321 ) * change profiler * remove unused imports * remove unused imports * change lazybuffer references * remove unused line * remove unused import * remove unused stuff * add types * typing * typing * typing * trigger actions * -1 loc * fixup * trigger actions * revert lazy typing changes * WIP profiler helper * replace old start & stop profiler * fixup * linting * Update llama.py --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2023-11-16 14:15:56 -08:00
mmmkkaaayy	8235da11dd	whisper: support batch inference, add librispeech WER test (#2074 ) * whisper: support batch inference, add librispeech WER test, add kv caching and JIT * remove JIT_SUPPORTED_DEVICE --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2023-11-16 13:50:08 -08:00
George Hotz	3baaf298d6	two stage cumsum in tensor.py (#2331 ) * two stage cumsum in tensor.py * 2 more kernels for llama cumsum * gpt-2 and llama use fast multinomial	2023-11-16 12:09:53 -08:00
George Hotz	70a65c201e	JIT support in Interpreted (#2314 ) * factor that out * jit is supported everywhere * fix some tests * there's no jit supported device, the jit is everywhere * fix test uops	2023-11-15 11:13:38 -08:00
George Hotz	01f8781c26	fix CI (#2300 ) * might work * might work 2 * might work 3 * sneak that in to llama too * pin them all	2023-11-14 11:02:59 -08:00
George Hotz	0cbf6c1811	move things, clean up extra (#2292 ) * move things * idk why pylint needs that now * delete unused	2023-11-13 20:18:40 -08:00
chenyu	a72b370066	llama take int and convert to Variable internally (#2284 )	2023-11-12 17:11:37 -05:00
chenyu	5ef8d682e3	clean up attentions in stable diffusion (#2275 )	2023-11-11 14:25:36 -05:00
chenyu	453f48ce02	pad None means (0,0) (#2273 )	2023-11-11 09:50:26 -08:00
chenyu	880e693207	fix llama n_kv_heads in kvcache (#2267 ) * fix llama n_kv_heads in kvcache * trigger ci	2023-11-10 21:44:39 -05:00
chenyu	a753c8e071	examples of new GPT2 and JIT change (#2261 ) * var_vals are global * working with global ish * better * fix export model * fix tests * better kv cache * does it run? * use where for kvmask * fix excessive var_vals * fix import * how does multigpu use this? * llama kinda work * faster and simpler * cleanup * fix conversation mode * test cleanups * fix one more test * test cleanup --------- Co-authored-by: George Hotz <geohot@gmail.com>	2023-11-10 15:07:02 -05:00
wozeparrot	4c44d1344b	feat: remove cache_id (#2236 )	2023-11-08 08:09:21 -08:00
George Hotz	2f7aab3d13	move optimize_local_size (#2221 ) * move optimize_local_size * interpret_ast	2023-11-05 21:00:52 -08:00
Ahmed Harmouche	265304e7fd	Stable diffusion WebGPU port (#1370 ) * WIP: Stable diffusion WebGPU port * Load whole model: split safetensor to avoid Chrome allocation limit * Gitignore .DS_Store, remove debug print * Clip tokenizer in JS * WIP: Compile model in parts (text model, diffusor, get_x_prev_and_pred_x0, decoder), and recreate forward logic in JS * e2e stable diffusion flow * Create initial random latent tensor in JS * SD working e2e * Log if some weights were not loaded properly * Remove latent_tensor.npy used for debugging * Cleanup, remove useless logs * Improve UI * Add progress bar * Remove .npy files used for debugging * Add clip tokenizer as external dependency * Remove alphas_cumprod.js and load it from safetensors * Refactor * Simplify a lot * Dedup base when limiting elementwise merge (webgpu) * Add return type to safe_load_metadata * Do not allow run when webgpu is not supported * Add progress bar, refactor, fix special names * Add option to chose from local vs huggingface weights * lowercase tinygrad :) * fp16 model dl, decompression client side * Cache f16 model in browser, better progress * Cache miss recovery --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2023-11-03 18:29:16 -07:00
George Hotz	7103b716c4	merge kernel and optimizer (#2200 ) * merge kernel and optimizer * linearize is reentrant * move global/local size * clean up linearizer copy * remove unneeded lin copies * stop linearizing twice * oops, that should be None	2023-11-01 15:20:01 -07:00
George Hotz	b245f1307e	add exp2 (#2192 )	2023-10-31 17:48:42 -07:00
Akshay Kashyap	018bd29e37	Enable Multi-Output Export (#2179 ) * Enable Multi-Output Export * Add test * Update examples and lint * fix padding * test ops * dummy commit to rerun test * revert cuda lint * Enforce tuple/list of tensors * subscripted generics * put back webgpu test * Re-enable WebGPU Efficientnet test	2023-10-30 18:42:26 -07:00
chenyu	8548b20b23	fix codellama params and repeat_kv (#2181 )	2023-10-30 10:16:26 -07:00
George Hotz	e0201922e3	Q network for pruning BEAM / uops deduping / BEAM_ESTIMATE (#2142 ) * stable diffusion < 324ms * revert swap action * fix tests due to more sum splitting * REDUCEOP_SPLIT_THRESHOLD env var * added from unaligned np test (#2134) * align cpu buffer before copy into cl buffer (#2135) * remove shelve from handcode_resnet50_opt.py (#2139) * Add dictionary keys to reduce db size (#2131) * work * ignore beam cache * dictionary keys are generic * minor db cleanups * fix baseline and extract dataset * fix training * log likelihood * more lin to feats * sts * training policynet * net sort of works * dedup * refactor, stupid new actions * fix uops deduping * BEAM_ESTIMATE --------- Co-authored-by: chenyu <chenyu@fastmail.com> Co-authored-by: imaolo <56898718+imaolo@users.noreply.github.com>	2023-10-27 10:53:06 -10:00
will	bc0829b677	Fix llama json loading (#2160 )	2023-10-27 10:21:56 -10:00
nimlgen	8d41b3eb3f	beam=16 makes gpt2 gpu-time < 5ms on 3090 (#2154 )	2023-10-27 10:21:27 -10:00
wozeparrot	c29653605e	hip multigpu training (#1878 ) * feat: move to hip * feat: special path for RawBufferTransfer * feat: initial rawbuffertransfer * feat: hip ipc * feat: working hip ipc * feat: need to base device without args * feat: close mem handle * feat: modified test * feat: more multihip stuff * clean: cleanup * feat: cleaner * feat: don't crash * feat: test more * clean: way cleaner hip wrapper * feat: barrier * feat: barrier * feat: this breaks stuff * feat: we can use empty here * feat: maybe fix tests * feat: maybe fix tests again? * fix: probably fix tests * feat: no waiting here * feat: wait here * feat: much larger test * feat: need to sync here * feat: make this async * feat: no waiting! * feat: cut here * feat: sync copy * feat: random imports * feat: much cleaner world * feat: restore this * feat: restore this * clean: cleanup * feat: set this	2023-10-24 17:35:53 -04:00
nimlgen	e21bf776c8	fix debug=1 llama/gpt2 timings (#2143 )	2023-10-24 15:45:00 -04:00
chenyu	d5e2fdea22	remove shelve from handcode_resnet50_opt.py (#2139 )	2023-10-24 10:37:30 -04:00
George Hotz	6dc8eb5bfd	universal disk cache (#2130 ) * caching infra for tinygrad * nons tr key * fix linter * no shelve in beam search * beam search caching * check tensor cores with beam too * pretty print * LATEBEAM in stable diffusion	2023-10-22 10:56:57 -07:00

479 Commits (deepcrayon)