Commit Graph

479 Commits (deepcrayon)

Author SHA1 Message Date
Jeff Moe 661dcc5ed0 Reformat, uh, everything, with black 2023-12-04 22:01:04 -07:00
qazal ab2d4d8d29
Fix cl import in the copy_speed test and cifar example (#2586)
* fix CL import

* update test to only run on GPU

* update hlb_cifar too
2023-12-03 09:22:07 -08:00
Oleg Rybalko 5e87083783
Whisper + LLAMA + VITS (#2332)
* feat: working voice 2 text using whisper

* feat: added llama generation

* feat: vits init

* feat: more accurate voice conversion

* feat: support for tts and working pipeline for the first pass

* fix: linter checks

* refactored vits initialization and inference, added mmts-tts support

* fixed process sync and now we can have an infinite conversation

* reuse output stream to remove overhead of creating a new one each time

* added pre-prompt configuration with yaml files

* adjusted code to merge PR which changed whisper

* optimized whisper, now it's blazing fast and also reduced number of lines

* added better debug printing

* use jitted encode function for whisper, added timings and removed response delim to save speed on generating those tokens

* fixed hf convert and now it's working with tinyllama

* added tinyllama config

* refactored code and made it work with all llama models

* prettier order

* prettier order

* fixed suffix for tinyllama and refactored convert_from_hf

* added missing parameters

* fixed stream release and added missing params

* jitted dp and encoder

* jitted flow forward

* removed re-init of espeak on each call to save up time

* jitted generator forward for blazing fast tts

* added contextmanager for displaying a chat log

* removed whitespace for pylint

* updated code to support latest fetch func

* wait for llama eos token and pass params from cli to llama

* listen for not fixed amount of time

* refactored code a bit

* removed thresholding and now the output streams directly to whisper

* tokenize llama output for vits batch size to work and stream each sentence to a speaker

* changed speaker

* whisper is now printing on the same line

* don't trigger llama on whisper output in parens

* added tinyllama chat model

* adjusted code to work with tinyllama chat model

* removed unused cli arg

* autofetch tokenizer and tinyllama model. add 3 chat tokens to the tokenizer

* fixed issue with long sentences by chunking them

* support for multiline llama output

* prettified log output

* adjusted sentence length

* remove quote from response to avoid funny tts

* fixed prompts

* added missing parameter
2023-12-02 15:03:46 -08:00
chenyu 05a5357dd9
fix handcode_resnet50_opt.py (#2558) 2023-12-01 20:51:21 -05:00
George Hotz 2c363b5f0b
new style device (#2530)
* cpu tests pass

* torch works

* works

* metal works

* fix ops_disk

* metal jit works

* fix openpilot

* llvm and clang work

* fix webgpu

* docs are rly broken

* LRU works on metal

* delete comment

* revert name to ._buf. LRU only on Compiled

* changes

* allocator

* allocator, getting closer

* lru alloc

* LRUAllocator

* all pass

* metal

* cuda

* test examples

* linearizer

* test fixes

* fix custom + clean realize

* fix hip

* skip tests

* fix tests

* fix size=0

* fix MOCKHIP

* fix thneed

* copy better

* simple

* old style metal copy

* fix thneed

* np reshape

* give cuda a device
2023-11-30 17:07:16 -08:00
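The "new style device" commit above introduces an `LRUAllocator`. A minimal sketch of the idea, assuming the usual LRU-allocator pattern (freed buffers are kept in per-size free lists and handed back on the next same-size allocation, avoiding round trips to the device allocator) — names like `_do_alloc` are illustrative, not tinygrad's actual API:

```python
# Hedged sketch of an LRU buffer allocator: freed buffers are cached
# by size and reused, so repeated alloc/free cycles of the same shape
# hit the cache instead of the underlying device.
from collections import defaultdict

class LRUAllocator:
    def __init__(self):
        self.free_cache = defaultdict(list)  # size -> reusable buffers
        self.allocs = 0                      # count of real allocations

    def _do_alloc(self, size):
        # stand-in for a real device allocation (e.g. a Metal/CUDA buffer)
        self.allocs += 1
        return bytearray(size)

    def alloc(self, size):
        # reuse the most recently freed buffer of this size if available
        if self.free_cache[size]:
            return self.free_cache[size].pop()
        return self._do_alloc(size)

    def free(self, buf):
        # don't return the buffer to the device; stash it for reuse
        self.free_cache[len(buf)].append(buf)
```

With this, a second allocation of the same size returns the cached buffer and no new device allocation happens.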
Davi Silva ddeec24fa8
Cleanup & fix llama.py (#2524)
* docs, cleanup crap

* comma AI

* fix 70B

* this is why lexical scope exists
2023-11-30 16:00:17 -05:00
George Hotz d87a246439
move to new cached fetch (#2493)
* move to new cached fetch

* extra.utils is over

* loads

* bump download cache

* bump timeout
2023-11-28 17:36:55 -08:00
chenyu a739c6646e
fp16 in gpt2 attention (#2491)
* fp16 in gpt2 attention

* HALF
2023-11-28 19:27:03 -05:00
chenyu 7f9a4c1285
fp16 and noshow flags for gpt2 (#2470) 2023-11-27 16:23:03 -05:00
George Hotz 9e07824542
move device to device.py (#2466)
* move device to device.py

* pylint test --disable R,C,W,E --enable E0611

* fix tests
2023-11-27 11:34:37 -08:00
Akshay Kashyap a031afb2f6
Update display_name in resnet50 example (#2454) 2023-11-26 16:07:36 -08:00
George Hotz 7170a9a057
coder.py can write and run code (#2439)
* wip mistral

* coder

* touchups

* cleanups

* mistral cleanups

* clean up cache create

* download the weights, fix tests

* fix llama loading

* global fixup

* clean up all

* move llama model

* cleanups

* Revert "cleanups"

This reverts commit a71c5d59eb.

* fine, leave it
2023-11-25 12:27:54 -08:00
Davi Silva df41a57e09
Fix: missing n_kv_heads for smaller models from huggingface (#2438)
* fix: missing n_kv_heads for smaller models from huggingface

* a lil golfing
2023-11-25 10:29:04 -08:00
George Hotz 96c12fdeab
multibatch gpt2 (#2432)
* support multibatch gpt-2

* multi output

* no default JIT in CI
2023-11-24 18:10:10 -08:00
Francis Lata 7169de57e2
Update VITS to use fetch helper (#2422)
* use fetch helper on vits

* remove duplicate weight loading
2023-11-24 08:50:03 -08:00
George Hotz 8f89e21fca
torch and numpy don't share ops anymore (#2412)
* torch and numpy don't share ops anymore

* that should be filtered out elsewhere

* still const

* graph + enet example cleanup

* hmm, we do still need it because of symbolic
2023-11-23 16:58:10 -08:00
George Hotz 5bb720a777 Cocoa is no longer used 2023-11-23 14:31:21 -08:00
George Hotz 095e2ced61
add name support to fetch (#2407)
* add name support

* use fetch in gpt2

* remove requests from main lib, networkx also optional

* umm, keep that assert

* updates to fetch

* i love the walrus so much

* stop bundling mnist with tinygrad

* err, https

* download cache names

* add DOWNLOAD_CACHE_VERSION

* need env.

* ugh, wrong path

* replace get_child
2023-11-23 14:16:17 -08:00
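The fetch commit above mentions named downloads, a download cache, and a `DOWNLOAD_CACHE_VERSION`. A hedged sketch of the cache-keying idea only (the path layout and helper name here are assumptions, not tinygrad's real `fetch`): files are keyed by an explicit name when given, else by a hash of the URL, and bumping the version constant invalidates the whole cache at once.

```python
import hashlib
import pathlib

DOWNLOAD_CACHE_VERSION = 1

def cache_path(url, name=None, cache_dir=pathlib.Path("downloads-cache")):
    # key by explicit name if provided, else by a hash of the URL
    key = name if name is not None else hashlib.sha256(url.encode()).hexdigest()
    # version directory lets one constant bump invalidate everything
    return cache_dir / f"v{DOWNLOAD_CACHE_VERSION}" / key
```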
Francis Lata 6d672785db
Update Whisper to use fetch helper (#2401)
* update whisper to use new fetch helper

* simplify file opening

* update name

* update key name to "downloads-cache"
2023-11-23 12:59:59 -08:00
George Hotz 2dec86970a hotfix: default remains gen 1 llama 2023-11-21 14:43:02 -08:00
mmmkkaaayy 7f0cc4a4e8
whisper: support audio >30s (#2378)
* whisper: support audio >30s

* make prompt indexing consistent with reference repo

* fix online
2023-11-21 14:37:51 -08:00
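Whisper's encoder works on fixed 30-second windows, so supporting longer audio (the commit above) means splitting the waveform into 30s chunks and transcribing them in sequence, with prompt indexing kept consistent across chunks. A minimal sketch of just the chunking step, assuming 16 kHz audio (constants are illustrative):

```python
# Hedged sketch: split a waveform into 30-second windows for
# chunk-by-chunk transcription of audio longer than one window.
SAMPLE_RATE = 16000     # Hz, whisper's expected rate
CHUNK_SECONDS = 30      # whisper's fixed window length

def chunk_audio(samples):
    chunk = SAMPLE_RATE * CHUNK_SECONDS
    return [samples[i:i + chunk] for i in range(0, len(samples), chunk)]
```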
Oleg Rybalko 7220f5c9fc
fixed hf convert and now it's working with tinyllama (#2374)
* fixed hf convert and now it's working with tinyllama

* added tinyllama config

* refactored code and made it work with all llama models

* prettier order

* prettier order

* fixed suffix for tinyllama and refactored convert_from_hf

* dynamically update help if MODEL_PARAMS changes and default size is the 1st
2023-11-21 14:36:52 -08:00
chenyu e9847be790
remove whisper +1-1 hack (#2360)
* remove whisper +1-1 hack

* Revert "remove whisper +1-1 hack"

This reverts commit 5db3800f09.

* update whisper tests

* comment context
2023-11-19 17:56:36 -05:00
George Hotz c8c5212dce a lil more beautiful_mnist 2023-11-17 19:53:06 -08:00
George Hotz c7b38b324b
A beautiful MNIST training example (#2272)
* beautiful mnist

* beautiful mnist example

* from tinygrad import Tensor

* more beautiful

* the jit is super core tinygrad

* globalcounters reset on jit run

* symlinks and exclude

* beautiful_cartpole

* evaluate is it's own function

* no symlinks

* more beautiful

* jit reset for double speed

* type hinting for JIT

* beautiful_mnist gets 98%

* beautiful_mnist < 4s with BEAM=2

* better cartpole

* use actor critic

* zero_grad got lost

* delete double relu

* stable cartpole with PPO

* beautiful_cartpole is more beautiful

* REPLAY_BUFFER

* beautiful stuff typechecks

* None support in shape

* hp tuning
2023-11-17 19:42:43 -08:00
Friedrich Carl Eichenroth 75676ab8e1
Profiling-helper (#2321)
* change profiler

* remove unused imports

* remove unused imports

* change lazybuffer references

* remove unused line

* remove unused import

* remove unused stuff

* add types

* typing

* typing

* typing

* trigger actions

* -1 loc

* fixup

* trigger actions

* revert lazy typing changes

* WIP profiler helper

* replace old start & stop profiler

* fixup

* linting

* Update llama.py

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-11-16 14:15:56 -08:00
mmmkkaaayy 8235da11dd
whisper: support batch inference, add librispeech WER test (#2074)
* whisper: support batch inference, add librispeech WER test, add kv caching and JIT

* remove JIT_SUPPORTED_DEVICE

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-11-16 13:50:08 -08:00
George Hotz 3baaf298d6
two stage cumsum in tensor.py (#2331)
* two stage cumsum in tensor.py

* 2 more kernels for llama cumsum

* gpt-2 and llama use fast multinomial
2023-11-16 12:09:53 -08:00
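The two-stage cumsum above is the classic blocked scan: compute a local cumsum inside each block (parallelizable as one kernel), then add each block's running offset in a second pass. A pure-Python sketch of the idea — tinygrad expresses this with reduce kernels, so this is a stand-in, not the actual `tensor.py` code:

```python
# Hedged sketch of a two-stage (blocked) cumulative sum.
def two_stage_cumsum(xs, block=4):
    blocks = [xs[i:i + block] for i in range(0, len(xs), block)]
    # stage 1: local cumsum inside each block (independent per block)
    local = []
    for b in blocks:
        acc, out = 0, []
        for v in b:
            acc += v
            out.append(acc)
        local.append(out)
    # stage 2: add the cumulative total of all preceding blocks
    offset, result = 0, []
    for b in local:
        result.extend(v + offset for v in b)
        offset += b[-1]
    return result
```

The "fast multinomial" bullet follows from this: sampling from a categorical distribution reduces to a cumsum of probabilities plus a threshold search, so a faster cumsum speeds up gpt-2/llama token sampling.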
George Hotz 70a65c201e
JIT support in Interpreted (#2314)
* factor that out

* jit is supported everywhere

* fix some tests

* there's no jit supported device, the jit is everywhere

* fix test uops
2023-11-15 11:13:38 -08:00
George Hotz 01f8781c26
fix CI (#2300)
* might work

* might work 2

* might work 3

* sneak that in to llama too

* pin them all
2023-11-14 11:02:59 -08:00
George Hotz 0cbf6c1811
move things, clean up extra (#2292)
* move things

* idk why pylint needs that now

* delete unused
2023-11-13 20:18:40 -08:00
chenyu a72b370066
llama take int and convert to Variable internally (#2284) 2023-11-12 17:11:37 -05:00
chenyu 5ef8d682e3
clean up attentions in stable diffusion (#2275) 2023-11-11 14:25:36 -05:00
chenyu 453f48ce02
pad None means (0,0) (#2273) 2023-11-11 09:50:26 -08:00
chenyu 880e693207
fix llama n_kv_heads in kvcache (#2267)
* fix llama n_kv_heads in kvcache

* trigger ci
2023-11-10 21:44:39 -05:00
chenyu a753c8e071
examples of new GPT2 and JIT change (#2261)
* var_vals are global

* working with global ish

* better

* fix export model

* fix tests

* better kv cache

* does it run?

* use where for kvmask

* fix excessive var_vals

* fix import

* how does multigpu use this?

* llama kinda work

* faster and simpler

* cleanup

* fix conversation mode

* test cleanups

* fix one more test

* test cleanup

---------

Co-authored-by: George Hotz <geohot@gmail.com>
2023-11-10 15:07:02 -05:00
wozeparrot 4c44d1344b
feat: remove cache_id (#2236) 2023-11-08 08:09:21 -08:00
George Hotz 2f7aab3d13
move optimize_local_size (#2221)
* move optimize_local_size

* interpret_ast
2023-11-05 21:00:52 -08:00
Ahmed Harmouche 265304e7fd
Stable diffusion WebGPU port (#1370)
* WIP: Stable diffusion WebGPU port

* Load whole model: split safetensor to avoid Chrome allocation limit

* Gitignore .DS_Store, remove debug print

* Clip tokenizer in JS

* WIP: Compile model in parts (text model, diffusor, get_x_prev_and_pred_x0, decoder), and recreate forward logic in JS

* e2e stable diffusion flow

* Create initial random latent tensor in JS

* SD working e2e

* Log if some weights were not loaded properly

* Remove latent_tensor.npy used for debugging

* Cleanup, remove useless logs

* Improve UI

* Add progress bar

* Remove .npy files used for debugging

* Add clip tokenizer as external dependency

* Remove alphas_cumprod.js and load it from safetensors

* Refactor

* Simplify a lot

* Dedup base when limiting elementwise merge (webgpu)

* Add return type to safe_load_metadata

* Do not allow run when webgpu is not supported

* Add progress bar, refactor, fix special names

* Add option to chose from local vs huggingface weights

* lowercase tinygrad :)

* fp16 model dl, decompression client side

* Cache f16 model in browser, better progress

* Cache miss recovery

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-11-03 18:29:16 -07:00
George Hotz 7103b716c4
merge kernel and optimizer (#2200)
* merge kernel and optimizer

* linearize is reentrant

* move global/local size

* clean up linearizer copy

* remove unneeded lin copies

* stop linearizing twice

* oops, that should be None
2023-11-01 15:20:01 -07:00
George Hotz b245f1307e
add exp2 (#2192) 2023-10-31 17:48:42 -07:00
Akshay Kashyap 018bd29e37
Enable Multi-Output Export (#2179)
* Enable Multi-Output Export

* Add test

* Update examples and lint

* fix padding

* test ops

* dummy commit to rerun test

* revert cuda lint

* Enforce tuple/list of tensors

* subscripted generics

* put back webgpu test

* Re-enable WebGPU Efficientnet test
2023-10-30 18:42:26 -07:00
chenyu 8548b20b23
fix codellama params and repeat_kv (#2181) 2023-10-30 10:16:26 -07:00
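`repeat_kv` is the grouped-query-attention helper this commit fixes: when `n_kv_heads < n_heads`, each key/value head is repeated `n_heads // n_kv_heads` times so the shapes line up with the query heads. A toy sketch with lists standing in for tensors (the real version expands and reshapes a tensor axis):

```python
# Hedged sketch of repeat_kv for grouped-query attention.
def repeat_kv(kv_heads, n_rep):
    # kv_heads: one entry per key/value head; repeat each n_rep times
    if n_rep == 1:
        return kv_heads
    return [h for h in kv_heads for _ in range(n_rep)]
```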
George Hotz e0201922e3
Q network for pruning BEAM / uops deduping / BEAM_ESTIMATE (#2142)
* stable diffusion < 324ms

* revert swap action

* fix tests due to more sum splitting

* REDUCEOP_SPLIT_THRESHOLD env var

* added from unaligned np test (#2134)

* align cpu buffer before copy into cl buffer (#2135)

* remove shelve from handcode_resnet50_opt.py (#2139)

* Add dictionary keys to reduce db size (#2131)

* work

* ignore beam cache

* dictionary keys are generic

* minor db cleanups

* fix baseline and extract dataset

* fix training

* log likelihood

* more lin to feats

* sts

* training policynet

* net sort of works

* dedup

* refactor, stupid new actions

* fix uops deduping

* BEAM_ESTIMATE

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
Co-authored-by: imaolo <56898718+imaolo@users.noreply.github.com>
2023-10-27 10:53:06 -10:00
will bc0829b677
Fix llama json loading (#2160) 2023-10-27 10:21:56 -10:00
nimlgen 8d41b3eb3f
beam=16 makes gpt2 gpu-time < 5ms on 3090 (#2154) 2023-10-27 10:21:27 -10:00
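The BEAM flag in the commits above refers to a beam search over kernel optimization choices: keep the `beam` best candidate kernels at each step, expand each with the legal optimization actions, and keep whichever scores best. A generic sketch of that search loop, with the expand and cost functions as illustrative stand-ins (tinygrad's real search times candidate kernels on the device):

```python
# Hedged sketch of a BEAM-style search: expand the frontier, keep the
# `beam` lowest-cost candidates, track the best state seen so far.
def beam_search(start, expand, cost, beam=16, steps=4):
    frontier = [start]
    best = start
    for _ in range(steps):
        candidates = [c for s in frontier for c in expand(s)]
        if not candidates:
            break
        frontier = sorted(candidates, key=cost)[:beam]
        best = min(best, frontier[0], key=cost)
    return best
```

On a toy problem (reach 10 by steps of +1 or +3), the search finds the target: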
wozeparrot c29653605e
hip multigpu training (#1878)
* feat: move to hip

* feat: special path for RawBufferTransfer

* feat: initial rawbuffertransfer

* feat: hip ipc

* feat: working hip ipc

* feat: need to base device without args

* feat: close mem handle

* feat: modified test

* feat: more multihip stuff

* clean: cleanup

* feat: cleaner

* feat: don't crash

* feat: test more

* clean: way cleaner hip wrapper

* feat: barrier

* feat: barrier

* feat: this breaks stuff

* feat: we can use empty here

* feat: maybe fix tests

* feat: maybe fix tests again?

* fix: probably fix tests

* feat: no waiting here

* feat: wait here

* feat: much larger test

* feat: need to sync here

* feat: make this async

* feat: no waiting!

* feat: cut here

* feat: sync copy

* feat: random imports

* feat: much cleaner world

* feat: restore this

* feat: restore this

* clean: cleanup

* feat: set this
2023-10-24 17:35:53 -04:00
nimlgen e21bf776c8
fix debug=1 llama/gpt2 timings (#2143) 2023-10-24 15:45:00 -04:00
chenyu d5e2fdea22
remove shelve from handcode_resnet50_opt.py (#2139) 2023-10-24 10:37:30 -04:00
George Hotz 6dc8eb5bfd
universal disk cache (#2130)
* caching infra for tinygrad

* nons tr key

* fix linter

* no shelve in beam search

* beam search caching

* check tensor cores with beam too

* pretty print

* LATEBEAM in stable diffusion
2023-10-22 10:56:57 -07:00