* refactor/ci: delete many `# type: ignore`
* replace `axis.__class__ is int` with `isinstance(axis, int)` to make mypy happy
* add `--warn-unused-ignores` to mypy flags
refs #2240
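
A minimal sketch of why the `isinstance` form helps, assuming mypy's usual narrowing behaviour: mypy narrows a `Union` on an `isinstance(...)` check, whereas a `__class__ is int` comparison is generally not treated as a type guard. `normalize_axis` is a hypothetical helper for illustration, not a tinygrad function. Note that `isinstance` also accepts `bool` and other `int` subclasses, which the exact-class check rejected.

```python
from typing import Tuple, Union

def normalize_axis(axis: Union[int, Tuple[int, ...]]) -> Tuple[int, ...]:
    # Hypothetical helper: illustrates the pattern only.
    if isinstance(axis, int):
        # Inside this branch mypy treats `axis` as `int`,
        # so no `# type: ignore` is needed here.
        return (axis,)
    # Here mypy has narrowed `axis` to the tuple member of the Union.
    return tuple(axis)
```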
* ci: move `--warn-unused-ignores` flag to mypy config
refs #2240
* var_vals are global
* working with global ish
* better
* fix export model
* fix tests
* better kv cache
* does it run?
* use where for kvmask
* fix excessive var_vals
* fix import
* how does multigpu use this?
* llama kinda works
* faster and simpler
* cleanup
* fix conversation mode
* test cleanups
* fix one more test
* test cleanup
---------
Co-authored-by: George Hotz <geohot@gmail.com>
* Change linearizer to parse CAST
* Oneliner renders for cstyle and triton
* LLVM cast and ALU implementation
* pylint fixes
* cast in gep
* remove printbufs
* use cast for post-load ops
* get rid of parse_cast
* partially supported vectorized dtypes for initial dev
* render phi as the dtype
* Revert "partially supported vectorized dtypes for initial dev"
This reverts commit 1bf1a818a3.
* Revert "render phi as the dtype"
This reverts commit d08cb270b4.
* reenable triton tests
* no vstore_half if dtype is already half
* upcast max
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
* For CUDA, get current free space from device and retry alloc failures
* type ignore for mypy
* add init to get free mem in cuda
* Move retry logic into common lib.
Fix typo in override _get_cur_free_space
* linter error fix in test file
* Don't use a catch-all except, as it would also catch KeyboardInterrupt
* fix unintended line changes
* fix test ops
* decompose the err from test_ops
* skipTest skips the entire test, we don't want that
* handle cases with the same priority
* add int16 to torch map
* fuzz linearizer transformation
* no standard normal for fp16
* work
* Interpreted start
* CPU and TORCH work
* fix MemBuffer with same idx
* id for failed kernels
* no image and variable for Interpreted
* symbolic shape
* IMAGE only for GPU
* Interpreted almost all good
* cleanup
* fix bufs_from_lin
* zero size
* some failed examples
* just Exception
* just test not pass
* WIP: Stable diffusion WebGPU port
* Load whole model: split safetensor to avoid Chrome allocation limit
* Gitignore .DS_Store, remove debug print
* Clip tokenizer in JS
* WIP: Compile model in parts (text model, diffusor, get_x_prev_and_pred_x0, decoder), and recreate forward logic in JS
* e2e stable diffusion flow
* Create initial random latent tensor in JS
* SD working e2e
* Log if some weights were not loaded properly
* Remove latent_tensor.npy used for debugging
* Cleanup, remove useless logs
* Improve UI
* Add progress bar
* Remove .npy files used for debugging
* Add clip tokenizer as external dependency
* Remove alphas_cumprod.js and load it from safetensors
* Refactor
* Simplify a lot
* Dedup base when limiting elementwise merge (webgpu)
* Add return type to safe_load_metadata
* Do not allow run when webgpu is not supported
* Add progress bar, refactor, fix special names
* Add option to choose between local and huggingface weights
* lowercase tinygrad :)
* fp16 model dl, decompression client side
* Cache f16 model in browser, better progress
* Cache miss recovery
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>