1
0
Fork 0
Commit Graph

478 Commits (deepcrayon)

Author SHA1 Message Date
Jeff Moe 661dcc5ed0 Reformat, uh, everything, with black 2023-12-04 22:01:04 -07:00
chenyu 6ba6349c97
JIT=0 llama.py should not jit (#2609) 2023-12-04 20:21:07 -05:00
Yixiang Gao fde44aed76
update hip_matmul with new abstraction (#2605) 2023-12-04 13:37:10 -08:00
qazal 4380ccb169
Non fp32 math (#2264)
* `global_load` and `global_store` using buffer dtype

* `UOps.PHI` in all dtypes

* `UOps.ALU` in all dtypes

* `UOps.CONST` & `UOps.DEFINE_ACC` in all dtypes

* -- endof implementation --
+tiny lint changes

* these tests require the fp16 extention

you can run them locally to confirm they're green: (GPT2 test is broken in master for mac, see [this](https://discord.com/channels/1068976834382925865/1069001075828469790/1177993277958533261)

`GPU=1 python3 -m pytest test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_dequantizelinear_e4m3fn_float16_cpu test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_max_float16_cpu test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_min_float16_cpu test/models/test_real_world.py::TestRealWorld::test_llama test/models/test_real_world.py::TestRealWorld::test_gpt2 test/models/test_whisper.py test/test_specific_conv.py::TestSpecific::test_big_vec_mul`

skip the new test_linearizer_failures in CI GPU because of the fp16 extention

This passes on a real GPU since the extention is available:
`GPU=1 python3 -m pytest test/test_linearizer_failures.py::TestLinearizerFailures::test_failure_8`

see CI logs [here](https://github.com/tinygrad/tinygrad/actions/runs/6996590597/job/19032641427#step:14:644)

* these tests fail in CI due to segfaults and CPU crashes

To confirm they're green locally, you can run the following commands:

1. For the tests skipped in test_ops.py (note: CLANG is very slow)

`for var in GPU CUDA CLANG; do export $var=1; for test in test/test_ops.py::TestOps::test_slice_fancy_indexing_no_dim_collapse test/test_ops.py::TestOps::test_slice_fancy_indexing_dim_collapse_int test/test_ops.py::TestOps::test_slice_fancy_indexing_dim_inject_none test/test_ops.py::TestOps::test_slice_fancy_indexing_dim_inject_and_collapse; do python3 -m pytest $test; done; unset $var; done`

2. For the ONNX tests skipped in CLANG:

```
CLANG=1 python3 -m pytest test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_ai_onnx_ml_array_feature_extractor_cpu \
test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_gather_elements_0_cpu \
test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1_expanded_cpu \
test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_sce_mean_weight_ii_3d_cpu \
test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_gather_elements_1_cpu \
test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_sce_NCd1_mean_weight_negative_ii_cpu \
test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1_weight_expanded_cpu \
test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1d2d3_none_no_weight_negative_ii_expanded_cpu \
test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1_ii_expanded_cpu \
test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_sce_mean_weight_ii_4d_cpu \
test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_sce_mean_weight_ii_3d_log_prob_cpu \
test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_gather_elements_negative_indices_cpu \
test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_sce_NCd1d2d3d4d5_mean_weight_log_prob_cpu \
test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_sce_NCd1_mean_weight_negative_ii_log_prob_cpu \
test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1d2_no_weight_reduction_mean_ii_expanded_cpu \
test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_sce_NCd1d2d3d4d5_mean_weight_cpu \
test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1d2d3d4d5_mean_weight_expanded_cpu \
test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1_mean_weight_negative_ii_expanded_cpu \
test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_sce_mean_weight_ii_4d_log_prob_cpu \
test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1d2_with_weight_reduction_mean_expanded_cpu \
test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1_weight_ii_expanded_cpu \
test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1d2_with_weight_reduction_sum_ii_expanded_cpu \
test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1d2_with_weight_reduction_sum_expanded_cpu \
test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1d2_expanded_cpu \
test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1d2_reduction_sum_expanded_cpu \
test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1d2d3d4d5_none_no_weight_expanded_cpu \
test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1d2d3_sum_weight_high_ii_expanded_cpu \
test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1d2_reduction_mean_expanded_cpu \
test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1d2_with_weight_expanded_cpu
```

3. The LLVM test I skipped here is already [skipped in master for all backends](https://github.com/tinygrad/tinygrad/blob/master/test/external/external_test_onnx_backend.py#L186), I just made it more specific

`LLVM=1 python3 -m pytest test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_dequantizelinear_e4m3fn_float16_cpu`

* Revert "these tests fail in CI due to segfaults and CPU crashes"

This reverts commit 15db570143.

* merge with cleanup-vectorized-hip-renders

* barely working HIP P1, ALU ops need a refactor?

* manage the fact that in HIP [half2 is actually an unsigned int vec](f921880387/hip/include/hip/amd_detail/amd_hip_fp16.h (L59)) and half is a totally different __half that [has an unsigned int element in it](f921880387/hip/include/hip/amd_detail/amd_hip_fp16.h (L50)) but can't be accessed [because it's private](f921880387/hip/include/hip/amd_detail/amd_hip_fp16.h (L86)). If you just do this:

```
half2 val0 = // ...
half val1 = // ...
```
then you can't do:
```
val0.x + val1 // error: use of overloaded operator '+' is ambiguous (with operand types 'unsigned short' and 'half' (aka '__half'))
```

* update the sign definition to avoid division by zero in all dtypes

* diff cleanup p1: why were these in the diff anyways

* less hacky HIP, enable CIFAR fp16 benchmark, test ops for HIP in CI!

add ALU ops overloads for HIP

this will make HIP max work

handle mod

Revert "handle mod"

This reverts commit 370fd4b3fbe99b6ae8cc293d005b106628205933.

update max to use hmax

add HIP GEP render logic

enable CIFAR fp16 benchmark

test ops for HIP

back to store as float because this only works for float4 grouping right now

test_ops for hip!!

always sign

* back to the sign we had before because we cant do a backward pass on a Less node

* remove old hacks

HIP compiling test_ops in CI takes ~9 mins, not doing it for now

new HIP ALUs

* reduce accs done right

* refactor to function

* no device hacks

hacks p2

the other way

* LLVM ALU ops

half, float and double are all float

update max

* update test_uops, cmplt is always a bool in the real linearizer. assertAlmostEqual is wrong when ret is bool

* cleanup LLVM wrong code

* dummy change for the CUDA install glitch

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-12-03 13:45:49 -08:00
qazal 99ee2ec37a
Refactor code_for_op to accept a dtype (#2555)
* update cstyle renderers to take a dtype in code_for_op

* implement NEG for bools in LLVM

* update triton

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-12-01 22:05:28 -08:00
George Hotz 4c984bba7e
bump version to 0.8.0, clean CI, remove requests (#2545)
* bump version to 0.8.0, clean CI, remove requests

* why was that even there
2023-12-01 10:42:50 -08:00
nimlgen badc97f824
hip & cuda to gpuctypes (#2539)
* cuda with gpuctypes

* hip gpuctypes

* graphs

* rename + linter happy

* use cpu_time_execution

* no ji in build_kernel_node_params

* remove hip_wrapper

* hip fix

* no arc

* smalle changes

* no clean moduke in cudacpu
2023-12-01 09:25:27 -08:00
chenyu 7fec966b5e
bye bye NOOP (#2534)
* bye bye NOOP

* SIN

* NEG
2023-11-30 23:10:35 -08:00
Matthias Kronberg 5394a05b9d
Fix: Get item from ndarray before casting to int (#2525)
Directly casting is deprecated and will error in the future.
2023-11-30 18:34:31 -08:00
George Hotz 2c363b5f0b
new style device (#2530)
* cpu tests pass

* torch works

* works

* metal works

* fix ops_disk

* metal jit works

* fix openpilot

* llvm and clang work

* fix webgpu

* docs are rly broken

* LRU works on metal

* delete comment

* revert name to ._buf. LRU only on Compiled

* changes

* allocator

* allocator, getting closer

* lru alloc

* LRUAllocator

* all pass

* metal

* cuda

* test examples

* linearizer

* test fixes

* fix custom + clean realize

* fix hip

* skip tests

* fix tests

* fix size=0

* fix MOCKHIP

* fix thneed

* copy better

* simple

* old style metal copy

* fix thneed

* np reshape

* give cuda a device
2023-11-30 17:07:16 -08:00
Davi Silva ddeec24fa8
Cleanup & fix llama.py (#2524)
* docs, cleanup crap

* comma AI

* fix 70B

* this is why lexical scope exists
2023-11-30 16:00:17 -05:00
George Hotz 6707f2588e
use copyin (#2500)
* it's always copyin

* all RawBuffer are RawBufferCopyIn

* cleanups

* this fixes it

* requirements='C'

* more correct
2023-11-29 09:34:00 -08:00
George Hotz 5629fc368c
Use Buffer.STORE at the end of ASTs (#2494)
* work

* store broken

* interpreteds work

* this passes

* symbolic cpu

* fix tests

* fix opt tests

* images fail

* fix InterpretedFlopCounter

* stupid hack for images
2023-11-28 20:11:37 -08:00
Jake 5588922884
Update cuda_matmul.py (#2495) 2023-11-28 19:46:01 -08:00
George Hotz d87a246439
move to new cached fetch (#2493)
* move to new cached fetch

* extra.utils is over

* loads

* bump download cache

* bump timeout
2023-11-28 17:36:55 -08:00
George Hotz ab5d14d4ba
MEM -> LOAD (#2492)
* MEM -> LOAD

* keep legacy working
2023-11-28 16:46:37 -08:00
George Hotz 3f137b134a jax parallel matmul example 2023-11-28 13:48:11 -08:00
Davi Silva 186ac77ec3
Update hip_matmul.py (#2480) 2023-11-27 18:36:19 -08:00
George Hotz 9e07824542
move device to device.py (#2466)
* move device to device.py

* pylint test --disable R,C,W,E --enable E0611

* fix tests
2023-11-27 11:34:37 -08:00
George Hotz 7170a9a057
coder.py can write and run code (#2439)
* wip mistral

* coder

* touchups

* cleanups

* mistral cleanups

* clean up cache create

* download the weights, fix tests

* fix llama loading

* global fixup

* clean up all

* move llama model

* cleanups

* Revert "cleanups"

This reverts commit a71c5d59eb.

* fine, leave it
2023-11-25 12:27:54 -08:00
George Hotz 8ff2e13550
From teeny (#2426)
* changes from teenygrad work

* support not supporting ImageDType/PtrDType

* fixups from teeny
2023-11-24 12:50:56 -08:00
nimlgen e68aebfff9
bring hip graph back (#2385)
* bring hip graph back

* share with metal

* fix linter

* remove hasattrs

* Update ops_hip.py

* hip wrapper does not use _buf

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-11-24 07:53:44 -08:00
George Hotz 12023b6824
onnx ops cleanup (#2413)
* onnx ops cleanup

* revert those
2023-11-23 18:39:49 -08:00
George Hotz 095e2ced61
add name support to fetch (#2407)
* add name support

* use fetch in gpt2

* remove requests from main lib, networkx also optional

* umm, keep that assert

* updates to fetch

* i love the walrus so much

* stop bundling mnist with tinygrad

* err, https

* download cache names

* add DOWNLOAD_CACHE_VERSION

* need env.

* ugh, wrong path

* replace get_child
2023-11-23 14:16:17 -08:00
George Hotz 0505c5ea50
remove force_wait, refactor to graph (#2405)
* remove force_wait

* refactor

* get rid of stupid ASTRunner

* fix del in diskbuffer

* BufferOps.FROM_UNDERLYING

* put offset in the rawbuffer

* fix bugs

* use exec
2023-11-23 12:46:07 -08:00
George Hotz 4f8f0ac139
minor cleanups, remove dead files (#2398)
* minor cleanups, remove dead files

* s.name

* use disk

* pytest passes on mac
2023-11-23 09:01:50 -08:00
George Hotz 66c75f30c6
remove triton (#2396) 2023-11-23 07:40:59 -08:00
chenyu 8798d120bb
autopad shapetracker for BEAM (#2375)
* autopad shapetracker for BEAM

* OptOps.PADTO

* skip that test for now

* correct padding reduce axis

* just 32

* avoid more than double the FLOPs

* cleanups

* test case

* no support for triton and llvm yet

* typos

* symbolic shape would not work

* cannot PADTO with MAX kernel

* advance db version

* no breaking change - don't advance db version

* is triton just python?

* Revert "is triton just python?"

This reverts commit 17e776c25587615e33a3634c2fb0bb8591ce65d4.

* Revert "Revert "is triton just python?""

This reverts commit 6c434c01e1c4b0ea0431ec18632cd859fb3cf260.

* support llvm

* is it really passing in CI only?

* update tests

* oh triton test passed

* simpler

* revert that, with a test

* check if st are the same

* Revert "check if st are the same"

This reverts commit d2a5eac110a5da1af82a2728c883779ef69c3cad.

* update the db version

* rebase artifact
2023-11-22 21:05:25 -05:00
qazal 0eda545946
dtypes.float.vec(sz) (#2386)
* replace all _dtypen with dtype.vec(n)

fix: print works

* conceptul refactor of cstyle render_load logic

* linearizer GEP is explicit that its dtype is the scalar version of localtype

* vectorized global_store and load don't need a conditional
2023-11-22 17:43:14 -08:00
George Hotz cbb8486779
ResNet training changes (update benchmark) (#2390)
* default arg for chunk

* bring back to_

* good changes

* new set

* unused hash

* fix optim

* new torch loader

* fix test lr scheduler
2023-11-22 17:41:12 -08:00
wozeparrot abbcc7aefa
missed cleanup from cache_id removal (#2376) 2023-11-21 01:03:43 -05:00
George Hotz a0890f4e6c
move fetch to helpers (#2363)
* switch datasets to new fetch

* add test_helpers

* fix convnext and delete old torch load
2023-11-19 12:29:51 -08:00
chenyu d7d078c7f9
Node.vars() returns a set and properly dedup (#2356)
* dedup RedNode.vars()

* vars returns a set

* fix more vars

* unused import

* update to_movement_ops

* comment
2023-11-18 17:44:52 -05:00
George Hotz 40246d35bc
ops_shm removed (#2351)
* ops_shm removed

* buf.cast

* err, forgot those
2023-11-18 11:41:58 -08:00
George Hotz c7b38b324b
A beautiful MNIST training example (#2272)
* beautiful mnist

* beautiful mnist example

* from tinygrad import Tensor

* more beautiful

* the jit is super core tinygrad

* globalcounters reset on jit run

* symlinks and exclude

* beautiful_cartpole

* evaluate is it's own function

* no symlinks

* more beautiful

* jit reset for double speed

* type hinting for JIT

* beautiful_mnist gets 98%

* beautiful_mnist < 4s with BEAM=2

* better cartpole

* use actor critic

* zero_grad got lost

* delete double relu

* stable cartpole with PPO

* beautiful_cartpole is more beautiful

* REPLAY_BUFFER

* beautiful stuff typechecks

* None support in shape

* hp tuning
2023-11-17 19:42:43 -08:00
chenyu d2c0035c73
add back as_strided, move rebuilt mops to extra (#2344)
* add back as_strided, move rebuilt mops to extra

* negative stride for ops_cpu

* Revert "negative stride for ops_cpu"

This reverts commit a13b6815ac.

* skip that

* style
2023-11-17 14:34:30 -05:00
George Hotz 652d2de256
wow how did i think that was okay (#2339) 2023-11-16 21:21:11 -08:00
chenyu 822d6e6f18
Simpler mops verify (#2325)
* rewrite the to_movement_ops check using symbolic

* tweak
2023-11-15 21:47:18 -05:00
forcefieldsovereign b64738e1d6
Remove AS_STRIDED from shapetracker (#2216)
* very close

* remove comment

* negative strides working

* almost everything passes

* calculate offset with list comprehension

* some cleanup

* got disk load working

* review suggestions

* fix after merge

* overlap working

* did it

* clean

* fixed disk load

* lint

* mypy

* removed as_strided

* trying without simplify

* added back simplify

* make sure expanding to smaller shape

* cleanup

* removed comment

* removed env file

* trying whisper test again

* onnx test sqlite issue

* working on test

* finished test

* eliminate unnecessary shrink-then-pad

* don't shrink buffer

* added strides check

* added to ci under linters

* switch issue

* allow symbolic stride

* removed .env

* isinstance

* adjust strides for double expand

* cleanup

* needed to add type hint for mypy

* set pythonpath
2023-11-15 15:50:17 -05:00
geohotstan 3c5a51fb3a
aaaaaaa finally (#2310) 2023-11-15 07:12:38 -08:00
George Hotz 4f7b1ac0d2
cleanups before interpreted jit (#2306)
* jit mnist

* InterpretedFlopCounter doesn't rely on Interpreted

* allocator for cpu and torch

* types for exec_ast

* fix type issues

* fix onnx, remove print

* always self.from_underlying
2023-11-14 21:44:25 -08:00
nimlgen 4e0d47533e
beam works with var vals (#2296)
* beam works with var vals

* test passes now

* better comment

* linter happy
2023-11-14 13:03:19 -05:00
George Hotz 0cbf6c1811
move things, clean up extra (#2292)
* move things

* idk why pylint needs that now

* delete unused
2023-11-13 20:18:40 -08:00
George Hotz b1f7f29525
metal indirect command buffers (#2285)
* metal indirect command buffers

* sub 1ms gpt

* metal batch exec is good

* remove whitespace

* input_replace

* fix ci

* useResources

* very simple cacheallocator

* update_stats

* fix CI

* minor

* remove that from jit
2023-11-13 17:58:26 -08:00
rodfer 53c5baa8b6
add dilation to avg_pool2d (#2270)
* add dilation to avg_pool2d

* avg_pool_fix

* avg_pool_fix

* woo

* oops

* force it correct

---------

Co-authored-by: rodfer0x80 <rodfer0x80@proton.me>
Co-authored-by: zibokapi <zibokapi@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-11-13 08:47:56 -08:00
valar 123ea051e6
refactor/ci: delete many `# type: ignore` (#2281)
* refactor/ci: delete many `# type: ignore`

* replace `axis.__class__ is int` with `isinstance(axis, int)` to make mypy happy
* add `--warn-unused-ignores` to mypy flag

refs #2240

* ci: move `--warn-unused-ignores` flag to mypy config

refs #2240
2023-11-12 11:04:20 -08:00
geohotstan b853e9bb8c
Onnx 1.15.0 gogogo (#2217)
* lol

* lol

* add GELULULULUL

* onnx 1.50

* fuk torch bool neg

* exclude regex tests

* exclude dequantizelinear for now

* is sunny in philly

* damn it affinegrid

* fixed auto_pad VALID

* skip 0 shape tests

* add temporary cast in Reduces

* tests should pass now

* added comments and cleanup

* try moving dequantizelinear to onnx.py

* fixed dequantizedlinear?

* cleanup

* try?

* float16 segfaults LLVM CI..???

* cleanup comments

* pin to 1.50.0

* remove use of -np.inf cuz numpy is kill

* 1.50? lol I'm actually retarded

* thx for review, muhbad

* moved Gelu higher up
2023-11-10 15:36:48 -08:00
chenyu a753c8e071
examples of new GPT2 and JIT change (#2261)
* var_vals are global

* working with global ish

* better

* fix export model

* fix tests

* better kv cache

* does it run?

* use where for kvmask

* fix excessive var_vals

* fix import

* how does multigpu use this?

* llama kinda work

* faster and simpler

* cleanup

* fix conversation mode

* test cleanups

* fix one more test

* test cleanup

---------

Co-authored-by: George Hotz <geohot@gmail.com>
2023-11-10 15:07:02 -05:00
George Hotz 80bf0b8586
proper wmma (#2245)
* proper wmma

* hip cast

* bugfixes

* bugfix

* that bug is fixed

---------

Co-authored-by: George Hotz <george@tinygrad.org>
2023-11-09 15:15:18 -08:00
wozeparrot 4c44d1344b
feat: remove cache_id (#2236) 2023-11-08 08:09:21 -08:00