
478 Commits (deepcrayon)

Author SHA1 Message Date
Friedrich Carl Eichenroth 740304ef9d
Small Onnx Parser Improvements (#885)
* wip

* rename onnx_version to onnx_model_version

* add type

* add types

* small cleanup

* revert some changes from before

* add todo

* dumb fix
2023-06-01 00:01:01 -07:00
Marcello Fuschi 3924aae8ed
Fix ONNX dropout and unify the implementation (#857)
* Fix ONNX dropout and unify the implementation

* Use tensor rand method for dropout

* Change approach for RNG in ONNX Dropout

* Fix style

* Test legacy RNG seeding

* Remove the necessity for legacy RNG in Tensor class
2023-05-31 07:40:47 -07:00
skobsman 2e393f7ef2
InstanceNormalization ONNX test fixed. (#870) 2023-05-30 16:07:44 -07:00
Friedrich Carl Eichenroth f91f28d9e2
fix a bunch of tests (#856) 2023-05-29 17:48:26 -07:00
zk-tarts 174c65b7d9
add onnx Binarizer op (#850)
Co-authored-by: zk-tarts <>
2023-05-29 13:15:50 -07:00
M4tthewDE 4408c25e9a
Add Onnx op Shrink (#851)
* Add onnx Shrink operation

* Fix soft/hard shrink onnx test
2023-05-29 13:15:39 -07:00
Friedrich Carl Eichenroth 6f2b3755ca
set axis default to 0 (#854) 2023-05-29 13:15:28 -07:00
Friedrich Carl Eichenroth 3b158f7a5f
fix onnx versions greater or equal 10 (#853) 2023-05-29 13:04:06 -07:00
Diogo 1a5d72f812
Onnx ops And, Or, Xor, Not (#847)
* onnx and, or, xor, not

* added bool type to llvm and clang

* removed float conversion

* switched where op to use tensor func
2023-05-29 11:09:20 -07:00
SnakeOnex 844e6d0753
conv1d & conv3d onnx tests (#835)
* conv1d onnx

* [Work in progress] conv1d + enforcing full padding tuple length

* make ONNX padding reorder not hardcoded, works for 1D and 3D convs now

* conv2d interprets padding based on the input tensor dimensions
2023-05-29 10:16:45 -07:00
Marcello Fuschi 6d49925a26
Add max_pool2d dilation (#833) 2023-05-28 15:16:48 -07:00
cheeetoo 21d27d31a9
Fix a couple pad tests (#827)
* fix pad bug

* float type hint for value

* convert pads to list

* update Pad type signature

* Change | to Union since it's not supported in Python < 3.10
2023-05-28 12:06:46 -07:00
Mattis Megevand 606b841d3f
LR Schedulers (#755)
* lr schedulers + test

* lr scheduler test moved + integration test

* integration test for all lr scheduler

* lr scheduler test now deterministic

* changed optimizer + parameters for lr sched test
2023-05-27 07:47:49 -07:00
George Hotz 87fa5af70a ptx example 2023-05-26 19:28:51 -07:00
George Hotz 26014a0fa1
add convtranspose (#809)
* add convtranspose

* onnx convtranspose
2023-05-26 12:35:03 -07:00
wozeparrot 7351eb4b61
feat: put temporary file in the same directory as the destination file (#805) 2023-05-25 20:46:02 -07:00
Diogo c19ef0fcce
Add sin/cos/tan (#794)
* added sin/cos/tan

* fix lint

* added onnx ops support
2023-05-25 09:04:56 -07:00
George Hotz 0400315078 Revert "ops rdna"
This reverts commit 81a11d891d.
2023-05-21 13:02:18 -07:00
George Hotz 325a3bf2cf Revert "writing 2"
This reverts commit dddd6c42f0.
2023-05-21 13:02:17 -07:00
George Hotz dddd6c42f0 writing 2 2023-05-21 12:52:36 -07:00
George Hotz 81a11d891d ops rdna 2023-05-21 11:45:38 -07:00
George Hotz 90fff82c8a
Rdna (#776)
* assembler maybe

* custom asm

* rdna3 on quiet

* trigger crashes

* fixed notes

* non-fatal rdna2 crash

* Crash4

* improve rdna sniffer

* comments

* improve sniffer

* asm

* 131 TFLOPS RDNA3

* opt simple matmul

* todos
2023-05-16 05:33:57 -07:00
George Hotz 89b8b39d9c fix mypy 2023-05-13 21:25:36 -07:00
George Hotz e0b2035023 fast imagenet eval, gets 76.14% across the set 2023-05-13 21:18:31 -07:00
George Hotz 46d419060b start on mlperf models 2023-05-10 16:30:49 -07:00
George Hotz cb7c22beeb fix mypy 2023-05-06 19:18:54 +00:00
George Hotz 5190037cbc rocm: disassembler for shader 2023-05-06 19:07:52 +00:00
George Hotz 42256c0d9d rocm sniffer dumps code 2023-05-05 18:36:53 +00:00
George Hotz f2a964f447
nocopy (#764) 2023-05-05 09:32:06 -07:00
George Hotz 3a2011ab2d rocm sniffer 2023-05-04 22:22:39 +00:00
George Hotz a55c4f5000 better rocm build scripts 2023-05-04 09:14:05 +00:00
George Hotz 987b1aaf96 rocm build scripts 2023-05-04 08:45:23 +00:00
George Hotz ed33a89d52 no werror in archprobe 2023-05-03 19:34:17 +00:00
George Hotz 7ecf4dff68
multi cl_queue (#762)
* multi cl_queue

* only platforms 1

* gpus first, then cpus

* put device on underlying buffer

* cl_queue array
2023-05-03 12:15:28 -07:00
George Hotz 3b933b0a2f rocm setup script 2023-05-03 16:01:17 +00:00
George Hotz 59d0d168cd FLOAT16 off works 2023-04-19 15:34:56 -07:00
George Hotz 3d15769a8f 50 TFLOPS cuda matmul 2023-04-19 14:38:24 -07:00
George Hotz 0b5a0b9ba4 winograd comment 2023-04-16 03:36:51 -07:00
George Hotz 8b777af571 metal_conv gets over 10.4 TFLOPS... 2023-04-15 03:31:22 -07:00
George Hotz d66e682205 metal matmul from tcores branch 2023-04-14 23:29:29 -07:00
Sohaib 70b9072663
add Pad onnx operator and rework _padding (#740) 2023-04-06 17:07:36 +05:30
George Hotz 94e2c49c35 test_cacheline_size that works in both places 2023-03-30 06:47:20 +04:00
George Hotz b05c2828f7 better cacheline test 2023-03-30 06:08:54 +04:00
George Hotz 76db1af6fc better archprobe 2023-03-30 05:52:00 +04:00
George Hotz 20894991ed
good changes from the M1 Tensor Core project (#730)
* good changes

* working except llvm

* llvm types

* nice acc

* archprobe

* lang.float4

* use self.acc for late acc

* fix store bug
2023-03-29 05:11:02 +04:00
George Hotz 68e45fca18 metal_matmul: bw and torch sync 2023-03-23 08:02:04 -07:00
George Hotz bd6c3c31a9 compare to torch 2023-03-22 23:58:37 -07:00
George Hotz c3a3db75c7 fix metal matmul example 2023-03-22 23:42:51 -07:00
George Hotz b12b60af20
fix binop, other tests failure (#723)
* fix binop, other tests failure

* that was a bad idea

* better layernorm

* inference kernel count tests

* new style reshape pushing

* fixup replacement

* 199 kernels is okay. fix flops

* push reshape through unaryops only

* GRAPH=2 draws the phantom ops

* found resnet issue

* non working test

* mul is cheaper than div

* OPT inflation

* SHUFFLE_PAD_OPS in OPT=2
2023-03-22 18:15:07 -07:00
Fernando Vidal 73bd0b217b
add int64 as supported dtype from numpy (#699)
* add int64 as supported dtype from numpy

Without this, examples/transformer.py didn't run. With this change it runs successfully.

* Update helpers.py

* Update transformer.py

* Update training.py
2023-03-18 17:15:04 -07:00
George Hotz f5467cfedc
Devicebufferless (#708)
* runs one metal kernel

* conv2d works

* ops tests are passing

* const folding

* all ops work

* pre commit always passes

* torch works

* working still

* fix graph test

* tests passing

* image almost works

* image conv works

* most images

* fix custom

* fix assignment

* fix compile enet

* clean up comments

* fix realize return value

* include shapetracker in LB repr

* copy should make a copy

* reenable method cache

* fix lna

* dtypes in graph

* forward only for IMAGE=2

* simple realize

* getting close

* fixup new api, it's good except the kernel count

* back to 197 kernels

* tests should pass

* go to a real float

* no type_on_cpu

* fix the docs

* put shapetracker back in its proper place
2023-03-18 14:40:23 -07:00
Kirill 0532025b04
Fix llama 13B weights loading (#700)
* Fix llama 13B weights loading

* refactor more

* add test

* test storage offset

* fix spacing

* fix strides

* llama 13B working?

* yolo?

* better test for seeks
2023-03-15 08:59:52 -07:00
George Hotz 15e0b56e39
compile works (#688)
* compile works

* runtimes

* line count

* fix custom, to tg dtype

* meh, that's fine with lazy import
2023-03-12 11:01:25 -07:00
Kirill af7745073f
Add comments to SD (#686)
* Add explanation for empty lambdas

* Fix my_unpickle if pytorch_lightning is installed

* oops
2023-03-12 10:56:49 -07:00
George Hotz 6c3675c01c _mmap loads to gpu fast 2023-03-11 23:00:13 -08:00
George Hotz 803b0aef28 track memory for numpy/torch 2023-03-11 20:39:10 -08:00
Diogo 784afc6c6f
Eq magic function support (#683)
* add eq magic func

* changed from eq to __eq__

* ignore type for linter

* mypy doesn't like descriptions :(
2023-03-11 10:31:46 -08:00
George Hotz 01f39b19dc move to shapetracker.py 2023-03-11 07:50:07 -08:00
George Hotz f3ac52aee8
Mypyc (#680)
* building shapetracker

* default ENABLE_METHOD_CACHE

* symbolic compiles

* improve types

* tensor compiles

* oops, that's a bug

* best of both worlds

* find legit typing bugs

* pad2d can take list or tuple

* sub 200ms when compiled
2023-03-11 07:33:30 -08:00
George Hotz d7cb8e3e56 multithreaded fake_torch_load_zipped 2023-03-10 19:16:27 -08:00
George Hotz b1206bcb18
third try at torch loading (#677)
* third try at torch loading

* numpy fixed

* fix enet compile

* load_single_weight supports empty weights

* oops, CPU wasn't the default

* so many bugs
2023-03-10 19:11:29 -08:00
George Hotz 4780f9a6df llama runs (slowly) in master 2023-03-10 17:36:51 -08:00
George Hotz 1826ff6b89
dtypes nice and clean (#673)
* add dtype class

* dtypes

* buffers are lazy

* dtype is tracked by lazybuffer and GenericShape

* fix types in llvm

* llvm store

* dtype tests

* fix tests maybe

* fix flop counter

* fix CI

* CI fix and check format

* fix dtype and dtype check

* fix custom test

* fix test graph
2023-03-10 16:56:07 -08:00
George Hotz d26345595d more llama stuff 2023-03-10 10:48:10 -08:00
George Hotz 1a039306d2
good changes from llama branch (#671)
* good changes from llama

* transpose behavior changed
2023-03-09 20:51:22 -08:00
George Hotz d8dda2af3a openpilot fixups 2023-03-06 14:14:44 -08:00
George Hotz a77d792aff
Codegen gpu cleanups (#640)
* cleanups

* fixups

* handle pre upcasted global buffers

* early is just required

* delete junk from hand coded opt

* implicit upcast_in_mid_reduce

* speedup

* fix exec w validhacks

* reorder opt

* only need to check the output for that

* return total runtime from kernels if debugging
2023-03-04 15:31:51 -08:00
Patrick Geneva 117111825c
Fix windows file permission error (#634) 2023-03-04 09:23:55 -08:00
George Hotz 528cb3b3b9 fix ast test 2023-03-04 07:49:25 -08:00
George Hotz 893f136fe0 lines from helpers 2023-03-03 23:07:46 -08:00
George Hotz c53efb3635
optimize for CL (#633)
* required opt

* simplify

* works

* shift_to_last

* required is fine

* print shape in colored

* better shape

* args was wrong

* debugs

* fix empty shape

* colored shape printer
2023-03-03 22:00:09 -08:00
Diogo 52204a7b88
adding comparison operators (#616)
* Less, LessOrEqual, Greater, GreaterOrEqual, Equal

* lint fix

* using built in functions

* overriding __eq__ breaks things

* backwards pass for less - forward only tests

* one other spot

* removing backwards for comparison ops to match pytorch

* raise runtime error

* more tests for comparison ops

* fixed the lineup

* added number upcast tests
2023-03-02 08:10:44 -08:00
George Hotz d062cc82b8 put restrict back 2023-03-01 21:34:45 -08:00
George Hotz bfcec234a2
Refactor ASTs (#622)
* ugh worst branch name

* compiler refactor continues

* scc -> cloc

* buf -> _buf

* finish _buf, and program -> runtime

* gpu is still working, clang isn't

* clang in new style

* ops_metal

* something broke it

* improve metal

* clean up tons of cl crap

* hack fix sync

* cleaner gpu

* gpu metal clang

* cleanups

* minor refactor

* GPUCodegen

* fix up LLVM

* blind CUDA refactor

* codegen / runtime

* keep ops naming

* linter passes

* woah, llvm was allocing 4x what it needed to

* bugfixes

* fix openpilot compiler

* fix compile_efficientnet

* method cache should fix tests

* deal with duped functions
2023-03-01 18:57:29 -08:00
George Hotz 7e6edfbc64 unbreak onnx conv padding 2023-02-28 13:55:03 -08:00
George Hotz 7d556ca7e0 avg/max pool work in N-D 2023-02-28 13:38:27 -08:00
George Hotz d584bae5c0 fine, openpilot can have 197 kernels 2023-02-27 11:48:36 -08:00
George Hotz 7b999add1d all onnx model tests pass 2023-02-27 11:22:45 -08:00
George Hotz 652d48ccec onnx : openpilot expand issue was fixed yesterday. remove hack 2023-02-27 11:04:42 -08:00
George Hotz 9d6b63f043 add ConstantOfShape 2023-02-27 10:57:50 -08:00
George Hotz 082134952b CastLike works with one type hack 2023-02-27 10:51:26 -08:00
Jacky Lee 1ffe8d68d5
Add more onnx ops (#615)
* Add Celu

* Add thresholded relu

* Add softsign
2023-02-27 10:43:41 -08:00
George Hotz 643e8b0388 fix tests, test bn evaluate too 2023-02-27 10:39:47 -08:00
Diogo 07e643431c
added onnx group norm (#614) 2023-02-27 08:11:01 -08:00
Diogo e68fa18c9b
layer norm support in onnx (#607)
* layer norm support

* switched to 1e-05
2023-02-26 22:04:02 -08:00
George Hotz 3a2a500e90 prevent race condition, external yolo test for now 2023-02-26 17:08:24 -08:00
Sohaib 71ae6e5605
fix: avgpool without counting padding (#605) 2023-02-26 07:13:00 -08:00
George Hotz a8de233e12
only div, no reciprocal (#601)
* only div, no reciprocal

* remove reciprocal

* fix pad shuffling
2023-02-25 09:35:03 -08:00
Sohaib d581a99d90
onnx: lrn (#602)
Co-authored-by: Sohaib Errabii <errabii.sohaib@gmail.com>
2023-02-25 09:24:53 -08:00
voidz 94bec40110
moved extras/jit.py -> tinygrad/jit.py (#599)
* moved extras/jit.py to tinygrad/jit.py

* fixed indent

* removed tinygrad.helpers.DEBUG from jit.py
2023-02-25 08:32:33 -08:00
George Hotz 2c5e13a513
Reluless (#600)
* replace relu for maximum

* fix for other backend

* clean up RELU and GT0

* tests for maximum

* had to clean that up

* why reverse a maximum?
2023-02-25 01:21:16 -08:00
George Hotz 176ad29974 retain support for old onnx 2023-02-24 22:29:54 -08:00
George Hotz da5643d024 rest of tests should be made to pass 2023-02-24 12:52:23 -08:00
George Hotz 85452fbaf3 onnx 58/109/208 2023-02-24 12:19:05 -08:00
George Hotz e8a153e4e9 onnx : add a whole bunch of ops 2023-02-24 12:00:03 -08:00
George Hotz f2486a7248 more onnx ops 2023-02-24 10:55:58 -08:00
George Hotz 4d0a3dd653 openpilot expand is bugged 2023-02-24 10:25:59 -08:00
George Hotz 2e56a4793e rename log_softmax, support dim, fix onnx Softmax 2023-02-24 10:11:24 -08:00
George Hotz 5cdfeffe2c fix shape test 2023-02-24 09:36:32 -08:00
George Hotz 3becefa218 fix onnx tests 2023-02-24 09:27:18 -08:00
George Hotz e263c0c628 onnx : another model test is passing 2023-02-24 09:22:58 -08:00
George Hotz d3feea302d much cleaner way to write onnx ops 2023-02-24 08:46:28 -08:00
George Hotz f6d946853c more bugfixes 2023-02-24 00:21:29 -08:00
George Hotz b1b2d8f440 onnx : some op tests working 2023-02-23 23:58:13 -08:00
George Hotz b287b1d529 fix yolov8 to get to ConvTranspose 2023-02-23 22:46:48 -08:00
George Hotz 2d59b25ead onnx backend test : enable only the model tests 2023-02-23 22:36:26 -08:00
George Hotz d8b6f241f1 external_test_onnx_backend 2023-02-23 21:55:07 -08:00
Sohaib 8835df7a5c
upgrade onnx to 1.13.0 (#588)
- remove protobuf from direct dependencies
- replace deprecated mapping.TENSOR_TYPE_TO_NP_TYPE

Co-authored-by: Sohaib Errabii <sohaib.errabii@ipops.io>
2023-02-23 13:59:23 -08:00
calledit 81f7c6800a
Added info on simdgroup availability (#586)
* Add info on simdgroup availability

* "osx" not "os x"

* Update metal_matmul.py

* Update metal_matmul.py
2023-02-23 13:59:02 -08:00
George Hotz d22e19536b onnx: support low quality Resize. stuck on ConvTranspose will have to wait for convless 2023-02-23 09:05:23 -08:00
George Hotz ab3a2ae9a2 fix test_resnet in onnx now that maxpool works 2023-02-23 08:41:47 -08:00
George Hotz fd6082dcef support all _pool2d. conv will eventually be an hlop 2023-02-23 08:19:47 -08:00
George Hotz 76b4d0577d yolov8 works up to the MaxPool 2023-02-22 19:32:13 -08:00
George Hotz c4c2c28738
a sustainable approach to float4 (#582)
* a sustainable approach to float4

* can_float4

* fix tests

* fix float4

* delete dead code

* types and minor cleanup
2023-02-22 09:45:08 -08:00
George Hotz c5e2126d49 move DEBUG to helpers 2023-02-22 06:52:11 -08:00
George Hotz 4d232c7c95 optional networkx + DEBUGCL=2 2023-02-20 09:50:46 -08:00
George Hotz bbfec2fde7 8.46 TFLOPS 2023-02-19 13:21:25 -08:00
George Hotz 1ba847963d reshape and retain metal_matmul 2023-02-19 13:07:23 -08:00
Kirill 7944cfdadc
Remove Tensor.data (#565) 2023-02-18 16:36:12 -08:00
Jacky Lee 9fd41632c6
Import get_parameters from tinygrad.nn (#559)
* get_parameter is in optim

* Update all imports for get_parameters

* Clean up

* use optim.get_parameters
2023-02-17 15:22:26 -08:00
George Hotz 82c257e8f5 more kernel search 2023-02-12 10:34:56 -08:00
George Hotz de71c13934 test speed v torch uses jit 2023-02-12 07:43:17 -08:00
George Hotz ba3bf5bdf7 cifar stops learning 2023-02-11 17:21:42 -08:00
George Hotz 40f3949742 fancier KOPT 2023-02-11 16:40:25 -08:00
George Hotz 446442dbb3 fix tests symbolic 2023-02-11 15:16:47 -08:00
George Hotz 20a351a3c6 hand optim CONVW 2023-02-11 14:41:08 -08:00
George Hotz 031edd01e6 switch openpilot compile to TinyJit 2023-02-11 09:51:44 -08:00
George Hotz 608fd730d3 put the JIT in extra 2023-02-11 00:35:18 -06:00
George Hotz fed95119dc CL.mem_used -> GlobalCounters.mem_used 2023-02-10 23:13:29 -06:00
Kirill 27154db99a
Downloads weights in examples/stable_diffusion.py (#537)
* Downloads weights in examples/stable_diffusion.py

* use download_file_if_not_exists in fetch

* make consistent with previous NOCACHE behavior
2023-02-10 14:37:04 -06:00
George Hotz 5ed3622965 add dump to kernel_search 2023-02-10 12:13:30 -06:00
George Hotz d9555bc478 that turned out to be dumb 2023-02-08 16:52:29 -06:00
George Hotz 3d63934995
refactor to keep cl in the runtime (#545)
* refactor to keep cl in the runtime

* fix thneed, rename cl to _cl

* bugfix + _cuda

* fix tests

* thneed more correct
2023-02-08 16:46:09 -06:00
George Hotz 2844482a60
Mypy fun (#541)
* mypy fun

* things are just faster

* running fast

* mypy is fast

* compile.sh

* no gpu hack

* refactor ops_cpu and ops_torch to not subclass

* make weak buffer work

* tensor works

* fix test failing

* cpu/torch cleanups

* no or operator on dict in python 3.8

* that was junk

* fix warnings

* comment and touchup
2023-02-08 09:56:51 -06:00
George Hotz 185d2e3678 fix map_buffer and add some __slots__ 2023-02-07 15:32:48 -06:00
George Hotz d93563f39f fix KOPT 2023-02-07 06:56:33 -06:00
George Hotz f7291f6ca3
fixes big KOPT, breaks opencl (#505)
* fixes big KOPT, breaks opencl

* fix optimizer

* KernelCache

* oops, broke batchnorm

* hack to fix it

* fix llvm, less hacky gpu

* disable the cache

* cache just breaks things
2023-02-05 10:46:17 -08:00
George Hotz cd97b036cc
A Triton backend for tinygrad (#470)
* triton can add

* print stuff from triton

* write out file

* ops triton working

* reduce ops

* sort of works

* Triton bugfixes & implementation of remaining ops (#490)

* padding

* support pow, max, relu, gt0

* allocate return buffer

* Fix reduce

* Add tests for power op

* Fix triton illegal memory accesses and memory leak (#512)

* Fix mypy issue

* Add triton to setup.py

* Replace torch with pycuda

* Use one cuda stream for data transfer and kernels

* Remove triton submodule

* Fix memory leak by using weakrefs for caching

* Fix memory access by adding valid as mask for load

* Fix invalid kernel launches by flattening the grid (#515)

---------

Co-authored-by: Martin Loretz <20306567+martinloretzzz@users.noreply.github.com>
2023-02-01 11:53:57 -08:00
Jacky Lee 799b3f185a
Refactor getenv into helpers (#508)
* Refactor getenv into helpers

* Remove unused os

* Fix default value

* Fix more defaults for CI

* Fix bracket

* Revert changes to openpilot/compile.py

* Use getenv from helpers when possible
2023-01-31 15:09:09 -08:00
George Hotz 60ccddb58b reenable SWAP 2023-01-30 17:32:02 -08:00
George Hotz aea55eb196 found failing upcast 2023-01-30 16:12:56 -08:00
George Hotz b67f997864 tests pass w/o float4 2023-01-30 15:40:49 -08:00
George Hotz c6f570a2e6 improve progress bar 2023-01-30 14:50:28 -08:00
George Hotz 7118602c97 goat progress bar 2023-01-30 14:37:26 -08:00
George Hotz cccfea4b25 factor out KOPT code 2023-01-30 13:13:55 -08:00
George Hotz de2c419fd4 make_pair and first attempt at hlb_cifar10 2023-01-30 11:07:23 -08:00
AllentDan 7b6b1f32b1
[Fix] fix typo: test_mnist -> datasets (#492)
* test_mnist -> datasets

* fix mnist_gan
2023-01-29 21:30:47 -08:00
George Hotz 2db272c7f7
Kernel Optimizer (#489)
* kernel optimizer

* 10x faster, but wrong. not good deal

* move test -> extra

* print x speedup

* clcache

* fix clcache + DEBUG

* GFLOPS estimate

* i==3
2023-01-29 17:15:00 -08:00
George Hotz bb0cdc2442 111.51x speedup for reduce 2023-01-29 03:06:00 -08:00
George Hotz 45c0aa6e2d search with SHIFT, REDUCE 2023-01-29 02:42:20 -08:00
George Hotz 87879cf4b6 improve search more 2023-01-29 02:08:57 -08:00
George Hotz f6bbd43cb8 improve search 2023-01-29 01:33:47 -08:00
George Hotz ebdec2b72f fix optimizer 2023-01-29 00:23:06 -08:00
George Hotz a9cabce791 oops, broke mem estimates 2023-01-28 20:21:31 -08:00
George Hotz a500e79bd1 don't OPTWG on OS X, it's way slower 2023-01-28 20:02:33 -08:00
George Hotz b0df4d99a0 os x profiling: this ratio is exact i believe 2023-01-28 19:02:51 -08:00
George Hotz ae810eb558 minor cleanups 2023-01-28 08:59:15 -08:00
George Hotz 6d5e1a8029 GEMM kernel search 2023-01-27 10:08:57 -08:00
Comma Device f08e740957 factor out hand coded opt 2023-01-26 14:54:06 -06:00
George Hotz 5e8a36a18b real op kernel 2023-01-26 09:51:32 -08:00
George Hotz e0600f537a op kernel in kernel search 2023-01-26 09:47:01 -08:00
George Hotz aafc29484a cleanups 2023-01-25 12:37:10 -08:00
George Hotz 919e943867 decent search 2023-01-25 12:20:53 -08:00
George Hotz 7f3da91f8b kernel_search 2023-01-25 12:05:09 -08:00
George Hotz e37424424f first little attempt at search 2023-01-25 11:49:29 -08:00
Comma Device 9e2af0a972 too far with the OPTWG 2023-01-24 13:14:59 -06:00
Comma Device 3590848b93 a little more local workgroup options 2023-01-24 12:50:27 -06:00
Comma Device 4b74752c42 fix hotspots by improving the workgroup optimizer 2023-01-24 12:46:28 -06:00
George Hotz fd760a390a fix incremental time 2023-01-24 10:19:04 -08:00
George Hotz a949de873b
reduce 2.0 (#469)
* reduce 2.0

* works

* hacks

* DEBUG=3 for shapes

* fix types

* 0s weren't being folded

* cleaner

* last_reduce is no longer needed

* comments and cleanup
2023-01-23 15:11:13 -08:00
George Hotz f1196984e6 harmless to intertwine the math and the stores 2023-01-21 09:31:56 -08:00
George Hotz 708215d06b
Typing (#468)
* we typing

* types look good in theory

* most tests pass

* gpu tests pass

* TEST_AST

* delete comments

* i must have written that bug so many times

* bugfix

* don't merge the small ones

* add f to constants

* commits from reduce

* don't GCD the mod nodes

* broken and a hack IMAGE=3

* group for reduce

* fix linter + mypy

* move out test ast

* insource TENSOR_TYPE_TO_NP_TYPE

* does this fix it?

* move imports out
2023-01-21 09:09:22 -08:00
George Hotz 0881d504c1
move shapetracker (#466)
* move shapetracker

* shapetracker test

* move ast

* move a few things

* fix print kernel

* fix test

* symbolic fixups
2023-01-19 09:56:31 -08:00
George Hotz 9245f4650a indexer changes for master 2023-01-18 18:02:02 -08:00
George Hotz 49c6e6d472
Latest attempt to add image (#462)
* add image

* load + store + boring stuff:

* image tests pass

* thneed print GFLOPS

* op conv test

* more debugging

* hack for multiview image

* shapetracker creates less views

* disable image tests

* working better

* ugh, lkey not key

* print in DEBUG, and allow views

* works

* simple padding conv2d

* use index for image

* that was bad code

* debug print

* fix types

* less lines

* save lines
2023-01-12 17:36:30 -08:00
George Hotz 281b0db773 three from image 2023-01-12 12:26:58 -08:00
George Hotz 9ff6c532eb
Prereqs for IMAGE=1 (#461)
* contig

* move ast, debug prog

* add Token

* cleanup reduce

* exec_ast
2023-01-11 20:18:42 -08:00
George Hotz fff1f046b0
Simple version of the new GPU backend (#458)
* newgpu

* more to delete

* hmm, tests pass with constant folding

* fix lint/type

* fix constant folding

* comment and rerun tests

* lazy touchups

* fix graph_batchnorm test

* smaller transformer to fix OOM

* Revert "smaller transformer to fix OOM"

This reverts commit a44ef8edc2.

* no func cache

* introspect

* touchups

* CLASTKernel

* ugh, it was lru_cache

* codegen

* spacing

* old gpu still in opencl

* typing fix
2023-01-10 19:16:02 -08:00
George Hotz fad7cba590 move batchnorm to Tensor 2023-01-09 18:00:16 -08:00
George Hotz 4885fce56e
shapetracker from newgpu (#456)
* shapetracker from newgpu

* touchup ops

* test

* testst

* thneed deletes unused inputs

* test

* bugfix
2023-01-09 12:40:01 -08:00
George Hotz b8c94a67c9
Simple chonker (#431)
* chonker will make llvm fast

* work

* better speed tests, we will make them fast

* with the cache add is the same speed

* relu and neg are fast

* fix sum speed

* maximum maxnum?

* hack for gemm opt

* gemm very slow

* zeros like

* test_permute

* shapetracker returns self

* fix shapetracker factorization

* err, int strides

* permutes are faster now in tinygrad than pytorch

* support -1 in expand

* gemm unrolled

* improve final test case

* WIP GEMM

* why isn't GEMM fast?

* revert cache dim

* ffp contract works on clang, not llvm?

* ignore llvm ir

* this makes fma work at least, but no faster

* USE_4x4

* 63 GFLOPS

* 87 GFLOPS

* that wasn't matmul, 44 GFLOPS now

* 82 GFLOPS permuted

* this permute too

* a little speed for the convs

* 45 GFLOPS

* speed tests pass again

* clean up prints

* fix FMA WHAT A WASTE OF TIME

* colors

* moar fair

* GPU

* useless on chonker

* cleanups

* improve factorized shapetracker

* better threshold

* label conv

* work

* ops test pass again

* hot load the index

* run the last view, no need to create

* ZeroView needs a repr for the key to work

* fix segfault on out of bounds

* one more test

* start amx, and llvm.initialize_native_asmparser

* amx works

* nice AMX class

* nicer AMX class

* refactor get_idxs

* amx working

* is slower...

* useless flip

* cache

* SZ_X

* AMX_SZ_X/Y work alone

* Contiguous mlop

* test gemm packed

* PREPARE in packed

* use_amx factor

* prefetch isn't faster

* loop

* same 3ms

* 2.24 ms

* allow double on store in TG

* amx reduce is the same speed as non amx reduce

* include memory bandwidth

* clean up shapetracker

* flip returns stride

* prepare for upstream

* Update ops_llvm.py (#426)

* permutes are yellow and green now

* faster conv

* llvm cleanups

* Show optimised IR under debug 4 (#428)

* ASTKernel class

* Make tinygrad work with older python version (#427)

* Make tinygrad work with older python version

* Use partialmethod instead of partial

* simple chonker is chonking

* remove junk from test speed vs torch

* fix linker and types

* AMX is only here now

* add LLVM tests, it's a valid backend now

* oops, run llvm test

* contiguous_op

* fix loadops compare

* dedup reduceops

Co-authored-by: calledit <1573053+calledit@users.noreply.github.com>
2022-11-10 23:17:09 -08:00
George Hotz 2cc1d970c6 updates from the chonker branch 2022-11-07 21:12:08 -08:00
George Hotz d878065ece
Gemm (#416)
* gemm

* off by factor of 5

* 50 GFLOPS

* works

* 91 gflops

* working at 50G

* works

* iy

* 150 GFLOPS

* 150 GFLOPS

* N=2048 is still fast

* threading soon

* multithread

* pinning

* throttling is sad

* Align matrices to cacheline width (#361)

Co-authored-by: cloud <Cloud11665@gmail.com>
2022-11-06 10:07:28 -08:00
George Hotz 6a8fb53304
move ops.py into lazy.py (#402)
* move ops.py into lazy.py

* fix graph and linter

* ugh, didn't add
2022-10-25 13:58:03 -07:00
George Hotz 8e22d5ee67 replace networkx with defaultdict 2022-10-20 19:36:43 -07:00
George Hotz 63f9c55156 really dumb bug 2022-10-20 17:07:47 -07:00
George Hotz 1bec4651b3 fix nonstatic weights 2022-10-20 17:04:14 -07:00
George Hotz bb288e6938 safe_numpy and warning for broken matmul 2022-10-20 15:40:22 -07:00
George Hotz 50c95c7d9a add assert to catch issue in attention 2022-10-20 15:13:00 -07:00
George Hotz 26c78ccf7d remove useless buffer 2022-10-20 14:07:28 -07:00
George Hotz a18c1f3178 zero out the inputs 2022-10-20 13:46:52 -07:00
George Hotz ace8db29f8 ReduceSum 2022-10-20 12:48:14 -07:00
George Hotz c400ee0beb
refactoring thneed (#400)
* refactoring thneed

* continue

* minor update

* looks like it's working

* big refactor

* confirm thneed got the right output

* code is there but it's broken

* works now

* always OPTWG, input -> dat

* fix type issue
2022-10-20 12:35:59 -07:00
YassineYousfi ae0f9b17df
openpilot: new models and onnx ops (#401)
* ngrl stuff

* fngrl

* fix typo in compile script

* workflow dispatch

* new models in tests

* dont need to up this threshold

Co-authored-by: HaraldSchafer <harald.the.engineer@gmail.com>
2022-10-20 11:49:19 -07:00
George Hotz ff11c4316b move get_parameters to optim.py 2022-09-25 13:16:58 -04:00
Jacky Lee 2c01a66265
Reshape dataset from fetch_mnist (#390) 2022-09-24 21:16:29 -04:00
George Hotz 271446e3eb
set requires_grad to None (#387)
* set requires_grad to None

* some things need gradients

* hmm, why was get_parameters filtering
2022-09-21 11:16:02 -04:00
YassineYousfi 2f0f91ba3d
support float16 onnx weights (#384) 2022-09-15 09:12:18 -04:00
YassineYousfi 1a7bdc51f8
support more onnx ops (#376)
* broadcast from right to left

* add another broadcasted add test

* more onnx ops

* use float32 range in clip
2022-09-07 15:15:24 -07:00
George Hotz 0516359af8 fix stupid OPENCL=1 OOM 2022-09-06 14:29:23 -07:00
George Hotz 4dadd95e3c fix tests hopefully, more stable diffusion 2022-09-03 10:38:31 -07:00
George Hotz c01a8c5c2d stable diffusion start 2022-09-03 10:08:42 -07:00
George Hotz a3fc64a585 fix batchnorm folding in openpilot compile 2022-08-31 13:04:49 -07:00
George Hotz dc7af8c3ac thneed run float32 2022-08-28 11:03:35 -07:00
George Hotz b132de677d
tinygrad.nn (#367)
* tinygrad.nn

* flake8

* working on pylint

* more pylint

* more pylint

* pylint passes

* networkx

* mypy can't infer that type

* junk
2022-08-18 07:41:00 -07:00
George Hotz f76d41812b prune graph 2022-07-17 15:38:43 -07:00
George Hotz eda6f071b2 default opt level 2 2022-07-17 14:54:40 -07:00
George Hotz 73b0471b25 join expands 2022-07-17 13:42:05 -07:00
George Hotz d04b274cd2 noop removal can replace with reshape 2022-07-16 08:32:42 -07:00
George Hotz 2720ef49ca extra and test and tuple 2022-07-07 10:01:33 -07:00
George Hotz 81b73f97a3
Optimization (#355)
* constant folding into kernels

* that opt worth it?

* fix mypy

* ast one kernel

* save 2 lines in conv kernel

* debug print kernel count

* cl debugging

* early realize inputs

* refactor Device
2022-07-04 08:58:57 -07:00
George Hotz 7276f8d6bf improve constant folding, detach before moving tensor 2022-07-02 15:29:40 -07:00
George Hotz 8cf1aed0f4 don't track_running_stats, parameters must require_grad 2022-07-02 14:38:45 -07:00
George Hotz 49c954b389 comments 2022-06-26 17:20:25 -07:00
George Hotz 83d50e2687 move to extra.onnx 2022-06-21 19:43:44 -07:00
George Hotz 9b27ba650b load new torch files 2022-06-07 10:06:48 -07:00
George Hotz 233c71a7ba support requires_grad 2022-06-06 07:47:31 -07:00
George Hotz d8d19ed468 wikimedia wasn't returning 200 2022-01-15 19:09:29 -08:00
George Hotz e28cdfb0cf clean up resnet 2021-11-30 16:14:54 -05:00
George Hotz 58ed46963e fix broadcastdot 2021-11-29 18:54:57 -05:00
George Hotz dca076dbf1 remove dumb nn ops 2021-11-29 18:05:31 -05:00
George Hotz 30eb3afbe1 add bias term to transformer 2021-11-29 12:45:27 -05:00
George Hotz e2a8961a18 less lines, fix bug 2021-11-17 12:52:17 -08:00
George Hotz ba28761894 move yolo into examples/yolo 2021-10-30 19:46:00 -07:00
George Hotz 63f50cff45 move back again 2021-10-30 16:13:29 -07:00
Evan Mays 285621aeda
Cherry backprop for conv2d (#281)
* quick math: 0 + x = x.

* gradient w.r.t. x using cherry for conv

* gradient w.r.t. w for conv on cherry but doing vector dot products

* small optimization

* [cherry] optimize conv backpass for large channel count

* get rid of numpy einsum
2021-10-30 16:12:19 -07:00
George Hotz 3d646272d6 move back 2021-10-30 16:12:12 -07:00
George Hotz ac8afd24fa refactor accel 2021-10-30 16:10:59 -07:00
Guglielmo Camporese 2b7589db64
Added ResNet-{18, 34, 50, 101, 152} (#271)
* added resnets

* fix minor

* fix minor

* resnet in models

* added resnet test

* added resnet train test

* added linear, conv2d nn tests

* fix minor in extra/training

* resnet in models

* fix minor

* fix tolerance for linear in nn test

* fix eval, this causes cpu and gpu UT failing

* revert transformer test

* fix minor for CPU test

* improved model get_params for sequential layer

* fix minor for params counting

* commented broken ops tests

* improved train for resnet
2021-06-21 09:37:24 -07:00
George Hotz 89798d2f43 some flags 2021-06-19 11:46:31 -07:00
George Hotz d81eae8288 debug cherry crash 2021-06-19 11:41:20 -07:00
George Hotz d3f169b267 move good models to models, add a training step test 2021-06-19 11:24:15 -07:00
George Hotz b48d4bad2e clean up print spam 2021-06-19 10:31:04 -07:00
George Hotz 027535d0b5 microcoded matmul 2021-06-17 21:03:08 -07:00
George Hotz 026e2ae6a7 three registers and a zero command 2021-06-17 17:09:18 -07:00
George Hotz 2e71ae33f6 max op works 2021-06-17 17:01:21 -07:00
George Hotz 9e12c1bbba cherry binop 2021-06-17 16:50:40 -07:00
George Hotz fcdabea880 training mnist with cherry ops 2021-06-17 16:45:35 -07:00
George Hotz 2affd226b3 speed up sum 2021-06-17 16:38:34 -07:00
George Hotz e8eb7d1b7e max op 2021-06-17 16:20:56 -07:00
George Hotz c1d469d440 sum op 2021-06-17 16:19:35 -07:00
George Hotz b1000d866e readme, plus reduce ops 2021-06-16 11:21:06 -07:00
George Hotz ff3fdc58e5 risk -> cherry 2021-06-16 09:59:48 -07:00
George Hotz 2f91c012eb build note 2021-06-15 22:41:41 -07:00
George Hotz 4850d6eb43 update todo 2021-06-15 10:22:39 -07:00
George Hotz 4e1edb3692 have tinygrad log the loads 2021-06-14 18:35:14 -07:00
George Hotz 93f2e9769d little note 2021-06-14 15:49:41 -07:00
George Hotz a89d12d735 wow, way faster 2021-06-10 17:11:39 -07:00
George Hotz 10b1306525 binops 2021-06-10 16:52:37 -07:00
George Hotz 4535d39baa comments and pow 2021-06-10 09:03:40 -07:00