Commit Graph

1455 Commits (7a7046f2643dddf63c4161b274b609045f4ef64c)

Author SHA1 Message Date
George Hotz 8c8a5a77dd refactor llvm into runtime and ops 2023-02-08 16:28:32 -06:00
George Hotz 45ce4de6f3 improve typing 2023-02-08 12:48:21 -06:00
George Hotz 2e1bdc889a
write out all the functions, no auto binding (#543)
* write out all the functions, no auto binding

* cleanups, more types

* Slice is for internal calls only

* improve typing

* ugh, put slice back
2023-02-08 12:41:39 -06:00
George Hotz d854337f0d nn/optim.py compiles now 2023-02-08 11:25:18 -06:00
George Hotz 1029deccb1 refactor ops_cpu and ops_torch to not share code 2023-02-08 11:11:42 -06:00
George Hotz ee18420c13 dyn add of math ops 2023-02-08 10:04:30 -06:00
George Hotz 2844482a60
Mypy fun (#541)
* mypy fun

* things are just faster

* running fast

* mypy is fast

* compile.sh

* no gpu hack

* refactor ops_cpu and ops_torch to not subclass

* make weak buffer work

* tensor works

* fix test failing

* cpu/torch cleanups

* no or operator on dict in python 3.8

* that was junk

* fix warnings

* comment and touchup
2023-02-08 09:56:51 -06:00
George Hotz 996e0a10b7
update cpu and torch to hold buffers (#542)
* update cpu and torch to hold buffers

* save lines, and probably faster
2023-02-08 09:40:45 -06:00
Mitchell Goff ae4f0aeb5f
NumPy-like semantics for Tensor.__getitem__ (#506)
* Rewrote Tensor.__getitem__ to fix negative indices and add support for np.newaxis/None

* Fixed pad2d

* mypy doesn't know about mlops methods

* normal python behavior for out-of-bounds slicing

* type: ignore

* inlined idxfix

* added comment for __getitem__

* Better comments, better tests, and fixed bug in np.newaxis
2023-02-08 08:59:46 -06:00
George Hotz 0ac3286af0 factor out Device 2023-02-07 16:08:20 -06:00
George Hotz 2aeebd70a6 mypy will compile the shapetracker, no speed up 2023-02-07 15:43:44 -06:00
George Hotz 185d2e3678 fix map_buffer and add some __slots__ 2023-02-07 15:32:48 -06:00
George Hotz aebe75d9a2
remove val expansion (#539)
* remove val expansion

* types for all shapetracker functions:

* more typing

* add all the parens to the test

* more types

* fix tests

* very minor speedup
2023-02-07 15:14:05 -06:00
George Hotz 001cc96e25
Lazy refactor (#538)
* refactor lazy to return ASTs

* a lil cleaner

* oops, compare ids

* gate on GRAPH

* cleanups

* less calls to log_op

* simpler

* realize_buffers -> map_buffers

* even simpler

* think in asts

* a lil cleaner

* NOOP means contiguous
2023-02-07 11:53:21 -06:00
George Hotz 02d8cb0959 lazy cleanup 2023-02-07 07:39:53 -06:00
George Hotz d93563f39f fix KOPT 2023-02-07 06:56:33 -06:00
Jared Z 7604b17fbf
TestZeroViewShapeTracker fix test (#481)
* TestZeroViewST test

* updated to align with st naming conventions in file

* Update test_shapetracker.py
2023-02-07 06:17:55 -06:00
George Hotz c073271f20 more symbolic correctness 2023-02-07 00:03:14 -06:00
George Hotz e961fd3a04 more symbolic test, ModNode is wrong 2023-02-06 23:43:21 -06:00
George Hotz 8cfeb118d6 symbolic new test 2023-02-06 23:27:26 -06:00
George Hotz 7c5a5ecdac even simpler symbolic 2023-02-06 22:47:00 -06:00
George Hotz 8b05de1841 symbolic cleanups 2023-02-06 22:12:11 -06:00
George Hotz 2a924e2b77 fix sz.sh for llvm 2023-02-06 15:36:05 -06:00
James Roberts 0d405fd5bc
Parallelize CI tests (#535) 2023-02-06 15:27:44 -06:00
Andrey 4977d6f225
using tuples in isinstance (#534) 2023-02-06 14:40:26 -06:00
timmermansjoy d56c57b112
adding more robust install method (#532) 2023-02-06 13:12:05 -06:00
George Hotz fd3807c479 delete cherry and old cuda accel, promote llvm 2023-02-06 10:02:41 -06:00
George Hotz 90529d3750
tests are 20% faster (#529)
* pytorch CPU

* no cache, it's slower

* pytorch cpu for real

* remove double onnx
2023-02-06 09:56:14 -06:00
George Hotz 039de1b332 oops, pytest is for testing 2023-02-06 09:30:12 -06:00
George Hotz 6eb0e6a650 shuffle deps: always tqdm, make linting category 2023-02-06 09:27:01 -06:00
George Hotz 1d80639646 make linter test install testing deps 2023-02-06 09:21:48 -06:00
George Hotz 60bb64811c merge mypy into linters, no useless package update 2023-02-06 09:14:00 -06:00
George Hotz c3d81bba2a test_train: Adam -> SGD 2023-02-06 08:55:41 -06:00
George Hotz 36c26a57b1 make slow LLVM opt optional 2023-02-05 20:24:12 -06:00
George Hotz f7291f6ca3
fixes big KOPT, breaks opencl (#505)
* fixes big KOPT, breaks opencl

* fix optimizer

* KernelCache

* oops, broke batchnorm

* hack to fix it

* fix llvm, less hacky gpu

* disable the cache

* cache just breaks things
2023-02-05 10:46:17 -08:00
Martin Loretz 97f0a82be7
Cache pip packages in github actions (#522)
* Cache pip dependencies in github actions

* Add setup.py as cache-dependency-path

* Test caching

* Test caching

* Upgrade setup python action

* Test caching

* Remove setup.py from cache-dependency-path

* Don't remove cache-dependency-path

* Don't cache linter package's

* Test caching

* Test caching

* Test caching

* Upgrade actions/checkout to v3
2023-02-03 20:04:20 -08:00
Martin Loretz 4ad67b4bbc
Refactor triton buffer to use CLBuffer of cuda runtime (#524)
* Refactor triton buffer to use CLBuffer of runtime

* Fix opencl GT0
2023-02-03 20:02:41 -08:00
Jacky Lee ad4f6aa2cf
Add test for quick_gelu (#526)
* Add test for quick_gelu

* Bump PyTorch version for approximate
2023-02-03 20:01:39 -08:00
James Roberts db0a9b0a2d
Refactor CL.time_sum into GlobalCounters (#519) 2023-02-01 20:13:56 -08:00
Martin Loretz 45e847d284
Update triton to work in master (#517)
* Update triton to work in master

* Move mem_estimate out of runner
2023-02-01 12:58:14 -08:00
George Hotz 5e37f084db stable diffusion: clean up constant folding 2023-02-01 12:53:16 -08:00
George Hotz 175c38d1b3 triton: it already was GT0 2023-02-01 12:00:33 -08:00
Jacky Lee 486f023e81
Rename Normalize and move to nn (#513)
* Rename Normalize and move to nn

* Match PyTorch for dim>1
2023-02-01 11:55:03 -08:00
George Hotz cd97b036cc
A Triton backend for tinygrad (#470)
* triton can add

* print stuff from triton

* write out file

* ops triton working

* reduce ops

* sort of works

* Triton bugfixes & implementation of remaining ops (#490)

* padding

* support pow, max, relu, gt0

* allocate return buffer

* Fix reduce

* Add tests for power op

* Fix triton illegal memory accesses and memory leak (#512)

* Fix mypy issue

* Add triton to setup.py

* Replace torch with pycuda

* Use one cuda stream for data transfer and kernels

* Remove triton submodule

* Fix memory leak by using weakrefs for caching

* Fix memory access by adding valid as mask for load

* Fix invalid kernel launches by flattening the grid (#515)

---------

Co-authored-by: Martin Loretz <20306567+martinloretzzz@users.noreply.github.com>
2023-02-01 11:53:57 -08:00
George Hotz 4e24002bbe no generic exceptions 2023-02-01 11:14:37 -08:00
Jacky Lee 54c68defc7
Replace SIGN with GT0 (#511)
* Replace sign with gt0

* Replace sign with gt0

* GT0 works on GPU

* Fix brackets

---------

Co-authored-by: Tom Finet <tom.codeninja@gmail.com>
2023-02-01 11:01:39 -08:00
Jacky Lee 799b3f185a
Refactor getenv into helpers (#508)
* Refactor getenv into helpers

* Remove unused os

* Fix default value

* Fix more defaults for CI

* Fix bracket

* Revert changes to openpilot/compile.py

* Use getenv from helpers when possible
2023-01-31 15:09:09 -08:00
George Hotz d91b6711ea oops, broke BN 2023-01-31 08:18:48 -08:00
George Hotz 21f2af08d5 getenv + graphing 2023-01-30 19:15:03 -08:00
Jacky Lee 491e78d203
Add symbolic tests for correctness (#494)
* [WIP] Add symbolic tests for correctness

* Fix typo

* Fix expected value for test_and_fold

* Add more tests for symbolic

* It is indeed right

* Clean up

* Check all strings

* Put TODO back
2023-01-30 18:40:16 -08:00