* mypy fun
* things are just faster
* running fast
* mypy is fast
* compile.sh
* no gpu hack
* refactor ops_cpu and ops_torch to not subclass
* make weak buffer work
* tensor works
* fix test failing
* cpu/torch cleanups
* no or operator on dict in python 3.8
* that was junk
* fix warnings
* comment and touchup
* Rewrote Tensor.__getitem__ to fix negative indices and add support for np.newaxis/None
* Fixed pad2d
* mypy doesn't know about mlops methods
* normal python behavior for out-of-bounds slicing
* type: ignore
* inlined idxfix
* added comment for __getitem__
* Better comments, better tests, and fixed bug in np.newaxis
* remove val expansion
* types for all shapetracker functions:
* more typing
* add all the parens to the test
* more types
* fix tests
* very minor speedup
* refactor lazy to return ASTs
* a lil cleaner
* oops, compare ids
* gate on GRAPH
* cleanups
* less calls to log_op
* simpler
* realize_buffers -> map_buffers
* even simpler
* think in asts
* a lil cleaner
* NOOP means contiguous
* fixes big KOPT, breaks opencl
* fix optimizer
* KernelCache
* oops, broke batchnorm
* hack to fix it
* fix llvm, less hacky gpu
* disable the cache
* cache just breaks things
* triton can add
* print stuff from triton
* write out file
* ops triton working
* reduce ops
* sort of works
* Triton bugfixes & implementation of remaining ops (#490)
* padding
* support pow, max, relu, gt0
* allocate return buffer
* Fix reduce
* Add tests for power op
* Fix triton illegal memory accesses and memory leak (#512)
* Fix mypy issue
* Add triton to setup.py
* Replace torch with pycuda
* Use one cuda stream for data transfer and kernels
* Remove triton submodule
* Fix memory leak by using weakrefs for caching
* Fix memory access by adding valid as mask for load
* Fix invalid kernel launches by flattening the grid (#515)
---------
Co-authored-by: Martin Loretz <20306567+martinloretzzz@users.noreply.github.com>
* Refactor getenv into helpers
* Remove unused os
* Fix default value
* Fix more defaults for CI
* Fix bracket
* Revert changes to openpilot/compile.py
* Use getenv from helpers when possible
* [WIP] Add symbolic tests for correctness
* Fix typo
* Fix expected value for test_and_fold
* Add more tests for symbolic
* It is indeed right
* Clean up
* Check all strings
* Put TODO back