Commit graph

144 commits

Author SHA1 Message Date
Jacky Lee ad4f6aa2cf
Add test for quick_gelu (#526)
* Add test for quick_gelu

* Bump PyTorch version for approximate
2023-02-03 20:01:39 -08:00
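For context, QuickGELU is the common sigmoid-based GELU approximation, `x * sigmoid(1.702 * x)`; a minimal sketch of the function the commit adds a test for (standalone illustration, not tinygrad's actual implementation):

```python
import math

def quick_gelu(x: float) -> float:
    # QuickGELU approximation of GELU: x * sigmoid(1.702 * x)
    return x * (1.0 / (1.0 + math.exp(-1.702 * x)))

# Near zero the output is ~0; for large positive x it approaches x,
# for large negative x it approaches 0.
```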
George Hotz cd97b036cc
A Triton backend for tinygrad (#470)
* triton can add

* print stuff from triton

* write out file

* ops triton working

* reduce ops

* sort of works

* Triton bugfixes & implementation of remaining ops (#490)

* padding

* support pow, max, relu, gt0

* allocate return buffer

* Fix reduce

* Add tests for power op

* Fix triton illegal memory accesses and memory leak (#512)

* Fix mypy issue

* Add triton to setup.py

* Replace torch with pycuda

* Use one cuda stream for data transfer and kernels

* Remove triton submodule

* Fix memory leak by using weakrefs for caching

* Fix memory access by adding valid as mask for load

* Fix invalid kernel launches by flattening the grid (#515)

---------

Co-authored-by: Martin Loretz <20306567+martinloretzzz@users.noreply.github.com>
2023-02-01 11:53:57 -08:00
Jacky Lee 799b3f185a
Refactor getenv into helpers (#508)
* Refactor getenv into helpers

* Remove unused os

* Fix default value

* Fix more defaults for CI

* Fix bracket

* Revert changes to openpilot/compile.py

* Use getenv from helpers when possible
2023-01-31 15:09:09 -08:00
George Hotz de2c419fd4 make_pair and first attempt at hlb_cifar10 2023-01-30 11:07:23 -08:00
George Hotz bd8a5c2ced
Simple CUDA Runtime (#480)
* factor out opencl runtime

* don't use CL outside the runtime

* cuda runtime adds

* final_dimension

* tests pass with CUDA backend

* more cuda

* cuda simpler

* retain old functionality

* linter and typing

* move globalcounters out of runtimes

* oops, GlobalCounters in cuda

* MAX_OUTPUT_SHAPE=3 is fine for CUDA
2023-01-27 16:26:24 -08:00
George Hotz 9245f4650a indexer changes for master 2023-01-18 18:02:02 -08:00
George Hotz 49c6e6d472
Latest attempt to add image (#462)
* add image

* load + store + boring stuff:

* image tests pass

* thneed print GFLOPS

* op conv test

* more debugging

* hack for multiview image

* shapetracker creates less views

* disable image tests

* working better

* ugh, lkey not key

* print in DEBUG, and allow views

* works

* simple padding conv2d

* use index for image

* that was bad code

* debug print

* fix types

* less lines

* save lines
2023-01-12 17:36:30 -08:00
George Hotz 4885fce56e
shapetracker from newgpu (#456)
* shapetracker from newgpu

* touchup ops

* test

* testst

* thneed deletes unused inputs

* test

* bugfix
2023-01-09 12:40:01 -08:00
George Hotz 2cc1d970c6 updates from the chonker branch 2022-11-07 21:12:08 -08:00
George Hotz db2da22a04 stop blowing up floats 2022-10-30 16:47:16 -07:00
George Hotz 8afc643bb1 fix bug in ops test, it was cheating somehow 2022-10-30 16:43:24 -07:00
George Hotz 2f602a92ff seperate STRIDED and EXPAND 2022-10-30 13:23:58 -07:00
George Hotz 52bfbc31be vectorization 2022-10-29 12:47:52 -07:00
George Hotz e473d35f90 llvm doesn't vectorize 2022-10-29 11:59:48 -07:00
George Hotz b65b70812a
Exec AST (#404)
* working exec ast

* exec_ast is staticmethod

* GenericExecAST

* fold that sometimes

* ExplicitExecAST

* exec_ast for GPU

* gpu working

* get_lazyop_shape

* now gpubuffer is ExplicitExecAST

* dedup

* add a type

* RESHAPE in opencl code

* fix linter

* that too for linter

* cleanups

* remove dead code

* GenericShape is less lines

* add ALLOWED_KERNEL_COUNT to tests

* fix mypy

* that's gotta be recursive

* fix opencl shape processing

* remove unneeded lambda
2022-10-28 08:27:03 -07:00
George Hotz 10921a60c4 more imports from llvm branch 2022-10-26 18:02:36 -07:00
Drew Hintz a4ad1d774a
enable tests in test_ops.py that are disabled but now work. (#396)
remove custom tolerances that don't appear to be needed.
2022-10-13 09:58:53 -07:00
George Hotz b7f748c15a
Fix GPU 2**31 virtual size limit (#392)
* in progress

* big conv test works

* that's unneeded

* fix opencl with reduce

* rewrite contiguous_view_constant_fold

* clean up mids in loop code

* subidx

* print cl kernel before run

* no reduce, no loop

* Revert "no reduce, no loop"

This reverts commit 92777e40e9.
2022-10-05 00:55:20 -04:00
George Hotz 7a61dc7ee9 test_sd_big_conv 2022-10-01 13:26:05 -04:00
George Hotz 271446e3eb
set requires_grad to None (#387)
* set requires_grad to None

* some things need gradients

* hmm, why was get_parameters filtering
2022-09-21 11:16:02 -04:00
George Hotz 29ae21bb0d import tests from CL metal texture fix 2022-09-19 20:01:47 -04:00
George Hotz 57e804a9bf add min support 2022-09-18 20:39:41 -04:00
George Hotz 3c3534736e fix matmul kernel and tests 2022-09-13 08:31:04 -07:00
Comma Device 62e9419206 fix test failure on MATMUL=1 backward pass 2022-09-13 11:18:52 -04:00
Comma Device 3b82afc6a0 simple on device failing test 2022-09-13 10:59:15 -04:00
George Hotz 4efde1ba0a test_matmul 2022-09-13 07:51:33 -07:00
George Hotz 790af99a48 fix slice one multi, and linear can be simpler with new broadcasting 2022-09-06 19:51:33 -07:00
YassineYousfi 5aad460c7a
broadcast from right to left (#375)
* broadcast from right to left

* add another broadcasted add test
2022-09-06 16:36:13 -07:00
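Right-to-left broadcasting is the NumPy convention: shapes are aligned at their trailing dimensions, the shorter shape is padded with 1s on the left, and each dimension pair must match or contain a 1. A minimal sketch of that rule (illustrative helper, not tinygrad's implementation):

```python
def broadcast_shape(a, b):
    # Pad the shorter shape with leading 1s, then compare dims pairwise
    # from the right: they must be equal, or one of them must be 1.
    a, b = tuple(a), tuple(b)
    a = (1,) * (len(b) - len(a)) + a  # negative repeat count yields ()
    b = (1,) * (len(a) - len(b)) + b
    out = []
    for x, y in zip(a, b):
        if x != y and 1 not in (x, y):
            raise ValueError(f"incompatible dims {x} and {y}")
        out.append(max(x, y))
    return tuple(out)
```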
George Hotz bcb867cdd6 better idea for numbers, do the division in python 2022-09-03 16:23:39 -07:00
George Hotz 033a3ecccf found tinygrad bug 2022-09-03 12:32:43 -07:00
George Hotz 5d45c6e516
Fold reduce (#362)
* folding reduce

* fold through movementops

* fixup shapes

* was too aggressive

* i knew we needed that

* don't recompute reduce

* working

* fix openpilot compile

* prunegraph openpilot

* types and reduce_shape

* refactor

* cleanups

* neater

* 1009

* 1004

* clean up reduce for 998
2022-07-19 09:24:02 -07:00
George Hotz f93e297804 fix bug caused by rounding 2022-07-17 12:49:58 -07:00
George Hotz bcf422dfdd
Device2 (#358)
* option for matmul

* fixups

* fast like a nascar

* running

* thneed runner

* no buffer id makes no backing buffer

* move constant folding to the top

* runs on mac

* folded biases

* was v slow

* maybe just that

* elu touchup

* speed and float32

Co-authored-by: Comma Device <device@comma.ai>
2022-07-16 07:26:19 -07:00
George Hotz 5e46561f7e no_grad = NOT backward 2022-07-10 20:54:57 -07:00
George Hotz b34ae7876f lol chr(10) not chr(13) 2022-07-10 20:03:11 -07:00
George Hotz 93c378dffc add test for slice_one 2022-07-03 12:14:20 -07:00
George Hotz dffde3de5a support both asymmetric and negative padding 2022-06-26 17:59:25 -07:00
George Hotz 49c954b389 comments 2022-06-26 17:20:25 -07:00
George Hotz 8c483fbdc9 maxpool lazy fix 2022-06-26 17:07:03 -07:00
George Hotz 6b652dafb2 touchups 2022-06-19 16:57:14 -07:00
George Hotz d5b3e18540
Accelerate with CL (#325)
* accelerated opencl

* it's running, it's just wrong

* bugfix

* model is correct in opencl

* lazy image convert

* add padding support to convolution

* that stuff was all upstreamed

* remove HEAD

* oops

* test_simple_conv2d_4 passes, add dilation support

* put logic in ops_opencl

* fix crash

* hmm, stride seems okay

* padding for batched inputs

* just an issue now with cout%4

* op model still passes

* fix startPackedInputChannel

* pre and post processing ops for graph

* don't break other llops

* shapetrackering

* reshapes are free

* lazy movement ops
2022-06-16 15:40:52 -07:00
George Hotz 2a14befb74 support padding 2022-06-15 14:46:44 -07:00
George Hotz fef6c82491 wow dilation support was simple 2022-06-15 11:38:23 -07:00
George Hotz 0b182029dd support dilated convolution in torch 2022-06-14 18:03:35 -07:00
George Hotz a690ba4588 add test for padding 2022-06-14 17:41:22 -07:00
George Hotz e057ca23bb add flip 2022-06-14 17:28:43 -07:00
George Hotz dcbca4fdf1
Expand Operator (#327)
* replace broadcasting with expand

* Tensor, not self

* remove broadcasting from mlops

* delete useless A operator

* expand, not repeat

* remove A op

* expand on gpu

* binary_op doesn't broadcast anymore

* expand is still total junk, but the tests should pass
2022-06-12 12:31:48 -07:00
George Hotz 33f18c61a1 test_broadcasted_add 2022-06-12 10:19:58 -07:00
George Hotz 85d17a2acd running resnet onnx 2022-06-11 13:17:15 -07:00
George Hotz db5a632e8c multicat + test onnx is generic onnx 2022-06-11 11:50:47 -07:00