Commit graph

1401 commits

Author SHA1 Message Date
George Hotz 8db345d846 functools.partialmethod -> lambda fixes Python 3.11 2023-01-25 18:08:38 -08:00
calledit a0af1045bf
Some new tests (#440)
* Make test run

* Added new tests: sub pow constant_sub

* Fix indentation

* Added one to many lines

* Fix indentation

* Update test_cl_tiler.py

* Delete test_cl_tiler.py
2023-01-25 15:40:19 -08:00
George Hotz aafc29484a cleanups 2023-01-25 12:37:10 -08:00
George Hotz 919e943867 decent search 2023-01-25 12:20:53 -08:00
George Hotz 7f3da91f8b kernel_search 2023-01-25 12:05:09 -08:00
George Hotz e37424424f first little attempt at search 2023-01-25 11:49:29 -08:00
George Hotz c15e9c3c7a comment where future perf should go 2023-01-25 11:13:57 -08:00
George Hotz ee1f6ab3ca flip output shape extra dimension indexing for speed 2023-01-25 11:00:37 -08:00
George Hotz 335a261a2e test for slow kernel 2023-01-25 10:25:22 -08:00
George Hotz 0d594ccc51 mps option in torch (note: it's broken) 2023-01-25 10:10:39 -08:00
George Hotz 66da3bc3c0 reset the benchmark timer 2023-01-25 09:20:34 -08:00
George Hotz f5be4043ac fix OSX CL kernel timing 2023-01-25 08:37:18 -08:00
George Hotz f6fc2a0d98 huh, this prevents an extra kernel 2023-01-25 07:53:35 -08:00
George Hotz 487685919b
Revert "Rename Normalize and move to nn (#415)" (#474)
This reverts commit d768acb6a9.
2023-01-25 07:50:04 -08:00
Jacky Lee d768acb6a9
Rename Normalize and move to nn (#415)
* Rename Normalize and move to nn

* Fix comparison to None error

* Add test for GroupNorm

* Rename test case

* Flip parameters to match PyTorch

* Increase error tolerance

* Fix elementwise_affine on channels

* Match arguments with PyTorch

* Initialize weight and bias only when affine is true

* Is this it?

* A bit cleaner

* Handle case where weight or bias is None
2023-01-25 07:47:59 -08:00
George Hotz baf64c14ac cleanups, simple padding in the processing op 2023-01-25 07:37:52 -08:00
George Hotz 3acf62d489 cleanups for IMAGE=2 conv 2023-01-25 07:18:34 -08:00
George Hotz 6d7658db12 delete opencl <celebration> 2023-01-24 14:18:35 -08:00
George Hotz e313c8af20 update openpilot tests from OPENCL to GPU 2023-01-24 14:05:20 -08:00
George Hotz 2e1d47b166 there's a bug in scc for empty string 2023-01-24 12:06:06 -08:00
George Hotz e9c293361b fix typo 2023-01-24 12:03:58 -08:00
Comma Device 9e2af0a972 too far with the OPTWG 2023-01-24 13:14:59 -06:00
Comma Device 3590848b93 a little more local workgroup options 2023-01-24 12:50:27 -06:00
Comma Device 4b74752c42 fix hotspots by improving the workgroup optimizer 2023-01-24 12:46:28 -06:00
George Hotz fd760a390a fix incremental time 2023-01-24 10:19:04 -08:00
George Hotz 7a369b856b nope, no default NATIVE_EXPLOG 2023-01-24 10:01:52 -08:00
George Hotz 78fedc13d1 native_explog is default 2023-01-24 08:09:43 -08:00
George Hotz 5d350d4883 the ast test is actually a test now 2023-01-24 07:53:24 -08:00
George Hotz 7a159b9b04 tinygrad got big...make it tiny again 2023-01-23 21:33:56 -08:00
George Hotz 6286ace4f1
does this work yet (#471) 2023-01-23 20:36:17 -08:00
George Hotz c22554f44a floats for nvidia 2023-01-23 16:36:10 -08:00
George Hotz 6fe9edf30f torch cuda is very fast 2023-01-23 16:24:46 -08:00
George Hotz a949de873b
reduce 2.0 (#469)
* reduce 2.0

* works

* hacks

* DEBUG=3 for shapes

* fix types

* 0s weren't being folded

* cleaner

* last_reduce is no longer needed

* comments and cleanup
2023-01-23 15:11:13 -08:00
George Hotz a6de94b444 test partial sum 2023-01-22 21:28:40 -08:00
George Hotz f1196984e6 harmless to intertwine the math and the stores 2023-01-21 09:31:56 -08:00
George Hotz 708215d06b
Typing (#468)
* we typing

* types look good in theory

* most tests pass

* gpu tests pass

* TEST_AST

* delete comments

* i must have written that bug so many times

* bugfix

* don't merge the small ones

* add f to constants

* commits from reduce

* don't GCD the mod nodes

* broken and a hack IMAGE=3

* group for reduce

* fix linter + mypy

* move out test ast

* insource TENSOR_TYPE_TO_NP_TYPE

* does this fix it?

* move imports out
2023-01-21 09:09:22 -08:00
George Hotz b29614592a first conv/second conv 2023-01-19 13:26:11 -08:00
George Hotz 3d697577b2 print_ast 2023-01-19 13:22:03 -08:00
George Hotz 325a440cb5 pass in op_estimate in opencl 2023-01-19 11:02:23 -08:00
George Hotz 844c645834 add flops for processing op 2023-01-19 10:58:44 -08:00
George Hotz 0881d504c1
move shapetracker (#466)
* move shapetracker

* shapetracker test

* move ast

* move a few things

* fix print kernel

* fix test

* symbolic fixups
2023-01-19 09:56:31 -08:00
George Hotz 2b47ee401f
Symbolic for indexes (#464)
* indexer

* works

* all use indexer

* boolean in the indexer too

* symbolic is a better name than indexer

* better symbolic API

* min and max

* symbolic tests

* work

* more tests

* fix demodder

* __str__ in the superclass

* NumNode

* awesome that works

* still works

* fix up parens

* fix zeroviews

* dead lines

* expr_node

* works

* still works

* refactor to not use __new__ methods

* ugh something went wrong a while ago

* this fixes it

* mod and div at the end

* test

* symbolic

* working

* one linter issue fixed

* other division

* more simplifys

* works

* validhacks

* VALIDHACKS passes thneed

* no str replace stuff

* inline indexes

* NATIVE_EXPLOG and factoring

* factor both ways

* cl indexing

* split on mod, not just full

* onnxlimit

* fix output shape

* op_estimate is a function of the program

* no ones in the index

* four_float4

* ALLOW_4FLOAT4

* test passes

* compute then store

* loads first

* bugfix

* better, but doesn't match

* select xb in smart way

* new test and bugfix

* no change to lazy

* Node fixes linter

* fix opencl with op_estimate

* fix mypy

* revert valid

* remove unused
2023-01-19 07:21:30 -08:00
George Hotz 3a3400e3a2 more from indexer 2023-01-18 18:11:51 -08:00
George Hotz 9245f4650a indexer changes for master 2023-01-18 18:02:02 -08:00
George Hotz 15d04f13ce expr_idxs 2023-01-15 10:41:59 -08:00
George Hotz 70b771a175 idx idy 2023-01-15 09:39:22 -08:00
George Hotz 7ea89779fa add returns between views 2023-01-15 08:58:10 -08:00
George Hotz 287699c32c simplify ones after axis splitting 2023-01-14 10:51:43 -08:00
George Hotz 1b5def5b9d flip image x/y to match OPENCL 2023-01-12 17:45:37 -08:00
George Hotz 49c6e6d472
Latest attempt to add image (#462)
* add image

* load + store + boring stuff:

* image tests pass

* thneed print GFLOPS

* op conv test

* more debugging

* hack for multiview image

* shapetracker creates less views

* disable image tests

* working better

* ugh, lkey not key

* print in DEBUG, and allow views

* works

* simple padding conv2d

* use index for image

* that was bad code

* debug print

* fix types

* less lines

* save lines
2023-01-12 17:36:30 -08:00