
45 Commits (8f2e2f5ee2c9fe8dd6d7d3b64375484d415a5b0d)

Author SHA1 Message Date
Roelof van Dijk 8f2e2f5ee2
style: else-after-return (#1216)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-07-12 10:26:38 -07:00
George Hotz 67e34b356a
good stuff from tensor cores branch (#1199) 2023-07-08 16:58:26 -07:00
George Hotz 793a670187
from tensor cores + lb touchup (#1127) 2023-07-04 15:45:20 -07:00
Anselm Coogan a22aad7d32
Use generators instead of lists in `any`s and `all`s (#1111)
* Use generators in any(..) instead of lists for better best-case

* Use generators in all(...) instead of lists

* enable R1729 in .pylintrc

* revert import sorting

---------

Co-authored-by: Anselm Coogan <anselm@scandit.com>
2023-07-03 16:06:06 -07:00
Roelof van Dijk 542b2d93a5
Perf/cache string ops (#1078)
* perf: remove extra function, include in cached getitem

* perf: only calculate hash once per node

---------

Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-06-29 13:23:11 -07:00
George Hotz d16c16ec28
new upcast works (#1066)
* new upcast works

* float4 try

* fix unaligned float4

* disallow unaligned access

* upcast dim

* maybe good now

* fix gpu half

* vstore_half4

* fix deep image bugs

* improve symbolic to fix issues

* fix symbolic

* cl test

* this maybe

* gcd of 1 is 1

* real fix for old python

* improve fuzzer
2023-06-27 19:34:53 -07:00
George Hotz c8d87eb8d4 strip whitespace 2023-06-27 10:11:43 -07:00
Roelof van Dijk c604ef4beb
symbolic.py: faster Node.sum, faster SumNode.div (#1014)
* refactor: replace isinstance with class check where possible

* refactor: faster partition

* fix; flake8

* feat: rework node.sum, correct list typing

* fix: typo

* feat: refactor sum

* fix: pylint

* refactor: simpler sum and factorize

* feat; clean up sumnode div, all cpu tests pass

* feat: simplify floordiv, cache factorization

* don't factor numnodes at all

* python 3.8 functools does not yet have @cache

* fix: restore assert

* refactor, fix failing tests

* fix: address review comments

* feat: rework, add specialization, remove cache

* fix: remove specialization

* feat: no tuple conversion, faster loop

---------

Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-06-26 09:47:17 -07:00
George Hotz ba4eadb04c
PTX assembly support (#977)
* ptx assembly

* all ops tests pass

* fix tests
2023-06-13 12:31:42 -07:00
George Hotz c62c64f0b7
remove GeNode (#965) 2023-06-09 21:48:56 -07:00
Rayan Hatout 8b2c2d6896
Optimizations in `symbolic.py` (#796)
* optimizations in symbolic.py

* fix infinite recursion when expanding sums

* add test case to make sure NumNodes are hoisted up in cases where MulNodes cancel each other out
2023-05-26 12:59:53 -07:00
George Hotz 8b7ecd63bb
Remove Zeroview (#748)
* no zeroview start

* closer

* stride mask

* st tests pass, delete ZeroView

* byebye zv

* close to working

* not contiguous with mask

* subtract, don't add

* mask on view

* ugh, that shouldn't have been in there

* shape merge

* bugfixes

* fuzzer + 4 fuzzer failures

* fuzzer for symbolic

* more fuzzing and nothing

* that fuzzer doesn't hit either

* fixes padding...ugh

* no more offsets

* working

* rewrite load and store

* all checks

* fix idxs

* progress

* bugfix

* float4_axis

* works

* cleanups

* complex valids_okay
2023-04-17 08:21:46 -07:00
George Hotz 5495c7d64e
linearizer! (#714)
* linearizer outputs something

* working ish

* cstyle codegen

* clang mostly works

* fix load valid

* fix numberless loop

* fancy gen

* working

* fix enet compiler

* cleanups

* float4 upcasting

* less lines

* supports_float4

* constant folding

* mulacc

* internet tests flaky in CI

* 90% image support

* fix image generic

* bugs exposed with shapetracker and single view

* new llvm

* use vload, remove OLD

* that's really poorly done

* ending up being more lines
2023-03-19 23:43:49 -07:00
George Hotz f5467cfedc
Devicebufferless (#708)
* runs one metal kernel

* conv2d works

* ops tests are passing

* const folding

* all ops work

* pre commit always passes

* torch works

* working still

* fix graph test

* tests passing

* image almost works

* image conv works

* most images

* fix custom

* fix assignment

* fix compile enet

* clean up comments

* fix realize return value

* include shapetracker in LB repr

* copy should make a copy

* reenable method cache

* fix lna

* dtypes in graph

* forward only for IMAGE=2

* simple realize

* getting close

* fixup new api, it's good except the kernel count

* back to 197 kernels

* tests should pass

* go to a real float

* no type_on_cpu

* fix the docs

* put shapetracker back in its proper place
2023-03-18 14:40:23 -07:00
George Hotz c594a0a835 fix flip bug, add new unit tests 2023-03-12 23:55:31 -07:00
Cyril Roumégous 3f08613a2a
apply flake8 E203 rule (#684) 2023-03-11 11:35:16 -08:00
George Hotz f3ac52aee8
Mypyc (#680)
* building shapetracker

* default ENABLE_METHOD_CACHE

* symbolic compiles

* improve types

* tensor compiles

* oops, that's a bug

* best of both worlds

* find legit typing bugs

* pad2d can take list or tuple

* sub 200ms when compiled
2023-03-11 07:33:30 -08:00
George Hotz 22905dd657 speedups from llama branch 2023-03-10 22:01:32 -08:00
George Hotz fb5ee9260f add pad tests to shapetracker 2023-03-09 12:51:18 -08:00
George Hotz 382f346523
clean up opt (#649)
* clean up opt

* don't let global kernels get too small

* 8192 -> 1024

* disable local shape for clang

* fix can_merge

* unroll the 5x5 depthwise convs in op

* load float4 check
2023-03-05 20:49:36 -08:00
Cyril Roumégous c10131ddf5
reduce number of lines (#645) 2023-03-05 15:42:32 -08:00
George Hotz b5b4edf59b comments 2023-03-03 22:39:31 -08:00
George Hotz cfb050e2d1 simple modrange, thanks Jacky 2023-03-03 22:37:04 -08:00
George Hotz 7a1d96fd76
No negative (#632)
* behavior is correct without VALIDHACKS

* simple div and mod

* fix tests

* no negative variables

* alt form is correct

* still correct

* bug in mulnode

* at least validhacks works now

* cleanups

* test validhacks, and to_image_idx

* cache compare key

* tests and __neg__
2023-03-03 16:48:14 -08:00
George Hotz b9ce20c374 openpilot test wasn't running, factor out image idx 2023-03-03 07:41:53 -08:00
George Hotz 3915c89fb6
symbolic improvements (#629)
* fixups

* shorter diff

* wow, okay removing that had side effects

* more numeric tests

* MIN MAX tests
2023-03-02 19:50:38 -08:00
George Hotz 28f52f7c24 improve symbolic 2023-02-28 16:21:58 -08:00
George Hotz 1702a5779f remove hacks from can_merge 2023-02-28 15:30:20 -08:00
George Hotz e21df1701b distribute + refactor merge_views 2023-02-28 14:57:56 -08:00
George Hotz 8478a61cdb simplify in shapetracker 2023-02-28 00:35:26 -08:00
George Hotz f3386c7f09 improve symbolic, hlop conv output is simple now 2023-02-24 22:20:40 -08:00
George Hotz 446442dbb3 fix tests symbolic 2023-02-11 15:16:47 -08:00
George Hotz 7a7046f264 sum_combine_num 2023-02-11 14:48:31 -08:00
George Hotz 87a7717222 LLVM backend uses shapetracker 2023-02-10 13:53:33 -06:00
George Hotz c3cf17c6d0
Symbolic render (#550)
* render symbolic

* valid

* fix shapetracker tests

* render_python is the default

* expr is gone

* remove legacy behavior
2023-02-10 13:22:26 -06:00
George Hotz aebe75d9a2
remove val expansion (#539)
* remove val expansion

* types for all shapetracker functions:

* more typing

* add all the parens to the test

* more types

* fix tests

* very minor speedup
2023-02-07 15:14:05 -06:00
George Hotz c073271f20 more symbolic correctness 2023-02-07 00:03:14 -06:00
George Hotz e961fd3a04 more symbolic test, ModNode is wrong 2023-02-06 23:43:21 -06:00
George Hotz 8cfeb118d6 symbolic new test 2023-02-06 23:27:26 -06:00
George Hotz 7c5a5ecdac even simpler symbolic 2023-02-06 22:47:00 -06:00
George Hotz 8b05de1841 symbolic cleanups 2023-02-06 22:12:11 -06:00
Andrey 4977d6f225
using tuples in isinstance (#534) 2023-02-06 14:40:26 -06:00
George Hotz b1dec64815 new types and fixup ShapeTracker type mismatches 2023-01-25 19:39:36 -08:00
George Hotz 708215d06b
Typing (#468)
* we typing

* types look good in theory

* most tests pass

* gpu tests pass

* TEST_AST

* delete comments

* i must have written that bug so many times

* bugfix

* don't merge the small ones

* add f to constants

* commits from reduce

* don't GCD the mod nodes

* broken and a hack IMAGE=3

* group for reduce

* fix linter + mypy

* move out test ast

* insource TENSOR_TYPE_TO_NP_TYPE

* does this fix it?

* move imports out
2023-01-21 09:09:22 -08:00
George Hotz 0881d504c1
move shapetracker (#466)
* move shapetracker

* shapetracker test

* move ast

* move a few things

* fix print kernel

* fix test

* symbolic fixups
2023-01-19 09:56:31 -08:00