* Use generators in any(..) instead of lists for better best-case
* Use generators in all(...) instead of lists
* enable R1729 in .pylintrc
* revert import sorting
---------
Co-authored-by: Anselm Coogan <anselm@scandit.com>
* perf: remove extra function, include in cached getitem
* perf: only calculate hash once per node
---------
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
* new upcast works
* float4 try
* fix unaligned float4
* disallow unaligned access
* upcast dim
* maybe good now
* fix gpu half
* vstore_half4
* fix deep image bugs
* improve symbolic to fix issues
* fix symbolic
* cl test
* this maybe
* gcd of 1 is 1
* real fix for old python
* improve fuzzer
* optimizations in symbolic.py
* fix infinite recursion when expanding sums
* add test case to make sure NumNodes are hoisted up in cases where MulNodes cancel eachother out
* no zeroview start
* closer
* stride mask
* st tests pass, delete ZeroView
* byebye zv
* close to working
* not contiguous with mask
* subtract, don't add
* mask on view
* ugh, that shouldn't have been in there
* shape merge
* bugfixes
* fuzzer + 4 fuzzer failures
* fuzzer for symbolic
* more fuzzing and nothing
* that fuzzer doesn't hit either
* fixes padding...ugh
* no more offsets
* working
* rewrite load and store
* all checks
* fix idxs
* progress
* bugfix
* float4_axis
* works
* cleanups
* complex valids_okay
* linearizer outputs something
* working ish
* cstyle codegen
* clang mostly works
* fix load valid
* fix numberless loop
* fancy gen
* working
* fix enet compiler
* cleanups
* float4 upcasting
* less lines
* supports_float4
* constant folding
* mulacc
* internet tests flaky in CI
* 90% image support
* fix image generic
* bugs exposed with shapetracker and single view
* new llvm
* use vload, remove OLD
* that's really poorly done
* ending up being more lines
* runs one metal kernel
* conv2d works
* ops tests are passing
* const folding
* all ops work
* pre commit always passes
* torch works
* working still
* fix graph test
* tests passing
* image almost works
* image conv works
* most images
* fix custom
* fix assignment
* fix compile enet
* clean up comments
* fix realize return value
* include shapetracker in LB repr
* copy should make a copy
* reenable method cache
* fix lna
* dtypes in graph
* forward only for IMAGE=2
* simple realize
* getting close
* fixup new api, it's good except the kernel count
* back to 197 kernels
* tests should pass
* go to a real float
* no type_on_cpu
* fix the docs
* put shapetracker back in it's proper place
* building shapetracker
* default ENABLE_METHOD_CACHE
* symbolic compiles
* improve types
* tensor compiles
* oops, that's a bug
* best of both worlds
* find legit typing bugs
* pad2d can take list or tuple
* sub 200ms when compiled
* clean up opt
* don't let global kernels get too small
* 8192 -> 1024
* disable local shape for clang
* fix can_merge
* unroll the 5x5 depthwise convs in op
* load float4 check
* behavior is correct without VALIDHACKS
* simple div and mod
* fix tests
* no negative variables
* alt form is correct
* still correct
* bug in mulnode
* at least validhacks works now
* cleanups
* test validhacks, and to_image_idx
* cache compare key
* tests and __neg__
* remove val expansion
* types for all shapetracker functions:
* more typing
* add all the parens to the test
* more types
* fix tests
* very minor speedup
* we typing
* types look good in theory
* most tests pass
* gpu tests pass
* TEST_AST
* delete comments
* i must have written that bug so many times
* bugfix
* don't merge the small ones
* add f to constants
* commits from reduce
* don't GCD the mod nodes
* broken and a hack IMAGE=3
* group for reduce
* fix linter + mypy
* move out test ast
* insource TENSOR_TYPE_TO_NP_TYPE
* does this fix it?
* move imports out