* handle reshape of contiguous subparts with explicit mask
* remove the add/remove ones logic in reshape
* accomodate ones in accumulate logic
* make multiply commutative
* fix linting
* make mypy happy
* add test for commutative mul
* merge dimensions in shape_strides for 1 range masks
* add offsets for merging
* fix linting
* add back explicit 1 reshapes
* fix mypy errors
* fix accumulate by includng state
* include non-zero stride dimension in acc
* small cleanup
* more compact to_shape_strides
* more logical cleanup
* compress more
* compress reshape mask
* adding some comments
* small bug fix
* improve test coverage
* remove explicit add remove ones
* small bug in test
* enable test_reshape_splitting_combining
* small fix
* 10 lines less to_shape_strides
* shorten reshape mask
* some more cleanup
* more cleanup
* introduce some symbols for compactness
* more symbols
* more cleaner
* lessen symbols, it became less readable
* remove merge_views from view.reshape
* change to_shape_strides to _merge_dims
* improve readability
* fix corner case
* cleanup
* better handling of 1 <= Variable('i',1,10) & new_dim = Variable('i',1,10)
* rewrite _reshape_mask for readability
* fix white space
* add comment
* nice shorthands for readability
* add proof in docs
* small nit
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* cpu tests pass
* torch works
* works
* metal works
* fix ops_disk
* metal jit works
* fix openpilot
* llvm and clang work
* fix webgpu
* docs are rly broken
* LRU works on metal
* delete comment
* revert name to ._buf. LRU only on Compiled
* changes
* allocator
* allocator, getting closer
* lru alloc
* LRUAllocator
* all pass
* metal
* cuda
* test examples
* linearizer
* test fixes
* fix custom + clean realize
* fix hip
* skip tests
* fix tests
* fix size=0
* fix MOCKHIP
* fix thneed
* copy better
* simple
* old style metal copy
* fix thneed
* np reshape
* give cuda a device