1401 commits

Author SHA1 Message Date
George Hotz cfdf803b52 fix llvm vectorization by add analysis passes from the target machine 2022-10-30 15:28:36 -07:00
George Hotz 2f602a92ff seperate STRIDED and EXPAND 2022-10-30 13:23:58 -07:00
George Hotz 544cb0a069 oops, remove while(1) 2022-10-29 14:05:13 -07:00
George Hotz 4b6097f81d more amx notes 2022-10-29 14:04:10 -07:00
George Hotz fdb43fe553 gemm is 1.7 TFLOPS on a single M1 core 2022-10-29 13:42:33 -07:00
George Hotz 52bfbc31be vectorization 2022-10-29 12:47:52 -07:00
George Hotz e473d35f90 llvm doesn't vectorize 2022-10-29 11:59:48 -07:00
George Hotz 86eb06eb76 accurate flop estimation 2022-10-28 19:13:20 -07:00
George Hotz 7909786dbf one more opt test 2022-10-28 18:37:53 -07:00
George Hotz dd543fbc7a MovementOps is unused 2022-10-28 18:26:08 -07:00
George Hotz 71b336503f no RESHAPEs in the AST 2022-10-28 18:25:30 -07:00
George Hotz 294ab9e2f8 more test opt 2022-10-28 18:04:12 -07:00
George Hotz f885ceb695 test speed w/o bias 2022-10-28 11:22:15 -07:00
George Hotz 3735e26492 very minor 2022-10-28 09:39:30 -07:00
George Hotz c0050fab8f clean up movement_op in cpu and torch 2022-10-28 09:29:12 -07:00
George Hotz df31dde174 hasattr and DeviceBuffer type fixups 2022-10-28 09:05:45 -07:00
George Hotz e6b65f8e01 fix graph in openpilot/compile.py 2022-10-28 08:55:34 -07:00
George Hotz 1013540370 fix flake8 2022-10-28 08:52:53 -07:00
George Hotz 804b2dd001 move into graph.py 2022-10-28 08:50:11 -07:00
George Hotz 8517b69bfb lazy cleanups 2022-10-28 08:43:43 -07:00
George Hotz d02f8f9bc0 can we lose the lines with E701 still there? 2022-10-28 08:36:03 -07:00
George Hotz ef62db3186 cleanups, remove E701 2022-10-28 08:28:56 -07:00
George Hotz b65b70812a
Exec AST (#404)
* working exec ast

* exec_ast is staticmethod

* GenericExecAST

* fold that sometimes

* ExplicitExecAST

* exec_ast for GPU

* gpu working

* get_lazyop_shape

* now gpubuffer is ExplicitExecAST

* dedup

* add a type

* RESHAPE in opencl code

* fix linter

* that too for linter

* cleanups

* remove dead code

* GenericShape is less lines

* add ALLOWED_KERNEL_COUNT to tests

* fix mypy

* that's gotta be recursive

* fix opencl shape processing

* remove unneeded lambda
2022-10-28 08:27:03 -07:00
George Hotz 6a15fd3844
LLVM Backend take 2 (#403)
* take 2 llvm

* get_lazybuffers -> get_buffers

* llvm tests pass

* fix type issues and refactor LLVM
2022-10-26 20:32:31 -07:00
George Hotz 10921a60c4 more imports from llvm branch 2022-10-26 18:02:36 -07:00
George Hotz 463995e64f relu simpler backward pass 2022-10-26 17:57:32 -07:00
George Hotz 6a8fb53304
move ops.py into lazy.py (#402)
* move ops.py into lazy.py

* fix graph and linter

* ugh, didn't add
2022-10-25 13:58:03 -07:00
George Hotz 8e22d5ee67 replace networkx with defaultdict 2022-10-20 19:36:43 -07:00
George Hotz 3b9b7eda48 remove run_thneed dead code 2022-10-20 17:24:18 -07:00
George Hotz 63f9c55156 really dumb bug 2022-10-20 17:07:47 -07:00
George Hotz 1bec4651b3 fix nonstatic weights 2022-10-20 17:04:14 -07:00
George Hotz 59143bbb3b raise, don't assert 2022-10-20 16:32:34 -07:00
George Hotz 9f8c414589 might fix tests 2022-10-20 16:27:11 -07:00
George Hotz fd6ba8e7ac don't recopy backing 2022-10-20 16:06:11 -07:00
George Hotz 62affbd9ce add CONTIGUOUS loadop 2022-10-20 15:55:19 -07:00
George Hotz bb288e6938 safe_numpy and warning for broken matmul 2022-10-20 15:40:22 -07:00
George Hotz 50c95c7d9a add assert to catch issue in attention 2022-10-20 15:13:00 -07:00
George Hotz 26c78ccf7d remove useless buffer 2022-10-20 14:07:28 -07:00
George Hotz a18c1f3178 zero out the inputs 2022-10-20 13:46:52 -07:00
George Hotz 61ee428e4c rerun 2022-10-20 13:29:14 -07:00
George Hotz 5dae64b7b0 read input shapes and break down the layers 2022-10-20 13:11:24 -07:00
George Hotz e00601faea fix thneed self test 2022-10-20 12:55:02 -07:00
George Hotz ace8db29f8 ReduceSum 2022-10-20 12:48:14 -07:00
George Hotz c400ee0beb
refactoring thneed (#400)
* refactoring thneed

* continue

* minor update

* looks like it's working

* big refactor

* confirm thneed got the right output

* code is there but it's broken

* works now

* always OPTWG, input -> dat

* fix type issue
2022-10-20 12:35:59 -07:00
George Hotz 0514594083 fix openpilot test 2022-10-20 11:56:26 -07:00
YassineYousfi ae0f9b17df
openpilot: new models and onnx ops (#401)
* ngrl stuff

* fngrl

* fix typo in compile script

* workflow dispatch

* new models in tests

* dont need to up this threshold

Co-authored-by: HaraldSchafer <harald.the.engineer@gmail.com>
2022-10-20 11:49:19 -07:00
Drew Hintz a4ad1d774a
enable tests in test_ops.py that are disabled but now work. (#396)
remove custom tolerances that don't appear to be needed.
2022-10-13 09:58:53 -07:00
Drew Hintz 165fb4d631
remove redundant list comprehension from inside all. (#397)
remove explicit inherit from object.
2022-10-13 09:58:35 -07:00
George Hotz 793edf8900 touchup 2022-10-10 16:13:34 -07:00
George Hotz d54a45b50d measure speed vs torch 2022-10-10 16:06:00 -07:00