George Hotz
f4f23dc9a3
version bump
2023-05-26 00:51:25 +00:00
George Hotz
faf80418b7
pyopencl by default since GPU is default (#802)
2023-05-25 17:48:18 -07:00
wozeparrot
fca5028d78
feat: ability to exclude cl devices from being used (#801)
2023-05-25 17:31:29 -07:00
Benedikt
3c465470f2
pip installation one liner (#793)
2023-05-25 16:43:42 -07:00
George Hotz
a968c4c3a4
Cleanup mlperf (#797)
* improve factorization
* cleanups
2023-05-25 11:36:43 -07:00
Diogo
c19ef0fcce
Add sin/cos/tan (#794)
* added sin/cos/tan
* fix lint
* added onnx ops support
2023-05-25 09:04:56 -07:00
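The sin/cos/tan commit above adds elementwise trig ops (plus their ONNX mappings). As a rough plain-Python sketch of the intended elementwise semantics only — not tinygrad's actual kernels, and the function names here are illustrative — note that tan can be derived from the other two:

```python
import math

# Illustrative reference semantics: each op is applied independently
# per element, and tan(x) is expressible as sin(x) / cos(x).
def elementwise_sin(xs): return [math.sin(x) for x in xs]
def elementwise_cos(xs): return [math.cos(x) for x in xs]
def elementwise_tan(xs): return [math.sin(x) / math.cos(x) for x in xs]
```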
wozeparrot
01ae45a43c
Add mlperf RNN-T model (#782)
* feat: initial rnn-t
* feat: working with BS>1
* feat: add lstm test
* feat: test passing hidden
* clean: cleanup
* feat: specify start
* feat: way faster lstm & model
* fix: default batch size
* feat: optimization
* fix: fix metrics
* fix: fix feature splicing
* feat: cleaner stacktime
* clean: remove unused import
* clean: remove extra prints
* fix: fix tests and happy llvm
* feat: have the librispeech dataset in its own dir
* clean: unused variable
* feat: no longer need numpy for the embedding + slightly more memory efficient lstm
* fix: forgot to remove something that broke tests
* feat: use relative paths
* feat: even faster
* feat: remove pointless transposes in StackTime
* fix: correct forward
* feat: switch to soundfile for loading and fix some leaks
* feat: add comment about initial dataset setup
* feat: jit more things
* feat: default batch size back to 1
larger than 1 is broken again :(
and even in the reference implementation it gives worse results
2023-05-25 00:41:21 -07:00
Sasha Krassovsky
b258af117a
Fix PytestCollectionWarning when running tests (#791)
2023-05-24 23:17:57 -07:00
George Hotz
0400315078
Revert "ops rdna"
This reverts commit 81a11d891d.
2023-05-21 13:02:18 -07:00
George Hotz
325a3bf2cf
Revert "writing 2"
This reverts commit dddd6c42f0.
2023-05-21 13:02:17 -07:00
George Hotz
dddd6c42f0
writing 2
2023-05-21 12:52:36 -07:00
George Hotz
81a11d891d
ops rdna
2023-05-21 11:45:38 -07:00
George Hotz
ed038ba129
Contract float4 ALU operations (#780)
* wrong expand
* tests passing
* pass lint
2023-05-16 19:03:49 -07:00
George Hotz
90fff82c8a
Rdna (#776)
* assembler maybe
* custom asm
* rdna3 on quiet
* trigger crashes
* fixed notes
* non-fatal rdna2 crash
* Crash4
* improve rdna sniffer
* comments
* improve sniffer
* asm
* 131 TFLOPS RDNA3
* opt simple matmul
* todos
2023-05-16 05:33:57 -07:00
George Hotz
89b8b39d9c
fix mypy
2023-05-13 21:25:36 -07:00
George Hotz
e0b2035023
fast imagenet eval, gets 76.14% across the set
2023-05-13 21:18:31 -07:00
Jacky Lee
c552f6f92b
Inference test: add tests for ResNet50 (#773)
* Add ResNet inference test and cannon
* Test with ResNet50
* test_car works with resnet fix
2023-05-13 21:18:15 -07:00
Rabia Eda Yılmaz
e5b4b36cba
add std to tensor.py (#767)
* add std
* delete comment
* edit: one liner std, add: test
* adjust
* fix: shape mismatch
* set unbiased to False
* added unbiased option
* fix unbiased option in test and clean code
* better
* generalize axis
* holly coffee molly
* generalize axes without unbiased opt.
* hopefully done
* complete unbiased true for axes
* Update test_ops.py
* fixed
* std completed without bessels correction
* fix comment
* ups
2023-05-13 12:20:44 -07:00
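The std commit above went back and forth on Bessel's correction (the unbiased option). A minimal plain-Python reference for the two behaviors, assuming they mirror torch.std — this is a hypothetical sketch, not tinygrad code:

```python
import math

def std(xs, unbiased=True):
    # Hypothetical reference implementation, not tinygrad's.
    # unbiased=True divides by n-1 (Bessel's correction, torch's default);
    # unbiased=False divides by n (population standard deviation).
    n = len(xs)
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)
    return math.sqrt(ss / ((n - 1) if unbiased else n))
```

The shape-mismatch and axis-generalization bullets above are about reducing over arbitrary axes of a tensor, which this scalar-list sketch deliberately omits.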
George Hotz
b705510d5c
getting 77% on imagenet eval
2023-05-13 07:46:27 -07:00
George Hotz
810f03dafa
conv3d + unet3d (#772)
* conv3d, needs test
* test passes, padding wrong on unet
* unet3d
* no conv3d on images
2023-05-12 13:54:07 -07:00
George Hotz
46d419060b
start on mlperf models
2023-05-10 16:30:49 -07:00
Jacky Lee
d13629cb26
ResNet: match implementation with Nvidia and PyTorch (#770)
* Match ResNet implementation with pytorch and nvidia
* Reduce number of Epochs
2023-05-10 09:01:22 -07:00
Jacky Lee
b80cf9220c
Statistics test: check if distributions match torch (#769)
* Check if tensor values match torch
* Clean up randomness tests and remove dependency
* Remove kaiming uniform test
2023-05-07 21:43:23 -07:00
George Hotz
cb7c22beeb
fix mypy
2023-05-06 19:18:54 +00:00
George Hotz
5190037cbc
rocm: disassembler for shader
2023-05-06 19:07:52 +00:00
George Hotz
7fbf96b992
jit: TODO, use abstractions
2023-05-05 22:51:30 -07:00
George Hotz
0cd3feb452
jit oops. should add that to commit tests
2023-05-05 22:01:13 -07:00
George Hotz
5b2ae262db
assertions for jit
2023-05-05 21:56:32 -07:00
George Hotz
42256c0d9d
rocm sniffer dumps code
2023-05-05 18:36:53 +00:00
George Hotz
81aa3e546b
exclude GPU on tiny (#766)
2023-05-05 10:07:23 -07:00
George Hotz
f2a964f447
nocopy (#764)
2023-05-05 09:32:06 -07:00
George Hotz
466ffeb04f
fast cifar on AMD
2023-05-05 02:10:50 +00:00
George Hotz
3a2011ab2d
rocm sniffer
2023-05-04 22:22:39 +00:00
George Hotz
a55c4f5000
better rocm build scripts
2023-05-04 09:14:05 +00:00
George Hotz
987b1aaf96
rocm build scripts
2023-05-04 08:45:23 +00:00
George Hotz
f28df9900f
multidevice works (#763)
* basic multigpu working
* better multigpu test
* upper
* touchups
* cl sync
2023-05-04 01:04:58 -07:00
George Hotz
4f6d674ec0
use CPU tests in pre-commit
2023-05-03 19:46:16 +00:00
George Hotz
ed33a89d52
no werror in archprobe
2023-05-03 19:34:17 +00:00
George Hotz
7ecf4dff68
multi cl_queue (#762)
* multi cl_queue
* only platforms 1
* gpus first, then cpus
* put device on underlying buffer
* cl_queue array
2023-05-03 12:15:28 -07:00
Rylan Justice
7757f5fed2
Fixed package description (#761)
* Updated LICENSE year
* Fixed package description
2023-05-03 10:21:05 -07:00
George Hotz
3b933b0a2f
rocm setup script
2023-05-03 16:01:17 +00:00
Rylan Justice
9628a3f190
Updated LICENSE year (#760)
2023-05-01 15:35:23 -07:00
Joqsan
0b9d4126d0
Add Tensor.stack() and Tensor.repeat() (...trying to make einops work with tinygrad) (#758)
* add stack() and repeat() methods
* make stack a static method
2023-05-01 09:37:46 -07:00
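The stack()/repeat() commit above adds two different ways of combining data: stack creates a new leading axis, while repeat tiles along an existing one. A plain-Python sketch of that distinction on lists — illustrative semantics only, not the tinygrad implementation:

```python
def stack(tensors):
    # New leading axis: result[i] is the i-th input, unchanged.
    # (A common way to implement this is unsqueeze(0) on each
    # input followed by concatenation.)
    return list(tensors)

def repeat(t, n):
    # Tile the existing leading axis: n copies laid end to end.
    return [x for _ in range(n) for x in t]
```

So stack([[1, 2], [3, 4]]) keeps two rows under a new axis, while repeat([1, 2], 2) flattens the copies into [1, 2, 1, 2].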
George Hotz
59d0d168cd
FLOAT16 off works
2023-04-19 15:34:56 -07:00
George Hotz
3d15769a8f
50 TFLOPS cuda matmul
2023-04-19 14:38:24 -07:00
George Hotz
03b38864db
fix batchnorm at training (#753)
* e2e testing
* min failure
* no affine on bn, still fails
* why did i think i could detach that?
* allow more kernels for bn
* some test issue i don't understand
2023-04-19 08:01:04 -07:00
George Hotz
1aa0648d6a
fix path linter issue
2023-04-18 19:17:41 -07:00
George Hotz
cbe2564b7b
oops, no hip yet
2023-04-18 19:10:36 -07:00
George Hotz
e4db0c820f
hlb_cifar10 init from torch weights
2023-04-18 19:09:13 -07:00
George Hotz
a6b9733256
GB/s can be higher
2023-04-18 17:51:03 -07:00