Commit Graph

1866 Commits (f4f23dc9a3251724b3c928592899ad3dce4c5fc6)

Author SHA1 Message Date
George Hotz f4f23dc9a3 version bump 2023-05-26 00:51:25 +00:00
George Hotz faf80418b7 pyopencl by default since GPU is default (#802) 2023-05-25 17:48:18 -07:00
wozeparrot fca5028d78 feat: ability to exclude cl devices from being used (#801) 2023-05-25 17:31:29 -07:00
Benedikt 3c465470f2 pip installation one liner (#793) 2023-05-25 16:43:42 -07:00
George Hotz a968c4c3a4 Cleanup mlperf (#797) 2023-05-25 11:36:43 -07:00
* improve factorization
* cleanups
Diogo c19ef0fcce Add sin/cos/tan (#794) 2023-05-25 09:04:56 -07:00
* added sin/cos/tan
* fix lint
* added onnx ops support
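
PR #794 lands these as Tensor methods. A minimal sketch of exercising the new ops, assuming the tinygrad.tensor import path of the time:

    import numpy as np
    from tinygrad.tensor import Tensor

    x = Tensor(np.arange(0.0, 1.0, 0.25, dtype=np.float32))
    print(x.sin().numpy())  # elementwise sine
    print(x.cos().numpy())  # elementwise cosine
    print(x.tan().numpy())  # tan commonly lowers to sin/cos
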
wozeparrot 01ae45a43c Add mlperf RNN-T model (#782) 2023-05-25 00:41:21 -07:00
* feat: initial rnn-t
* feat: working with BS>1
* feat: add lstm test
* feat: test passing hidden
* clean: cleanup
* feat: specify start
* feat: way faster lstm & model
* fix: default batch size
* feat: optimization
* fix: fix metrics
* fix: fix feature splicing
* feat: cleaner stacktime
* clean: remove unused import
* clean: remove extra prints
* fix: fix tests and happy llvm
* feat: have the librispeech dataset in its own dir
* clean: unused variable
* feat: no longer need numpy for the embedding + slightly more memory efficient lstm
* fix: forgot to remove something that broke tests
* feat: use relative paths
* feat: even faster
* feat: remove pointless transposes in StackTime
* fix: correct forward
* feat: switch to soundfile for loading and fix some leaks
* feat: add comment about initial dataset setup
* feat: jit more things
* feat: default batch size back to 1
  larger than 1 is broken again :(
  and even in the reference implementation it gives worse results
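
The StackTime bullets refer to RNN-T's time-stacking step, which folds groups of adjacent frames into the feature dimension to downsample the sequence. A standalone sketch of the idea; the stack_time name, factor, and shapes are illustrative, not the PR's exact code:

    from tinygrad.tensor import Tensor

    def stack_time(x: Tensor, factor: int = 2) -> Tensor:
      # x: (batch, time, features). Pad time to a multiple of `factor`,
      # then fold each group of `factor` frames into the feature dim.
      bs, t, feats = x.shape
      pad = (factor - t % factor) % factor
      if pad: x = x.pad(((0, 0), (0, pad), (0, 0)))
      return x.reshape(bs, (t + pad) // factor, feats * factor)

    out = stack_time(Tensor.randn(1, 7, 40))
    print(out.shape)  # (1, 4, 80)
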
Sasha Krassovsky b258af117a Fix PytestCollectionWarning when running tests (#791) 2023-05-24 23:17:57 -07:00
George Hotz 0400315078 Revert "ops rdna" 2023-05-21 13:02:18 -07:00
This reverts commit 81a11d891d.
George Hotz 325a3bf2cf Revert "writing 2" 2023-05-21 13:02:17 -07:00
This reverts commit dddd6c42f0.
George Hotz dddd6c42f0 writing 2 2023-05-21 12:52:36 -07:00
George Hotz 81a11d891d ops rdna 2023-05-21 11:45:38 -07:00
George Hotz ed038ba129 Contract float4 ALU operations (#780) 2023-05-16 19:03:49 -07:00
* wrong expand
* tests passing
* pass lint
George Hotz 90fff82c8a Rdna (#776) 2023-05-16 05:33:57 -07:00
* assembler maybe
* custom asm
* rdna3 on quiet
* trigger crashes
* fixed notes
* non-fatal rdna2 crash
* Crash4
* improve rdna sniffer
* comments
* improve sniffer
* asm
* 131 TFLOPS RDNA3
* opt simple matmul
* todos
George Hotz 89b8b39d9c fix mypy 2023-05-13 21:25:36 -07:00
George Hotz e0b2035023 fast imagenet eval, gets 76.14% across the set 2023-05-13 21:18:31 -07:00
Jacky Lee c552f6f92b Inference test: add tests for ResNet50 (#773) 2023-05-13 21:18:15 -07:00
* Add ResNet inference test and cannon
* Test with ResNet50
* test_car works with resnet fix
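
For context, a hedged sketch of the kind of check a ResNet50 inference test performs. ResNet50 and load_from_pretrained are assumed to come from the repo's models/resnet.py around this time, and the input is a fake image:

    from tinygrad.tensor import Tensor
    from models.resnet import ResNet50

    model = ResNet50()
    model.load_from_pretrained()        # assumed helper that fetches torchvision weights
    img = Tensor.randn(1, 3, 224, 224)  # one fake ImageNet-shaped image
    out = model.forward(img)            # logits over the 1000 classes
    print(out.numpy().argmax())
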
Rabia Eda Yılmaz e5b4b36cba add std to tensor.py (#767) 2023-05-13 12:20:44 -07:00
* add std
* delete comment
* edit: one liner std, add: test
* adjust
* fix: shape mismatch
* set unbiased to False
* added unbiased option
* fix unbiased option in test and clean code
* better
* generalize axis
* holly coffee molly
* generalize axes without unbiased opt.
* hopefully done
* complete unbiased true for axes
* Update test_ops.py
* fixed
* std completed without bessels correction
* fix comment
* ups
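
A small usage sketch for the new method; the unbiased keyword is taken from the PR bullets above, so check tensor.py for the exact signature:

    from tinygrad.tensor import Tensor

    t = Tensor.randn(4, 8)
    print(t.std().numpy())                        # std over all elements
    print(t.std(axis=1).numpy())                  # std along an axis
    print(t.std(axis=1, unbiased=False).numpy())  # population std, no Bessel correction
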
George Hotz b705510d5c getting 77% on imagenet eval 2023-05-13 07:46:27 -07:00
George Hotz 810f03dafa conv3d + unet3d (#772) 2023-05-12 13:54:07 -07:00
* conv3d, needs test
* test passes, padding wrong on unet
* unet3d
* no conv3d on images
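
A hedged sketch of the conv3d shape contract, assuming the PR routes 3D through the same generic convolution entry point as conv2d (an assumption worth verifying against the test):

    from tinygrad.tensor import Tensor

    x = Tensor.randn(1, 4, 8, 16, 16)  # (batch, cin, depth, height, width)
    w = Tensor.randn(8, 4, 3, 3, 3)    # (cout, cin, kd, kh, kw)
    out = x.conv2d(w)                  # assumed generic conv; spatial rank follows the kernel
    print(out.shape)                   # (1, 8, 6, 14, 14) with stride 1, no padding
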
George Hotz 46d419060b start on mlperf models 2023-05-10 16:30:49 -07:00
Jacky Lee d13629cb26 ResNet: match implementation with Nvidia and PyTorch (#770) 2023-05-10 09:01:22 -07:00
* Match ResNet implementation with pytorch and nvidia
* Reduce number of Epochs
Jacky Lee b80cf9220c Statistics test: check if distributions match torch (#769) 2023-05-07 21:43:23 -07:00
* Check if tensor values match torch
* Clean up randomness tests and remove dependency
* Remove kaiming uniform test
George Hotz cb7c22beeb fix mypy 2023-05-06 19:18:54 +00:00
George Hotz 5190037cbc rocm: disassembler for shader 2023-05-06 19:07:52 +00:00
George Hotz 7fbf96b992 jit: TODO, use abstractions 2023-05-05 22:51:30 -07:00
George Hotz 0cd3feb452 jit oops. should add that to commit tests 2023-05-05 22:01:13 -07:00
George Hotz 5b2ae262db assertions for jit 2023-05-05 21:56:32 -07:00
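
The three jit commits above all touch TinyJit. A minimal usage sketch, assuming the tinygrad.jit import path of the time: the jitted function takes Tensor arguments and must realize its output so kernels can be captured and replayed.

    from tinygrad.tensor import Tensor
    from tinygrad.jit import TinyJit

    @TinyJit
    def step(x: Tensor) -> Tensor:
      return (x * 2 + 1).realize()  # output must be realized for capture

    for _ in range(5):  # early calls capture, later calls replay the kernel graph
      out = step(Tensor.randn(4, 4))
    print(out.numpy())
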
George Hotz 42256c0d9d rocm sniffer dumps code 2023-05-05 18:36:53 +00:00
George Hotz 81aa3e546b exclude GPU on tiny (#766) 2023-05-05 10:07:23 -07:00
George Hotz f2a964f447 nocopy (#764) 2023-05-05 09:32:06 -07:00
George Hotz 466ffeb04f fast cifar on AMD 2023-05-05 02:10:50 +00:00
George Hotz 3a2011ab2d rocm sniffer 2023-05-04 22:22:39 +00:00
George Hotz a55c4f5000 better rocm build scripts 2023-05-04 09:14:05 +00:00
George Hotz 987b1aaf96 rocm build scripts 2023-05-04 08:45:23 +00:00
George Hotz f28df9900f multidevice works (#763) 2023-05-04 01:04:58 -07:00
* basic multigpu working
* better multigpu test
* upper
* touchups
* cl sync
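
A hedged sketch of what the multigpu test plausibly exercises; the gpu:N device strings and the .to() copy helper are assumptions here, so see the PR's test for the real spelling:

    from tinygrad.tensor import Tensor

    a = Tensor.randn(256, 256, device="gpu:0")  # first OpenCL device (naming assumed)
    b = Tensor.randn(256, 256, device="gpu:1")  # second OpenCL device
    c = a.to("gpu:1") + b                       # assumed cross-device copy, then add on gpu:1
    print(c.numpy().shape)
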
George Hotz 4f6d674ec0 use CPU tests in pre-commit 2023-05-03 19:46:16 +00:00
George Hotz ed33a89d52 no werror in archprobe 2023-05-03 19:34:17 +00:00
George Hotz 7ecf4dff68 multi cl_queue (#762) 2023-05-03 12:15:28 -07:00
* multi cl_queue
* only platforms 1
* gpus first, then cpus
* put device on underlying buffer
* cl_queue array
Rylan Justice 7757f5fed2 Fixed package description (#761) 2023-05-03 10:21:05 -07:00
* Updated LICENSE year
* Fixed package description
George Hotz 3b933b0a2f rocm setup script 2023-05-03 16:01:17 +00:00
Rylan Justice 9628a3f190 Updated LICENSE year (#760) 2023-05-01 15:35:23 -07:00
Joqsan 0b9d4126d0 Add Tensor.stack() and Tensor.repeat() (...trying to make einops work with tinygrad) (#758) 2023-05-01 09:37:46 -07:00
* add stack() and repeat() methods
* make stack a static method
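
A short usage sketch for the two new methods, which mirror their torch counterparts (stack is a static method, per the second bullet; exact signatures are in the PR):

    from tinygrad.tensor import Tensor

    a, b = Tensor.ones(2, 3), Tensor.zeros(2, 3)
    s = Tensor.stack([a, b])  # new leading dim -> shape (2, 2, 3)
    r = a.repeat((2, 4))      # tile 2x along dim 0, 4x along dim 1 -> shape (4, 12)
    print(s.shape, r.shape)
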
George Hotz 59d0d168cd FLOAT16 off works 2023-04-19 15:34:56 -07:00
George Hotz 3d15769a8f 50 TFLOPS cuda matmul 2023-04-19 14:38:24 -07:00
George Hotz 03b38864db fix batchnorm at training (#753) 2023-04-19 08:01:04 -07:00
* e2e testing
* min failure
* no affine on bn, still fails
* why did i think i could detach that?
* allow more kernels for bn
* some test issue i don't understand
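
A minimal sketch of BatchNorm in training mode with tinygrad's nn module, which is what #753's end-to-end test exercises; the Tensor.training flag gates batch-stat use and running-stat updates:

    from tinygrad.tensor import Tensor
    from tinygrad.nn import BatchNorm2d

    Tensor.training = True              # use batch stats, update running stats
    bn = BatchNorm2d(4)                 # 4 channels, affine by default
    y = bn(Tensor.randn(8, 4, 16, 16))
    print(y.numpy().mean(), y.numpy().std())  # roughly 0 and 1 after normalization
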
George Hotz 1aa0648d6a fix path linter issue 2023-04-18 19:17:41 -07:00
George Hotz cbe2564b7b oops, no hip yet 2023-04-18 19:10:36 -07:00
George Hotz e4db0c820f hlb_cifar10 init from torch weights 2023-04-18 19:09:13 -07:00
George Hotz a6b9733256 GB/s can be higher 2023-04-18 17:51:03 -07:00