Commit Graph

479 Commits (deepcrayon)

Author SHA1 Message Date
Jacob Pradels b112edd2c3
Add pylint trailing whitespace rule (#1314) 2023-07-21 13:37:55 -04:00
George Hotz bfbb8d3d0f
fix ones, BS=2 stable diffusion, caching optimizer (#1312)
* fix ones, BS=2 stable diffusion

* caching optimizer

* print search time

* minor bug fix
2023-07-21 09:55:49 -07:00
madt2709 d2c1e8409a
Update arange to be (start, stop, step) (#1308) 2023-07-21 00:27:23 -04:00
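For context, the reordered signature mirrors Python's range. A minimal usage sketch (Tensor import path as of this era of tinygrad):

    from tinygrad.tensor import Tensor

    # arange now takes (start, stop, step), like Python's range
    t = Tensor.arange(0, 10, 2)
    print(t.numpy())  # 0, 2, 4, 6, 8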
George Hotz f45013f0a3 stable diffusion: remove realizes we don't need 2023-07-20 19:53:07 -07:00
George Hotz b58dd015e3 stable diffusion: remove import numpy as np 2023-07-20 19:35:44 -07:00
George Hotz 35bc46289c stable diffusion: use new tinygrad primitives 2023-07-20 19:25:49 -07:00
Stan 0a3d4f8103
Implementation of VITS TTS model (#1188)
* [WIP]: implementation of VITS TTS model

* Implemented VITS model, moved all code to examples/vits.py

* Added support for vctk model, auto download, and cleanups

* Invoke tensor.realize() before measuring inference time

* Added support for mmts-tts model, extracted TextMapper class, cleanups

* Removed IPY dep, added argument parser, cleanups

* Tiny fixes to wav writing

* Simplified the code in a few places, set diff log level for some prints

* Some refactoring, added support for uma_trilingual model (anime girls)

* Fixed bug where embeddings were loaded with the same backing tensor, oops

* Added emotional embed support, added cjks + voistock models

- voistock is a multilingual model with over 2k anime characters
- cjks is a multilingual model with 24 speakers

both are kinda bad for English though :c

* Removed `Tensor.Training=False` (not needed and wrong, oops)

* Changed default model and speaker to vctk with speaker 6

* Ported rational_quadratic_spline fun to fully use tinygrad ops, no numpy

* Removed accidentally pushed test/spline.py

* Some slight refactors

* Replaced masked_fill with tensor.where

* Added y_length estimation, plus installation instructions and some cleanups

* Fix overestimation log message.

* Changed default value of `--estimate_max_y_length` to False

This is only useful for larger inputs.

* Removed printing of the phonemes

* Changed default value of `--text_to_synthesize`
2023-07-20 17:37:14 -07:00
George Hotz f7b0320d8b
add cifar training regression test (#1287)
* add cifar training regression test

* clean up print
2023-07-19 14:17:09 -07:00
Francis Lam 3db57d3118
Fix llama.py to load and concatenate 13B, 30B, and 65B models (#1275) 2023-07-19 13:22:33 -04:00
Yixiang Gao a8f2c16f8e
add contiguous (#1246) 2023-07-15 08:36:34 -07:00
Diogo a9a1df785f
Webgpu support (#1077)
* initial commit

* 81 passing

* 105 passing tests

* 148 passing

* CI tests

* install dep on ci

* try opencl pkgs

* try using vulkan

* down to only 6 failing

* refactor

* cleaning up

* another test skipped due to buffer limit

* linter

* segfault

* indent fix

* another segfault found

* small touchups

* Fix max and maxpool tests

* Add constant folding

* Add javascript export script

* better asserts in codegen

* manual upcasting

* reverted token type change

* skip safetensor test due to unsupported type

* Fix efficientnet and all other model tests

* Remove np copy

* fixed indent and missing import

* manually destroy the buffer

* revert back to length

* linter errors

* removed extra val

* skip broken tests

* skipping more tests

* Make the page pretty

* Save model weights as safetensor

* Fix imagenet to c test

* Fix second imagenet to c bug

* Async and parallel kernel compilation

* workgroup support

* reversed local size

* fixed non local bug

* correct local groups

* ci experiment

* removed typo

* Fix define local by using shared memory

* Refactor

* try running on mac

* match metal tests

* add more workers

* scope down tests

* trying windows runner

* fixed windows env

* see how many it can do

* merged master

* refactor

* missed refactor

* increase test suite coverage

* missing import

* whitespace in test_efficientnet.py

* getting there

* fixed reset

* fixed bufs

* switched to cstyle

* cleanup

* min/max rename

* one more linter issue

* fixed demo

* linter

* testing ci chrome

* add unsafe webgpu arg

* add build step

* remove WEBGPU from cmd line

* use module

* try forcing directx

* trying forced metal backend

* temp disable conv2d for CI

* disable conv_transpose2d

---------

Co-authored-by: 0x4d - Martin Loretz <20306567+martinloretzzz@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-07-12 12:52:06 -07:00
Hey 4f72eb823c
Outdated repository URL (#1218)
* Update outdated repo url

* Update more outdated repo url's
2023-07-11 23:14:19 -07:00
AN Long f75de602df
fix typo in stable diffusion example (#1219) 2023-07-11 15:26:40 -07:00
Stan f40f8cd055
Initialise numpy arrays as float32 in DDPG (#1171)
float64 is not supported by tinygrad
2023-07-07 12:05:31 -07:00
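The pattern behind the fix, sketched with illustrative shapes: numpy allocates float64 by default, so buffers that feed tinygrad must request float32 explicitly.

    import numpy as np

    batch_size, state_dim = 64, 8  # illustrative sizes, not the example's actual config
    states = np.zeros((batch_size, state_dim), dtype=np.float32)  # the float64 default would be rejected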
terafo aa60feda48
Fix naming conflict with huggingface datasets (#1161)
* Rename in files

* Move files

* Moved to extra/datasets as suggested

* Changes to files

* Fixed stupid mistake

---------

Co-authored-by: terafo <terafo@protonmail.com>
2023-07-07 10:43:44 -07:00
Stan 9b6e57eccd
helpers.py: improved test coverage + exception handling (#1165)
* Fixes + improved test coverage for helpers.py

- added exception handling in `proc`; without it, a thrown exception would hang the thread
- made `_early_exec_process` catch any Exception; before, an exception thrown before the process was started would hang the thread

* Made `_early_exec_process` catch any Exception

Otherwise, if an exception was thrown before the process was started, it would hang the thread. For example, a type error for an argument passed to `subprocess.check_output`.

* Fixed `from tinygrad.helpers import Timing` import

oops, for some reason my IDE cleaned that import from extra/helpers.

* Fixed import in llama.py

Another one that I skipped by accident, my bad

* Extracted a class for tests of early exec

* Normalize line endings, Windows uses \r\n

* Made `cross_process` not a daemon
2023-07-07 10:26:05 -07:00
Kunwar Raj Singh 8391648822
Over 90% on CIFAR with examples/hlb_cifar10.py (#1073)
* fix eval, lr decay, best eval

* 82.27

* 82.64

* 82.79, reproducible

* add lr sched, 85.26

* 87.42

* 87.94

* 87.42

* tta with flip

* training flip aug

* refactor

* using Tensor for LR is faster

* 89.5

* refactor, flip only train set

* 90.01

* 90.64

* eval jit

* refactor

* only JIT model

* fix eval JIT

* fix eval JIT

* 90.82

* STEPS=900 reaches 90.22

* TTA envvar

* TTA default 0

* fully jit training

* refactor optim

* fix sched

* add label smoothing

* param changes

* partial gelu

* OneCycle with pause

* gelu maybe works

* 90.12

* remove pause lr

* maybe fix lr schedulers

* scheduler test passing

* comments

* try mixup

* shuffle!

* add back the missing last eval

* fix shuffle bugs

* add mixup prob

* fix mixup prob

* 90.19

* correct mixup

* correct mixup

* correct mixup

* 90.24

* 90.33

* refactor, add type hints

* add gradient clipping

* maybe fix test

* full JIT

* back to relu for now

* pass mixup prob as param

* add typehints

* maybe CI works

* try erf gelu

* CI, types

* remove useless import

* refactor optim

* refactor optim

* try leakyrelu

* try celu

* gelu

* 90.67

* remove grad clip

* remove grad clip tests

* revert params

* add test for OneCycleLR

* 90.62

* fix eval timing

* fix eval timing again

* so where I calculate mixup_prob matters

---------

Co-authored-by: Kunwar Raj Singh <kunwar31@pop-os.localdomain>
2023-07-06 20:46:22 -07:00
nimlgen d363d25ee2
fix imports for examples/transformer.py (#1136) 2023-07-05 08:15:13 -07:00
George Hotz 87d21ea979 examples: simple conv bn 2023-07-04 13:50:26 -07:00
Reza Rezvan 8ae9a054ae
Refactor nn.optim (#1091)
* Refactor: nn.optim.py

* Refactor: nn.optim.py; Fix all tests

* Refactor: Replace all optim.get_parameters()

* Refactor: Revert list comp.

* Refactor: Replace optim.get_state_dict

* Refactor: Change quickstart.md
2023-07-02 15:07:30 -07:00
Eli Frigo 10f1aeb144
fixed broken link (#1097) 2023-07-02 15:06:59 -07:00
nmarwell26 12ce68c1ee
Renamed examples/yolo to examples/vgg7_helpers because that directory contains no yolo-related code and only helper code for vgg7. This was confusing to a new user when trying to understand the examples. (#1086) 2023-07-01 12:04:28 -07:00
George Hotz 6ec0a24706 imagenet eval in 1 min 28 sec 2023-06-28 04:23:26 +00:00
Rayan Hatout 65cbaa3429
no need to slice A and B twice in LLaMa complex multiplication (#1054) 2023-06-26 14:42:58 -07:00
Kunwar Raj Singh 5d3310ce56
MaskRCNN Inference (#884)
* MaskRCNN weights loading

* backbone maybe works

* backbone works, but resnet body atol 1e-3

* RPN call, but very wrong output

* fixed topk

* RPN maybe works, not sure about nms

* Fix cursed modules

* add back editorconfig

* Full call, wrong output

* Full call works

* fix mask

* use NMS from retinanet

* Removing extra funcs

* refactor

* readable

* Add example to run model

* remove filter

* Fix split, batched inference is worse

* Fix image sizes

* Matching reference

* merge master

* add filter on top detections

* cuda backend fixed

* add model eval and spec

* convert images to rgb

* fix eval

* simplify examples code

* remove extra code

* meshgrid using tinygrad

* removing numpy

* roi align, floor, ceil

* remove numpy from level_mapper

* remove numpy from pooler

* Revert "Merge branch 'master' of github.com:kunwar31/tinygrad into mrcnn-inference"

This reverts commit 4b95a3cb49, reversing
changes made to 98f2b1fa2e.

* roi align gather

* fix master merge

* revert to old floor, ceil as ints present in domain

* use log2 op

* fix indexes

* weird bug with ints and gpu

* weird bug with ints and gpu

* refactors, add env var for gather

* floor with contiguous, where

* refactor topk, sort

* remove staticmethod

* refactor stride

* remove log2 mlop

* realize -> contiguous

* refactor forward

* remove num_classes, stride_in_1x1 from state

* refactor forward

* refactoring

* flake8

* removing numpy in anchor gen, use numpy for gather, nonzero, optimize topk

* keep using tinygrad for smaller gathers

* fix empty tensors

* comms

* move from tensor.py

* resnet test passing

* add coco dataset back

* fix spaces

* add test for log2

* no need to create Tensors

* no need to create Tensors

---------

Co-authored-by: Kunwar Raj Singh <kunwar31@pop-os.localdomain>
2023-06-25 15:37:51 -07:00
Yair Lifshitz 7f73d6a4da
Fix input path in examples/compile_efficientnet.py, examples/efficientnet.py. (#1034) 2023-06-23 16:34:33 -07:00
Yann Huynh ccb51ff5b0
"Fixed argument passing in example yolov8" (#1004)
"Fixed argument passing in example yolov8"
2023-06-18 14:29:39 -07:00
sehaj 775287ed91
Add yolov8 implementation (#806)
* added SPPF module from yolov8

* added conv_block, bottleneck modules

* cleaned modules

* c2f example

* spf changes

* C2f

* fixed and tested bottleneck

* improved detect class

* tested spf and conv

* checked c2f

* DFL structure

* fixed dfl

* added dist2bbox function

* added dist2bbox function

* added and tested make_anchors function for the head

* keeping functions above

* creating the detection head

* fixing head

* untested blocks a. scale_boxes b. clip_boxes c. xywh2xyxy d. box_iou

* head works

* structure fix

* added darknet (backbone)

* yolov8 neck, and initialize bias function for detection

* fixed spacing

* yolov8 class, init bias, and fixed c2f

* forward pass almost working

* fixed net structure

* init bias not needed, forward pass working

* load weights boilerplate

* load weights done?

* all variants loading!

* post process: clip_boxes, scale_boxes, xywh2xyxy, and box_iou(untested)

* fix scale_boxes

* box_iou fixed and tested

* created the pre nms function

* fix nms

* fixed load weights, apparently the latest commit broke something, excluding num_batches_tracked

* added letterbox and pre_transform for pre_process function

* fixed letterbox, pre_transform and added preprocess function

* custom NMS done, integrated prepare_boxes and nms, improved box_iou

* added postprocess function till parsing

* added draw_bounding_boxes_and_save function

* testing full flow

* using fetch for class names

* fixed make_anchors + all tinygrad now

* added command line arguments, weight downloading

* single image for now only

* made draw boxes more efficient

* made NMS functions efficient

* made compute_transform better

* v8 working now, inference is done

* prints objects detected in console now

* fixed image loading (pre processing)

* batch post processing

* created initial tests

* fixes bounding box thickness AND added get_detected_classes_with_frequency function

* cleaning for testing

* two tests

* added url option for image, removed need for specifying arguments

* tests complete, but lots of things are printed on screen by ultralytics

* remove parse arguments

* fixed weight location

* fixed colours of classes, and black font when high brightness

* minor changes

* TODOs for later

* removed use of torch, using .npz weights

* fixed tests

* one path for fetch

* preprocess now in tinygrad, plus test fix for that

* updated tests

* fix tests

* no class labels needed

* Add files via upload

* Update showcase.md

* Update showcase.md

* added safe tensors as weights, and tests fix for that

* safe tensors test

* using safe_load

* using tinygrad functions now to load weights

* update tests

---------

Co-authored-by: r3sist-uniq <amanmatreja@gmail.com>
Co-authored-by: r3sist <72573738+r3sist-uniq@users.noreply.github.com>
2023-06-16 18:55:19 -07:00
Diogo 2d4370b487
Adds tril & triu support (#936)
* triu & tril support

* lint and kernel count error

* switched shape indices

* larger shape tests

* reverted numpy removal until #942 is resolved
2023-06-09 22:13:20 -07:00
cloud11665 e8a23d4331
there is a better way to do that! (#950) 2023-06-06 15:23:30 -07:00
Diogo 3bb38c3518
limit split to 1 due to windows path containing : (#944) 2023-06-06 10:27:54 -07:00
George Hotz b78addf2f8
Whisper (#919)
* no whispering yet

* whispering

* live whisper

* small support
2023-06-03 18:55:14 -07:00
George Hotz ed1963b899
Fast DiskTensor to other Tensor (#916)
* make disktensors fast

* loading

* loader for sd and llama
2023-06-03 12:25:41 -07:00
George Hotz d58586bb17
safetensors! (#903)
* safetensors test

* safe_save

* load back with real safetensors

* bugfix in device name. add simple torch_load

* it works for llama, but it's slower...

* mmap

* no intermediate

* load mmaped

* readinto speed

* not ready yet

* revert that
2023-06-02 13:41:09 -07:00
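A minimal round-trip sketch; the helpers live alongside get_state_dict, though the exact module path has moved between tinygrad versions:

    from tinygrad.nn import Linear
    from tinygrad.state import get_state_dict, safe_save, safe_load  # path varies by version

    model = Linear(4, 2)  # any object holding Tensors works
    safe_save(get_state_dict(model), "model.safetensors")
    loaded = safe_load("model.safetensors")  # dict mapping names to tensors, mmap-backed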
George Hotz 8a928ed2f3
nn init matches torch (#901) 2023-06-01 21:24:11 -07:00
Peter Ross 27845fd3a3
train_efficientnet: only import datasets.imagenet when IMAGENET is set (#899)
make it work out of the box for new users.

the default configuration of train_efficientnet uses the smaller cifar
dataset. importing datasets.imagenet tries to open imagenet_class_index.json
and will fail unless the user has already downloaded it.
2023-06-01 19:19:52 -07:00
George Hotz 1b42b4e1b8 fix examples/hlb_cifar10.py 2023-06-01 19:03:17 -07:00
wozeparrot 0fc4cf72a2
feat: add train scaffolding (#859) 2023-05-30 07:10:40 -07:00
Jacky Lee 5d212864b5
Add MLPerf UNet3D model (#775)
* Add ResNet inference test and cannon

* Test with ResNet50

* test_car works with resnet fix

* Add KiTS19 dataset

* KiTS19: Implement iterate

* No batch load for this dataset

* Save results on iterate

* Implement dice score

* Add data prep and eval functions

* Resolve shape issue

* Conversion works but wrong values

* Segfaults when load_from_pretrained is called

* Fix segfault and assign properly

* Final result generated, though very slow

* Store and load final result to save time

* Fix typo in finalize

* Score computes

* More bug fixes, dice score is very low

* Working broken code

* Assign output values to result

* Getting a much higher score now

* Fix dataset preprocessing

* Mean DICE score of 88.5

* Ugh, typo

* Attempt to reimplement model

* Rename layers

* Tiny model works, kinda

* Accuracy? gone

* Implement InstanceNorm and match torch

* Test instance norm 2d and 3d

* Combined input block with downsample block

* Tiny model works, support strided convtranspose

* Commands to download dataset

* Clean up a bit

* unet3d_v2 -> unet3d

* Remove duplicated code

* Oops, put tests back
2023-05-28 20:38:19 -07:00
Sohaib 65d09031f2
add retinanet with resnet backbone (#813)
* add retinanet with resnet backbone

* adds resnext to support loading retinanet pretrained on openimages
* object detection post processing with numpy
* data is downloaded and converted to coco format with fiftyone
* data loading and mAP evaluation with pycocotools

* remove fiftyone dep

* eval freq

* fix model timing

* del jit for last batch

* faster accumulate
2023-05-28 20:20:16 -07:00
wozeparrot 67de3aa1de
Add mlperf bert model (#803)
* feat: add mlperf bert model

* feat: switch to nn.Embedding

* clean+fix: fix formatting

* feat: add simple downloader

* feat: metrics

* feat: don't actually need exact match

* feat: doing a run

* feat: set eps on the layernorms

* clean+fix: cleaner impl + hopefully fixed

* feat: move dataset initialization into iterate

* feat: move tokenizer out of iterate

* clean+fix: cleaner + working

* clean: cleanup

* fix: fix metrics

* feat: need to use original bert gelu + download vocab

* feat: make directory if it doesn't exist yet

* feat: jit go brrr
2023-05-27 14:53:32 -07:00
wozeparrot 0dc333cfab
Promote Embedding to `nn` (#798)
* feat: promote Embedding to nn

* fix: fix failing test

* feat: add test with jit

* feat: rewrite embedding to no longer need stacked for loops

* clean+fix: don't know how that happened
2023-05-25 18:39:45 -07:00
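A minimal sketch of the promoted layer (the constructor takes vocab_size then embed_size):

    from tinygrad.tensor import Tensor
    from tinygrad.nn import Embedding

    emb = Embedding(1000, 64)        # vocab_size, embed_size
    tokens = Tensor([[3, 1, 4, 1]])  # (batch, sequence) of token ids
    out = emb(tokens)                # shape (1, 4, 64)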
George Hotz a968c4c3a4
Cleanup mlperf (#797)
* improve factorization

* cleanups
2023-05-25 11:36:43 -07:00
wozeparrot 01ae45a43c
Add mlperf RNN-T model (#782)
* feat: initial rnn-t

* feat: working with BS>1

* feat: add lstm test

* feat: test passing hidden

* clean: cleanup

* feat: specify start

* feat: way faster lstm & model

* fix: default batch size

* feat: optimization

* fix: fix metrics

* fix: fix feature splicing

* feat: cleaner stacktime

* clean: remove unused import

* clean: remove extra prints

* fix: fix tests and happy llvm

* feat: have the librispeech dataset in its own dir

* clean: unused variable

* feat: no longer need numpy for the embedding + slightly more memory efficient lstm

* fix: forgot to remove something that broke tests

* feat: use relative paths

* feat: even faster

* feat: remove pointless transposes in StackTime

* fix: correct forward

* feat: switch to soundfile for loading and fix some leaks

* feat: add comment about initial dataset setup

* feat: jit more things

* feat: default batch size back to 1

larger than 1 is broken again :(
and even in the reference implementation it gives worse results
2023-05-25 00:41:21 -07:00
George Hotz e0b2035023 fast imagenet eval, gets 76.14% across the set 2023-05-13 21:18:31 -07:00
George Hotz b705510d5c getting 77% on imagenet eval 2023-05-13 07:46:27 -07:00
George Hotz 810f03dafa
conv3d + unet3d (#772)
* conv3d, needs test

* test passes, padding wrong on unet

* unet3d

* no conv3d on images
2023-05-12 13:54:07 -07:00
George Hotz 46d419060b start on mlperf models 2023-05-10 16:30:49 -07:00
Jacky Lee d13629cb26
ResNet: match implementation with Nvidia and PyTorch (#770)
* Match ResNet implementation with pytorch and nvidia

* Reduce number of Epochs
2023-05-10 09:01:22 -07:00
George Hotz e4db0c820f hlb_cifar10 init from torch weights 2023-04-18 19:09:13 -07:00
George Hotz 732884653c osx in hlb_cifar10_torch 2023-04-14 13:12:08 -07:00
George Hotz 584ee6f616 don't graph consts 2023-04-14 03:32:20 -07:00
George Hotz 9a39ebefde hlb_cifar10_torch gets 80% 2023-04-14 02:47:03 -07:00
Jacky Lee 06ed958abd
Fix train_resnet example (#744)
* Fix ResNet example

* Scientific notation
2023-04-12 13:48:39 +05:30
Jacky Lee 7a45b989a1
Device: make GPU default and METAL/CUDA if possible (#732)
* Make GPU the default device

* Compile EfficientNet with CPU

* don't print device

* use METAL and CUDA if possible

* Revert some changes to workflow

* Fix import error when checking device availability

* device lookup is now optional

* hopefully fix linter and tests

* fix workflow

* Skip device if not available

* don't change default if CPU=1

* simplify device selection

* Default to CPU if no GPU

* don't print device name...

* No need to change default in llama

* run github workflow

* Fix logic to select default

* pass if an error occurs

* use separate function for try except
2023-04-04 09:41:52 +05:30
Jacky Lee 156640e90d
Permute examples (#731)
* examples: use permute instead of transpose

* Use transpose but change args
2023-03-29 05:07:06 +04:00
George Hotz b12b60af20
fix binop, other tests failure (#723)
* fix binop, other tests failure

* that was a bad idea

* better layernorm

* inference kernel count tests

* new style reshape pushing

* fixup replacement

* 199 kernels is okay. fix flops

* push reshape through unaryops only

* GRAPH=2 draws the phantom ops

* found resnet issue

* non working test

* mul is cheaper than div

* OPT inflation

* SHUFFLE_PAD_OPS in OPT=2
2023-03-22 18:15:07 -07:00
Fernando Vidal 73bd0b217b
add int64 as supported dtype from numpy (#699)
* add int64 as supported dtype from numpy

Without this, examples/transformer.py didn't run. With this change it runs successfully.

* Update helpers.py

* Update transformer.py

* Update training.py
2023-03-18 17:15:04 -07:00
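What this enables, sketched: numpy integer arrays, which default to int64 on most platforms, can now be wrapped directly.

    import numpy as np
    from tinygrad.tensor import Tensor

    idx = np.array([1, 2, 3])  # int64 by default on most platforms
    t = Tensor(idx)            # previously raised as an unsupported dtype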
George Hotz f5467cfedc
Devicebufferless (#708)
* runs one metal kernel

* conv2d works

* ops tests are passing

* const folding

* all ops work

* pre commit always passes

* torch works

* working still

* fix graph test

* tests passing

* image almost works

* image conv works

* most images

* fix custom

* fix assignment

* fix compile enet

* clean up comments

* fix realize return value

* include shapetracker in LB repr

* copy should make a copy

* reenable method cache

* fix lna

* dtypes in graph

* forward only for IMAGE=2

* simple realize

* getting close

* fixup new api, it's good except the kernel count

* back to 197 kernels

* tests should pass

* go to a real float

* no type_on_cpu

* fix the docs

* put shapetracker back in it's proper place
2023-03-18 14:40:23 -07:00
Kirill 26a3888ab8
Fix llama 13B RAM usage (#710) 2023-03-18 13:50:09 -07:00
Kirill 0fe5014b1f
Use pathlib (#711)
* Use pathlib in llama

* Use pathlib in stablediffusion
2023-03-18 13:49:21 -07:00
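The pattern, sketched with an illustrative filename (not the one the examples actually download):

    from pathlib import Path

    WEIGHTS = Path(__file__).parent / "weights"  # composes portably across OSes, unlike string concatenation
    ckpt = WEIGHTS / "model.ckpt"                # illustrative name
    print(ckpt.exists())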
Kirill 0532025b04
Fix llama 13B weights loading (#700)
* Fix llama 13B weights loading

* refactor more

* add test

* test storage offset

* fix spacing

* fix strides

* llama 13B working?

* yolo?

* better test for seeks
2023-03-15 08:59:52 -07:00
Ayushman Kumar e28bd11ff1
Cast Tensor data to float32 (#703)
* Cast Tensor data to float32

* astype('float32') --> Tensor.randn()
2023-03-14 23:09:41 -07:00
Jacky Lee 5e820818e9
Cast image to float32 (#702) 2023-03-14 08:13:19 -07:00
George Hotz fe0e8a306f jittable llama 2023-03-12 14:15:04 -07:00
George Hotz 15e0b56e39
compile works (#688)
* compile works

* runtimes

* line count

* fix custom, to tg dtype

* meh, that's fine with lazy import
2023-03-12 11:01:25 -07:00
Kirill af7745073f
Add comments to SD (#686)
* Add explanation for empty lambdas

* Fix my_unpickle if pytorch_lightning is installed

* oops
2023-03-12 10:56:49 -07:00
George Hotz 046b3952c3 get_state_dict 2023-03-11 23:46:53 -08:00
George Hotz 803b0aef28 track memory for numpy/torch 2023-03-11 20:39:10 -08:00
George Hotz 61071f881a fix bug, and add unit test to catch failure 2023-03-11 16:57:25 -08:00
George Hotz 3ec457248c failing llama test 2023-03-11 16:28:10 -08:00
George Hotz 8aa63847c7 llama: up max tokens to 1000 2023-03-11 13:39:33 -08:00
George Hotz 5ea44cefcc llama: add lexie personality 2023-03-11 10:23:33 -08:00
George Hotz c908f911a7 llama defaults to metal on osx 2023-03-11 09:30:13 -08:00
George Hotz 5e1380df6a profiling llama + cache is_contiguous 2023-03-11 08:23:21 -08:00
George Hotz f3ac52aee8
Mypyc (#680)
* building shapetracker

* default ENABLE_METHOD_CACHE

* symbolic compiles

* improve types

* tensor compiles

* oops, that's a bug

* best of both worlds

* find legit typing bugs

* pad2d can take list or tuple

* sub 200ms when compiled
2023-03-11 07:33:30 -08:00
George Hotz b1206bcb18
third try at torch loading (#677)
* third try at torch loading

* numpy fixed

* fix enet compile

* load_single_weight supports empty weights

* oops, CPU wasn't the default

* so many bugs
2023-03-10 19:11:29 -08:00
George Hotz 8bf75a7fdd fix stable diffusion and CI 2023-03-10 17:48:12 -08:00
George Hotz 4780f9a6df llama runs (slowly) in master 2023-03-10 17:36:51 -08:00
jspieler da7fb4b227
Fixed DDPG example (#667) 2023-03-09 11:49:52 -08:00
George Hotz c22afc52db move the custom function example to a test 2023-03-08 10:05:04 -08:00
George Hotz 7d3b9d0e95 oops, things relied on that API. the global cache needs access to the ASTRunner class 2023-03-08 08:39:31 -08:00
George Hotz 4f957423c3 jitting custom ops + OPTLOCAL assignment bugfix 2023-03-08 08:30:37 -08:00
George Hotz 7285de41a1 tinygrad supports CUSTOM functions 2023-03-08 07:50:33 -08:00
Pankaj Doharey 9d97d97b26
Opens image in default viewer after saving. (#612) 2023-03-03 17:28:49 -08:00
George Hotz 2e26286294
speed like you wouldn't believe (#626)
* speed like you wouldn't believe

* fix tests
2023-03-02 07:49:19 -08:00
George Hotz bfcec234a2
Refactor ASTs (#622)
* ugh worst branch name

* compiler refactor continues

* scc -> cloc

* buf -> _buf

* finish _buf, and program -> runtime

* gpu is still working, clang isn't

* clang in new style

* ops_metal

* something broke it

* improve metal

* clean up tons of cl crap

* hack fix sync

* cleaner gpu

* gpu metal clang

* cleanups

* minor refactor

* GPUCodegen

* fix up LLVM

* blind CUDA refactor

* codegen / runtime

* keep ops naming

* linter passes

* woah, llvm was allocing 4x what it needed to

* bugfixes

* fix openpilot compiler

* fix compile_efficientnet

* method cache should fix tests

* deal with duped functions
2023-03-01 18:57:29 -08:00
George Hotz c4856aa193 fix yolo webcam 2023-02-26 17:24:05 -08:00
Jacky Lee 0f58c4c648
Cleanup yolo and remove stateless classes (#604)
* Add AvgPool2d as a layer

* Clean up a bit

* Remove stateless layers in yolo_nn

* More cleanup

* Save label for test

* Add test for YOLO

* Test without cv2

* Don't fail if cv2 not installed

* Better import

* Fix image read

* Use opencv :)

* Don't download the file

* Fix errors

* Use same version

* Set higher confidence

* Why is the confidence so low?

* Start over

* Remove stateless layers

* Remove extra lines

* Revert changes

* Save a few more lines
2023-02-26 16:55:21 -08:00
voidz 94bec40110
moved extras/jit.py -> tinygrad/jit.py (#599)
* moved extras/jit.py to tinygrad/jit.py

* fixed indent

* removed tinygrad.helpers.DEBUG from jit.py
2023-02-25 08:32:33 -08:00
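After the move, user code imports the JIT from the package itself. A minimal sketch with a stand-in model; JIT-ed functions should return realized tensors:

    from tinygrad.jit import TinyJit
    from tinygrad.tensor import Tensor
    from tinygrad.nn import Linear

    model = Linear(8, 2)  # stand-in for illustration

    @TinyJit
    def forward(x: Tensor) -> Tensor:
      return model(x).realize()  # kernels are captured on the first calls, then replayed

    out = forward(Tensor.randn(1, 8))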
Benedikt Mandelkow 7348e9a6c6
add restrict qualifier to inputs in c backend (#593)
* add restrict qualifier for clang backend convolution inputs/ outputs
see https://godbolt.org/z/Tb9jMxWfx for generated assembly

* enable more checks

* inline fmax to motivate the compiler to inline some more

* fix if else binding power
2023-02-25 08:32:21 -08:00
George Hotz 2e56a4793e rename log_softmax, support dim, fix onnx Softmax 2023-02-24 10:11:24 -08:00
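Sketch of the renamed op (previously logsoftmax) with an explicit reduction dimension:

    from tinygrad.tensor import Tensor

    x = Tensor.randn(2, 5)
    logp = x.log_softmax(-1)  # log-softmax over the last dimension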
George Hotz 94ccab941e compile_tensorflow: no cast required 2023-02-22 21:14:21 -08:00
George Hotz 135d0ddb78 compile_tensorflow: read weights from disk 2023-02-22 21:12:35 -08:00
George Hotz 0615dcffe7 compile_tensorflow: save the weights 2023-02-22 21:05:45 -08:00
George Hotz c537fd0614 compile_tensorflow: add initialize and tests 2023-02-22 20:50:53 -08:00
George Hotz dc914cde50 compile_tensorflow 2023-02-22 20:08:58 -08:00
George Hotz 76b4d0577d yolov8 works up to the MaxPool 2023-02-22 19:32:13 -08:00
Mischa Untaga 14bb2c40a2
Fix yolov3 example (#577) 2023-02-21 09:24:00 -08:00
George Hotz d9fa47ecc9 use the TinyJit in the efficientnet runner, 200ms -> 20ms 2023-02-20 19:58:16 -08:00
George Hotz 714bf4b108
clang backend (#572)
* start clang backend

* mostly working

* no group for reduce w clang

* it compiles

* compiles

* a11y

* minor fixups

* formatting

* add a test

* rename test
2023-02-20 18:18:18 -08:00
Jacky Lee cb679cd051
Fix weight initialization (#566)
* Fix weight initialization

* Use scaled_uniform in serious_mnist
2023-02-19 11:25:29 -08:00
Kirill 7944cfdadc
Remove Tensor.data (#565) 2023-02-18 16:36:12 -08:00
Jacky Lee 7e8b0305f3
Fix mnist gan example (#563) 2023-02-18 13:45:37 -08:00
Jacky Lee 9fd41632c6
Import get_parameters from tinygrad.nn (#559)
* get_parameter is in optim

* Update all imports for get_parameters

* Clean up

* use optim.get_parameters
2023-02-17 15:22:26 -08:00
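The import convention this settles on, sketched with a stand-in module:

    from tinygrad.nn import Linear, optim

    model = Linear(4, 4)                  # stand-in module
    params = optim.get_parameters(model)  # walks the object and collects its Tensors
    opt = optim.SGD(params, lr=0.001)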
Jacky Lee e172f0087a
BatchNorm2D -> BatchNorm2d (#558)
* BatchNorm2D -> BatchNorm2d

* Fix typo
2023-02-16 12:31:49 -08:00
Jacky Lee c35fcc6964
Replace phrase for prompt (#555) 2023-02-12 09:04:44 -08:00
George Hotz 191c76cfd7 hlb_cifar10 torch version 2023-02-11 18:04:40 -08:00
George Hotz 9057d98d36 no lr decay in cifar. test this in torch tomorrow 2023-02-11 17:42:54 -08:00
George Hotz dd7accb9cc decay LR, little bugfix 2023-02-11 17:34:15 -08:00
George Hotz ba3bf5bdf7 cifar stops learning 2023-02-11 17:21:42 -08:00
George Hotz 7d33f2d659 CL.CACHE is over, GlobalCounters.cache is it 2023-02-11 12:00:14 -08:00
George Hotz 9152bb5b4a momentum support in SGD 2023-02-11 10:22:37 -08:00
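Sketch of the new knob:

    from tinygrad.nn import Linear, optim

    params = optim.get_parameters(Linear(4, 4))     # stand-in parameter list
    opt = optim.SGD(params, lr=0.01, momentum=0.9)  # momentum=0 recovers plain SGD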
George Hotz 031edd01e6 switch openpilot compile to TinyJit 2023-02-11 09:51:44 -08:00
jspieler 8f912c3966
added deep deterministic policy gradient example (#531) 2023-02-11 10:10:46 -06:00
George Hotz 608fd730d3 put the JIT in extra 2023-02-11 00:35:18 -06:00
George Hotz ed8ae7522a tinyjit 2023-02-11 00:22:36 -06:00
George Hotz 4c90a15689 make the fake data actually learnable 2023-02-10 23:35:21 -06:00
George Hotz 07629d7476 fakedata and move to new cache 2023-02-10 23:32:31 -06:00
George Hotz 63fa7daf30 wrong place for CL 2023-02-10 23:22:24 -06:00
George Hotz fed95119dc CL.mem_used -> GlobalCounters.mem_used 2023-02-10 23:13:29 -06:00
Kirill 27154db99a
Downloads weights in examples/stable_diffusion.py (#537)
* Downloads weights in examples/stable_diffusion.py

* use download_file_if_not_exists in fetch

* make consistent with previous NOCACHE behavior
2023-02-10 14:37:04 -06:00
Jacky Lee f08187526f
Fix examples (#540)
* Fix examples

* Remove training in parameters

* Simplify a bit

* Remove extra import

* Fix linter errors

* factor out Device

* NumPy-like semantics for Tensor.__getitem__ (#506)

* Rewrote Tensor.__getitem__ to fix negative indices and add support for np.newaxis/None

* Fixed pad2d

* mypy doesn't know about mlops methods

* normal python behavior for out-of-bounds slicing

* type: ignore

* inlined idxfix

* added comment for __getitem__

* Better comments, better tests, and fixed bug in np.newaxis

* update cpu and torch to hold buffers (#542)

* update cpu and torch to hold buffers

* save lines, and probably faster

* Mypy fun (#541)

* mypy fun

* things are just faster

* running fast

* mypy is fast

* compile.sh

* no gpu hack

* refactor ops_cpu and ops_torch to not subclass

* make weak buffer work

* tensor works

* fix test failing

* cpu/torch cleanups

* no or operator on dict in python 3.8

* that was junk

* fix warnings

* comment and touchup

* dyn add of math ops

* refactor ops_cpu and ops_torch to not share code

* nn/optim.py compiles now

* Reorder imports

* call mkdir only if directory doesn't exist

---------

Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: Mitchell Goff <mitchellgoffpc@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-02-10 12:09:37 -06:00
George Hotz a5a55ac19e GlobalCounters cache + assign in optim 2023-02-08 17:10:55 -06:00
George Hotz 3d63934995
refactor to keep cl in the runtime (#545)
* refactor to keep cl in the runtime

* fix thneed, rename cl to _cl

* bugfix + _cuda

* fix tests

* thneed more correct
2023-02-08 16:46:09 -06:00
George Hotz 2844482a60
Mypy fun (#541)
* mypy fun

* things are just faster

* running fast

* mypy is fast

* compile.sh

* no gpu hack

* refactor ops_cpu and ops_torch to not subclass

* make weak buffer work

* tensor works

* fix test failing

* cpu/torch cleanups

* no or operator on dict in python 3.8

* that was junk

* fix warnings

* comment and touchup
2023-02-08 09:56:51 -06:00
George Hotz f7291f6ca3
fixes big KOPT, breaks opencl (#505)
* fixes big KOPT, breaks opencl

* fix optimizer

* KernelCache

* oops, broke batchnorm

* hack to fix it

* fix llvm, less hacky gpu

* disable the cache

* cache just breaks things
2023-02-05 10:46:17 -08:00
James Roberts db0a9b0a2d
Refactor CL.time_sum into GlobalCounters (#519) 2023-02-01 20:13:56 -08:00
George Hotz 5e37f084db stable diffusion: clean up constant folding 2023-02-01 12:53:16 -08:00
Jacky Lee 486f023e81
Rename Normalize and move to nn (#513)
* Rename Normalize and move to nn

* Match PyTorch for dim>1
2023-02-01 11:55:03 -08:00
Jacky Lee 799b3f185a
Refactor getenv into helpers (#508)
* Refactor getenv into helpers

* Remove unused os

* Fix default value

* Fix more defaults for CI

* Fix bracket

* Revert changes to openpilot/compile.py

* Use getenv from helpers when possible
2023-01-31 15:09:09 -08:00
George Hotz 21f2af08d5 getenv + graphing 2023-01-30 19:15:03 -08:00
George Hotz 60ccddb58b reenable SWAP 2023-01-30 17:32:02 -08:00
George Hotz aea55eb196 found failing upcast 2023-01-30 16:12:56 -08:00
George Hotz 7ee0d99c70 CLCACHE 2023-01-30 14:02:06 -08:00
George Hotz cccfea4b25 factor out KOPT code 2023-01-30 13:13:55 -08:00
George Hotz de2c419fd4 make_pair and first attempt at hlb_cifar10 2023-01-30 11:07:23 -08:00
AllentDan 7b6b1f32b1
[Fix] fix typo: test_mnist -> datasets (#492)
* test_mnist -> datasets

* fix mnist_gan
2023-01-29 21:30:47 -08:00
George Hotz 2db272c7f7
Kernel Optimizer (#489)
* kernel optimizer

* 10x faster, but wrong. not a good deal

* move test -> extra

* print x speedup

* clcache

* fix clcache + DEBUG

* GFLOPS estimate

* i==3
2023-01-29 17:15:00 -08:00
George Hotz 66da3bc3c0 reset the benchmark timer 2023-01-25 09:20:34 -08:00
George Hotz 487685919b
Revert "Rename Normalize and move to nn (#415)" (#474)
This reverts commit d768acb6a9.
2023-01-25 07:50:04 -08:00
Jacky Lee d768acb6a9
Rename Normalize and move to nn (#415)
* Rename Normalize and move to nn

* Fix comparison to None error

* Add test for GroupNorm

* Rename test case

* Flip parameters to match PyTorch

* Increase error tolerance

* Fix elementwise_affine on channels

* Match arguments with PyTorch

* Initialize weight and bias only when affine is true

* Is this it?

* A bit cleaner

* Handle case where weight or bias is None
2023-01-25 07:47:59 -08:00
George Hotz 6d7658db12 delete opencl <celebration> 2023-01-24 14:18:35 -08:00
nogira 2e744ef2f2
confirmed (#449)
w/ a bunch of print statements in the official model here: ce05de2819/ldm/modules/diffusionmodules/openaimodel.py (L413)
2023-01-07 08:41:06 -08:00
Drew Hintz 165fb4d631
remove redundant list comprehension from inside all. (#397)
remove explicit inheritance from object.
2022-10-13 09:58:35 -07:00
George Hotz 178ba50c03 some args for stable diffusion 2022-09-29 01:52:04 -04:00
George Hotz a0d169eb59 fix efficientnet 2022-09-28 14:23:01 -07:00
George Hotz 60df954377
Fix weight init: this work? (#391)
* this work?

* glorot uniform

* requires_grad broke

* propagate the None correctly

* so this weight init works

* ahh, i think it's this

* can't beat this

* glorot is best for ae

* remove comments
2022-09-25 16:46:33 -04:00
Jacky Lee 2c01a66265
Reshape dataset from fetch_mnist (#390) 2022-09-24 21:16:29 -04:00
George Hotz 894a7cee79 forgot a few 2022-09-12 09:21:46 -07:00
George Hotz 801ecd4a07 cleanup clip tokenizer 2022-09-12 09:20:12 -07:00
Fernand Pajot ff0da4c802
Added standalone CLIP tokenizer (#382)
* Added standalone CLIP tokenizer.

* Fixed empty phrase.

* Truncating long prompts.

* Keeping two slots for the start and end token.

* Fixed empty phrase.

* Using tokenizer for empty phrase.

* Typo.
2022-09-12 09:12:55 -07:00
David Redmon a1810c8617
update serious_mnist.py (#380) 2022-09-11 13:37:40 -07:00
George Hotz ecc1a0470d add Linear to tinygrad.nn 2022-09-07 07:40:48 -07:00
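Sketch of the new layer:

    from tinygrad.tensor import Tensor
    from tinygrad.nn import Linear

    layer = Linear(784, 128)         # in_features, out_features; bias is on by default
    y = layer(Tensor.randn(1, 784))  # shape (1, 128)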
George Hotz 896f9f74a9 hmm, need this with broadcast change 2022-09-06 16:54:01 -07:00
George Hotz a18a6a0773 fix sd with TORCH=1 2022-09-06 16:51:16 -07:00
George Hotz 0516359af8 fix stupid OPENCL=1 OOM 2022-09-06 14:29:23 -07:00
George Hotz f215534a64 1100 lines, but sane linter rules 2022-09-06 13:47:45 -07:00
George Hotz 682dc64430 works at work 2022-09-06 08:06:11 -07:00
George Hotz d6f499fd69 improve opencl, why is it OOMing 2022-09-05 20:14:31 -07:00
George Hotz 0ba6179de7 stable diffusion in readme 2022-09-05 18:51:56 -07:00
George Hotz c1d5af8b0c stable diffusion cleanups 2022-09-05 18:34:13 -07:00
George Hotz 3728ef6d02 better alphas 2022-09-05 16:48:26 -07:00
George Hotz 0fda854b3e other prompt example 2022-09-05 16:14:16 -07:00
George Hotz 16cb4290c4 cat horse winning 2022-09-05 16:05:14 -07:00
George Hotz 1043fa067a it renders something 2022-09-05 15:52:14 -07:00
George Hotz 5a685b93ac brown img 2022-09-05 15:20:18 -07:00
George Hotz 98d6264987 all models match 2022-09-05 12:27:54 -07:00
George Hotz b8bd34b5d2 fix last bug in unet probz 2022-09-05 11:32:44 -07:00
George Hotz 3df67aa0af fix transformer bugs 2022-09-05 11:26:32 -07:00
George Hotz 2ed3bb6223 clip model is running 2022-09-05 11:26:32 -07:00
George Hotz 1a54ea2417 runs on torch cpu 2022-09-04 12:06:42 -07:00
George Hotz 9590d92750 stable diffusion compiles (add no_init) 2022-09-04 11:40:50 -07:00
George Hotz 172683c314 work 2022-09-04 11:21:09 -07:00
George Hotz c2a030fe55 one liner that's more clear 2022-09-03 16:08:48 -07:00
George Hotz 4a3ed58edb more readable actually 2022-09-03 16:00:35 -07:00
George Hotz 633f31dc73 easier to read 2022-09-03 15:53:58 -07:00
George Hotz 6578e08919 cleanups for Mid 2022-09-03 15:50:33 -07:00
George Hotz 852de7c66c remove ugly parens 2022-09-03 15:41:37 -07:00
George Hotz 6b190c2fa5 stable diffusion works 2022-09-03 13:55:36 -07:00
George Hotz 947e10dab0 yolo 2022-09-03 12:39:48 -07:00
George Hotz 033a3ecccf found tinygrad bug 2022-09-03 12:32:43 -07:00
George Hotz 114728d363 torch bs 2022-09-03 11:57:23 -07:00
George Hotz 356732515b stable_diffusion: add attn and layernorm 2022-09-03 11:02:27 -07:00
George Hotz 4dadd95e3c fix tests hopefully, more stable diffusion 2022-09-03 10:38:31 -07:00
George Hotz c01a8c5c2d stable diffusion start 2022-09-03 10:08:42 -07:00
George Hotz b132de677d
tinygrad.nn (#367)
* tinygrad.nn

* flake8

* working on pylint

* more pylint

* more pylint

* pylint passes

* networkx

* mypy can't infer that type

* junk
2022-08-18 07:41:00 -07:00
George Hotz acbeaf0ba9 adam in benchmark_train_efficientnet 2022-07-19 09:33:07 -07:00
George Hotz d985217fa4 skip reduce noops 2022-07-16 07:47:43 -07:00
George Hotz 5e46561f7e no_grad = NOT backward 2022-07-10 20:54:57 -07:00
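That is, the flag just skips building what backward would need. A sketch of the class-level switch, assuming the Tensor.no_grad attribute of this era:

    from tinygrad.tensor import Tensor

    Tensor.no_grad = True   # inference mode: ops won't track gradients
    # ... forward passes here ...
    Tensor.no_grad = False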
George Hotz d5d9cffe7c training param for batchnorm 2022-07-04 13:28:03 -07:00
George Hotz 34f43ea10e LAZY and CLCACHE are defaults 2022-07-04 13:09:15 -07:00
George Hotz b7afd83267 track cl mem used 2022-07-04 12:19:00 -07:00
George Hotz d5de8452c6 dashed loadops 2022-07-04 09:50:56 -07:00
George Hotz 7276f8d6bf improve constant folding, detach before moving tensor 2022-07-02 15:29:40 -07:00
George Hotz 0cb99d72e9 NUM=-1 is a small efficientnet for small people 2022-07-02 15:11:51 -07:00
George Hotz 8cf1aed0f4 don't track_running_stats, parameters must require_grad 2022-07-02 14:38:45 -07:00
George Hotz f607f18006 fix backward 2022-06-25 00:00:53 -07:00
George Hotz ec30f0402f improve benchmark_train_efficientnet 2022-06-24 23:46:38 -07:00
George Hotz d748353ce5 err, okay, a bit more off 2022-06-24 22:44:57 -07:00