tinygrab

deepcrayon

Author	SHA1	Message	Date
Jacob Pradels	b112edd2c3	Add pylint trailing whitespace rule (#1314 )	2023-07-21 13:37:55 -04:00
George Hotz	bfbb8d3d0f	fix ones, BS=2 stable diffusion, caching optimizer (#1312 ) * fix ones, BS=2 stable diffusion * caching optimizer * print search time * minor bug fix	2023-07-21 09:55:49 -07:00
madt2709	d2c1e8409a	Update arange to be (start, stop, step) (#1308 )	2023-07-21 00:27:23 -04:00
George Hotz	f45013f0a3	stable diffusion: remove realizes we don't need	2023-07-20 19:53:07 -07:00
George Hotz	b58dd015e3	stable diffusion: remove import numpy as np	2023-07-20 19:35:44 -07:00
George Hotz	35bc46289c	stable diffusion: use new tinygrad primitives	2023-07-20 19:25:49 -07:00
Stan	0a3d4f8103	Implementation of VITS TTS model (#1188 ) * [WIP]: implementation of VITS TTS model * Implemented VITS model, moved all code to examples/vits.py * Added support for vctk model, auto download, and cleanups * Invoke tensor.realize() before measuring inference time * Added support for mmts-tts model, extracted TextMapper class, cleanups * Removed IPY dep, added argument parser, cleanups * Tiny fixes to wav writing * Simplified the code in a few places, set diff log level for some prints * Some refactoring, added support for uma_trilingual model (anime girls) * Fixed bug where embeddings are loaded with same backing tensor, oops * Added emotional embed support, added cjks + voistock models - voistock is multilingual model with over 2k anime characters - cjks is multilingual model with 24 speakers both are kinda bad for english though :c * Removed `Tensor.Training=False` (not needed and wrong oop) * Changed default model and speaker to vctk with speaker 6 * Ported rational_quadratic_spline fun to fully use tinygrad ops, no numpy * Removed accidentally pushed test/spline.py * Some slight refactors * Replaced masked_fill with tensor.where * Added y_length estimating, plus installation instructions, plus some cleanups * Fix overestimation log message. * Changed default value of `--estimate_max_y_length` to False This is only useful for larger inputs. * Removed printing of the phonemes * Changed default value of `--text_to_synthesize`	2023-07-20 17:37:14 -07:00
George Hotz	f7b0320d8b	add cifar training regression test (#1287 ) * add cifar training regression test * clean up print	2023-07-19 14:17:09 -07:00
Francis Lam	3db57d3118	Fix llama.py to load and concatenate 13B, 30B, and 65B models (#1275 )	2023-07-19 13:22:33 -04:00
Yixiang Gao	a8f2c16f8e	add contiguous (#1246 )	2023-07-15 08:36:34 -07:00
Diogo	a9a1df785f	Webgpu support (#1077 ) * initial commit * 81 passing * 105 passing tests * 148 passing * CI tests * install dep on ci * try opencl pkgs * try using vulkan * down to only 6 failing * refactor * cleaning up * another test skipped due to buffer limit * linter * segfault * indent fix * another segfault found * small touchups * Fix max and maxpool tests * Add constant folding * Add javascript export script * better asserts in codegen * manual upcasting * reverted token type change * skip safetensor test due to unsupported type * FIx efficientnet and all other model tests * Remove np copy * fixed indent and missing import * manually destroy the buffer * revert back to length * linter errors * removed extra val * skip broken tests * skipping more tests * Make the page pretty * Save model weights as safetensor * Fix imagenet to c test * Fix second imagenet to c bug * Async and paralel kernel compilation * workgroup support * reversed local size * fixed non local bug * correct local groups * ci experiment * removed typo * Fix define local by using shared memory * Refactor * try running on mac * match metal tests * add more workers * scope down tests * trying windows runner * fixed windows env * see how many it can do * merged master * refactor * missed refactor * increase test suite coverage * missing import * whitespace in test_efficientnet.py * getting there * fixed reset * fixed bufs * switched to cstyle * cleanup * min/max rename * one more linter issue * fixed demo * linter * testing ci chrome * add unsafe webgpu arg * add build step * remove WEBGPU from cmd line * use module * try forcing directx * trying forced metal backend * temp disable conv2d for CI * disable conv_trasnpose2d --------- Co-authored-by: 0x4d - Martin Loretz <20306567+martinloretzzz@users.noreply.github.com> Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2023-07-12 12:52:06 -07:00
Hey	4f72eb823c	Outdated repository URL (#1218 ) * Update outdated repo url * Update more outdated repo url's	2023-07-11 23:14:19 -07:00
AN Long	f75de602df	fix typo in stable diffusion example (#1219 )	2023-07-11 15:26:40 -07:00
Stan	f40f8cd055	Initialise numpy arrays as float32 in DDPG (#1171 ) float64 is not supported by tinygrad	2023-07-07 12:05:31 -07:00
terafo	aa60feda48	Fix naming conflict with huggingface datasets (#1161 ) * Rename in files * Move files * Moved to extra/datasets as suggested * Changes to files * Fixed stupid mistake --------- Co-authored-by: terafo <terafo@protonmail.com>	2023-07-07 10:43:44 -07:00
Stan	9b6e57eccd	helpers.py: improved test coverage + exception handling (#1165 ) * Fixes + improved test coverage for helpers.py - added exception handling in `proc`, if an exception was thrown, the thread would hang - made `_early_exec_process` catch any Exception, before if an exception was thrown before the process was started, it would hand the thread * Made `_early_exec_process` catch any Exception Otherwise, if an exception was thrown before the process was started, it would hang the thread. For example a type error for an argument passed to `subprocess.check_output` * Fixed `from tinygrad.helpers import Timing` import oops, for some reason my IDE cleaned that import from extra/helpers. * Fixed import in llama.py Another one that I skipped by accident, mybad * Extracted a class for tests of early exec * Normalize line endings, windows uses /r/n * Made `cross_process` not a daemon	2023-07-07 10:26:05 -07:00
Kunwar Raj Singh	8391648822	Over 90% on CIFAR with examples/hlb_cifar10.py (#1073 ) * fix eval, lr decay, best eval * 82.27 * 82.64 * 82.79, reproducable * add lr sched, 85.26 * 87.42 * 87.94 * 87.42 * tta with flip * training flip aug * refactor * using Tensor for LR is faster * 89.5 * refactor, flip only train set * 90.01 * 90.64 * eval jit * refactor * only JIT model * fix eval JIT * fix eval JIT * 90.82 * STEPS=900 reaches 90.22 * TTA envvar * TTA default 0 * fully jit training * refactor optim * fix sched * add label smoothing * param changes * patial gelu * OneCycle with pause * gelu maybe works * 90.12 * remove pause lr * maybe fix lr schedulers * scheduler test passing * comments * try mixup * shuffle! * add back the missing last eval * fix shuffle bugs * add mixup prob * fix mixup prob * 90.19 * correct mixup * correct mixup * correct mixup * 90.24 * 90.33 * refactor, add type hints * add gradient clipping * maybe fix test * full JIT * back to relu for now * pass mixup prob as param * add typehints * maybe CI works * try erf gelu * CI, types * remove useless import/ * refactor optim * refactor optim * try leakyrelu * try celu * gelu * 90.67 * remove grad clip * remove grad clip tests * revert params * add test for OneCycleLR * 90.62 * fix eval timing * fix eval timing again * so where i calculate mixup_prob matters --------- Co-authored-by: Kunwar Raj Singh <kunwar31@pop-os.localdomain>	2023-07-06 20:46:22 -07:00
nimlgen	d363d25ee2	fix imports for examples/transformer.py (#1136 )	2023-07-05 08:15:13 -07:00
George Hotz	87d21ea979	examples: simple conv bn	2023-07-04 13:50:26 -07:00
Reza Rezvan	8ae9a054ae	Refactor nn.optim (#1091 ) * Refactor: nn.optim.py * Refactor: nn.optim.py; Fix all tests * Refactor: Replace all optim.get_parameters() * Refactor: Revert list comp. * Refactor: Replace optim.get_state_dict * Refactor: Change quickstart.md	2023-07-02 15:07:30 -07:00
Eli Frigo	10f1aeb144	fixed broken link (#1097 )	2023-07-02 15:06:59 -07:00
nmarwell26	12ce68c1ee	Renamed examples/yolo to examples/vgg7_helpers because that directory contains no yolo-related code and only helper code for vgg7. This was confusing to a new user when trying to understand the examples. (#1086 )	2023-07-01 12:04:28 -07:00
George Hotz	6ec0a24706	imagenet eval in 1 min 28 sec	2023-06-28 04:23:26 +00:00
Rayan Hatout	65cbaa3429	no need to slice A and B twice in LLaMa complex multiplication (#1054 )	2023-06-26 14:42:58 -07:00
Kunwar Raj Singh	5d3310ce56	MaskRCNN Inference (#884 ) * MaskRCNN weights loading * backbone maybe works * backbone works, but resnet body atol 1e-3 * RPN Call, but veryy wrong output * fixed topk * RPN maybe works, not sure about nms * Fix cursed modules * add back editorconfig * Full call, wrong output * Full call works * fix mask * use NMS from retinanet * Removing extra funcs * refactor * readable * Add example to run model * remove filter * Fix split, batched inference is worse * Fix image sizes * Matching reference * merge master * add filter on top detections * cuda backend fixed * add model eval and spec * convert images to rgb * fix eval * simplify examples code * remove extra code * meshgrid using tinygrad * removing numpy * roi align, floor, ceil * remove numpy from level_mapper * remove numpy from pooler * Revert "Merge branch 'master' of github.com:kunwar31/tinygrad into mrcnn-inference" This reverts commit `4b95a3cb49`, reversing changes made to `98f2b1fa2e`. * roi align gather * fix master merge * revert to old floor, ceil as ints present in domain * use log2 op * fix indexes * weird bug with ints and gpu * weird bug with ints and gpu * refactors, add env var for gather * floor with contiguous, where * refactor topk, sort * remove staticmethod * refactor stride * remove log2 mlop * realize -> contiguous * refactor forward * remove num_classes, stride_in_1x1 from state * refactor forward * refactoring * flake8 * removing numpy in anchor gen, use numpy for gather, nonzero, optimize topk * keep using tinygrad for smaller gathers * fix empty tensors * comms * move from tensor.py * resnet test passing * add coco dataset back * fix spaces * add test for log2 * no need to create Tensors * no need to create Tensors --------- Co-authored-by: Kunwar Raj Singh <kunwar31@pop-os.localdomain>	2023-06-25 15:37:51 -07:00
Yair Lifshitz	7f73d6a4da	Fix input path in examples/compile_efficientnet.py, examples/efficientnet.py. (#1034 )	2023-06-23 16:34:33 -07:00
Yann Huynh	ccb51ff5b0	"Fixed argument passing in example yolov8" (#1004 ) "Fixed argument passing in example yolov8"	2023-06-18 14:29:39 -07:00
sehaj	775287ed91	Add yolov8 implementation (#806 ) * added SPPF module from yolov8 * added conv_block, bottleneck modules * cleaned modules * c2f example * spf changes * C2f * fixed and tested bottleneck * improved detect class * tested spf and conv * checked c2f * DFL structure * fixed dfl * added dist2bbox function * added dist2bbox function * added and tested make_anchors function for the head * keeping functions above * creating the detection head * fixing head * untested blocks a. scale_boxes b. clip_boxes c. xywh2xyxy d. box_iou * head works * structure fixx * added darknet (backbone) * yolov8 neck, and intialize bias function while detection * fixed spacing * yolov8 class, init bias, and fixed c2f * forward pass almost working * fixed net structure * init bias not needed, forward pass working * load weights boilerplate * load weights done? * all variants loading! * post process: clip_boxes, scale_boxes, xywh2xyxy, and box_iou(untested) * fix scale_boxes * box_iou fixed and tested * created the pre nms function * fix nms * fixed load weights, apparently the latest commit broke something, excluding num_batches_tracked * added letterbox and pre_tranform for pre_process function * fixed letterbox, pre_transform and added preprocess function * custom NMS done, integrated prepare_boxes and nms, improved box_iou * added postprocess function till parsing * added draw_bounding_boxes_and_save function * testing full flow * using fetch for class names * fixed make_anchors + all tinygrad now * added command line arguments, weight downloading * single image for now only * made draw boxes more efficient * made NMS functions efficient * made compute_transform better * v8 working now, inference is done * prints objects detected in console now * fixed image loading (pre processing) * batch post processing * created initial tests * fixes bounding box thickness AND added get_detected_classes_with_frequency function * cleaning for testing * two tests * added url option for image, removed need for specifiying arguments * tests complete, but lots on things are printed on screen by ultralytics * remove parse arguments * fixed weight location * fixed colours of classes, and black font when high brightness * minor changes * TODOs for later * removed use of torch, using .npz weights * fixed tests * one path for fetch * preprocess now in tinygrad, plus test fix for that * updated tests * fix tests * no class labels needed * Add files via upload * Update showcase.md * Update showcase.md * added safe tensors as weights, and tests fix for that * safe tensors test * using safe_load * using tinygrad functions now to load weights * update tests --------- Co-authored-by: r3sist-uniq <amanmatreja@gmail.com> Co-authored-by: r3sist <72573738+r3sist-uniq@users.noreply.github.com>	2023-06-16 18:55:19 -07:00
Diogo	2d4370b487	Adds tril & triu support (#936 ) * triu & tril support * lint and kernel count error * switched shape indicies * larger shape tests * reverted numpy removal until #942 is resolved	2023-06-09 22:13:20 -07:00
cloud11665	e8a23d4331	there is a better way to do that! (#950 )	2023-06-06 15:23:30 -07:00
Diogo	3bb38c3518	limit split to 1 due to windows path containing : (#944 )	2023-06-06 10:27:54 -07:00
George Hotz	b78addf2f8	Whisper (#919 ) * no whispering yet * whispering * live whisper * small support	2023-06-03 18:55:14 -07:00
George Hotz	ed1963b899	Fast DiskTensor to other Tensor (#916 ) * make disktensors fast * loading * loader for sd and llama	2023-06-03 12:25:41 -07:00
George Hotz	d58586bb17	safetensors! (#903 ) * safetensors test * safe_save * load back with real safetensors * bugfix in device name. add simple torch_load * it works for llama, but it's slower... * mmap * no intermediate * load mmaped * readinto speed * not ready yet * revert that	2023-06-02 13:41:09 -07:00
George Hotz	8a928ed2f3	nn init matches torch (#901 )	2023-06-01 21:24:11 -07:00
Peter Ross	27845fd3a3	train_efficientnet: only import datasets.imagenet when IMAGENET is set (#899 ) make it work out of the box for new users. the default configuration of train_efficientnet is to use the smaller cifar dataset. import datasets.imagenet tries to open imagenet_class_index.json and will fail, unless user has already downloaded it.	2023-06-01 19:19:52 -07:00
George Hotz	1b42b4e1b8	fix examples/hlb_cifar10.py	2023-06-01 19:03:17 -07:00
wozeparrot	0fc4cf72a2	feat: add train scaffolding (#859 )	2023-05-30 07:10:40 -07:00
Jacky Lee	5d212864b5	Add MLPerf UNet3D model (#775 ) * Add ResNet inference test and cannon * Test with ResNet50 * test_car works with resnet fix * Add KiTS19 dataset * KiTS19: Implement iterate * No batch load for this dataset * Save results on iterate * Implement dice score * Add data prep and eval functions * Resolve shape issue * Conversion works but wrong values * Segfaults when load_from_pretrained is called * Fix segfault and assign properly * Final result generated, though very slow * Store and load final result to save time * Fix typo in finalize * Score computes * More bug fixes, dice score is very low * Working broken code * Assign output values to result * Getting a much higher score now * Fix dataset preprocessing * Mean DICE score of 88.5 * Ugh, typo * Attempt to reimplement model * Rename layers * Tiny model works, kinda * Accuracy? gone * Implement InstanceNorm and match torch * Test instance norm 2d and 3d * Combined input block with downsample block * Tiny model works, support strided convtranspose * Commands to download dataset * Clean up a bit * unet3d_v2 -> unet3d * Remove duplicated code * Oops, put tests back	2023-05-28 20:38:19 -07:00
Sohaib	65d09031f2	add retinanet with resnet backbone (#813 ) * add retinanet with resnet backbone * adds resnext to support loading retinanet pretrained on openimages * object detection post processing with numpy * data is downloaded and converted to coco format with fiftyone * data loading and mAP evaluation with pycocotools * remove fiftyone dep * * eval freq * fix model timing * del jit for last batch * faster accumulate	2023-05-28 20:20:16 -07:00
wozeparrot	67de3aa1de	Add mlperf bert model (#803 ) * feat: add mlperf bert model * feat: switch to nn.Embedding * clean+fix: fix formatting * feat: add simple downloader * feat: metrics * feat: don't actually need exact match * feat: doing a run * feat: set eps on the layernorms * clean+fix: cleaner impl + hopefully fixed * feat: move dataset initialization into iterate * feat: move tokenizer out of iterate * clean+fix: cleaner + working * clean: cleanup * fix: fix metrics * feat: need to use original bert gelu + download vocab * feat: make directory if it doesn't exist yet * feat: jit go brrr	2023-05-27 14:53:32 -07:00
wozeparrot	0dc333cfab	Promote Embedding to `nn` (#798 ) * feat: promote Embedding to nn * fix: fix failing test * feat: add test with jit * feat: rewrite embedding to no longer need stacked for loops * clean+fix: don't know how that happened	2023-05-25 18:39:45 -07:00
George Hotz	a968c4c3a4	Cleanup mlperf (#797 ) * improve factorization * cleanups	2023-05-25 11:36:43 -07:00
wozeparrot	01ae45a43c	Add mlperf RNN-T model (#782 ) * feat: initial rnn-t * feat: working with BS>1 * feat: add lstm test * feat: test passing hidden * clean: cleanup * feat: specify start * feat: way faster lstm & model * fix: default batch size * feat: optimization * fix: fix metrics * fix: fix feature splicing * feat: cleaner stacktime * clean: remove unused import * clean: remove extra prints * fix: fix tests and happy llvm * feat: have the librispeech dataset in its own dir * clean: unused variable * feat: no longer need numpy for the embedding + slightly more memory efficient lstm * fix: forgot to remove something that broke tests * feat: use relative paths * feat: even faster * feat: remove pointless transposes in StackTime * fix: correct forward * feat: switch to soundfile for loading and fix some leaks * feat: add comment about initial dataset setup * feat: jit more things * feat: default batch size back to 1 larger than 1 is broken again :( and even in the reference implementation it gives worse results	2023-05-25 00:41:21 -07:00
George Hotz	e0b2035023	fast imagenet eval, gets 76.14% across the set	2023-05-13 21:18:31 -07:00
George Hotz	b705510d5c	getting 77% on imagenet eval	2023-05-13 07:46:27 -07:00
George Hotz	810f03dafa	conv3d + unet3d (#772 ) * conv3d, needs test * test passes, padding wrong on unet * unet3d * no conv3d on images	2023-05-12 13:54:07 -07:00
George Hotz	46d419060b	start on mlperf models	2023-05-10 16:30:49 -07:00
Jacky Lee	d13629cb26	ResNet: match implementation with Nvidia and PyTorch (#770 ) * Match ResNet implementation with pytorch and nvidia * Reduce number of Epochs	2023-05-10 09:01:22 -07:00
George Hotz	e4db0c820f	hlb_cifar10 init from torch weights	2023-04-18 19:09:13 -07:00
George Hotz	732884653c	osx in hlb_cifar10_torch	2023-04-14 13:12:08 -07:00
George Hotz	584ee6f616	don't graph consts	2023-04-14 03:32:20 -07:00
George Hotz	9a39ebefde	hlb_cifar10_torch gets 80%	2023-04-14 02:47:03 -07:00
Jacky Lee	06ed958abd	Fix train_resnet example (#744 ) * Fix ResNet example * Scientific notation	2023-04-12 13:48:39 +05:30
Jacky Lee	7a45b989a1	Device: make GPU default and METAL/CUDA if possible (#732 ) * Make GPU the default device * Compile EfficientNet with CPU * don't print device * use METAL and CUDA if possible * Revert some changes to workflow * Fix import error when checking device availability * device lookup is now optional * hopefully fix linter and tests * fix workflow * Skip device if not available * don't change default if CPU=1 * simplify device selection * Default to CPU if no GPU * don't print device name... * No need to change default in llama * Make GPU the default device * Compile EfficientNet with CPU * don't print device * use METAL and CUDA if possible * Revert some changes to workflow * Fix import error when checking device availability * device lookup is now optional * hopefully fix linter and tests * fix workflow * Skip device if not available * don't change default if CPU=1 * simplify device selection * Default to CPU if no GPU * don't print device name... * No need to change default in llama * run github workflow * Fix logic to select default * pass if an error occurs * use separate function for try except	2023-04-04 09:41:52 +05:30
Jacky Lee	156640e90d	Permute examples (#731 ) * examples: use permute instead of transpose * Use transpose but change args	2023-03-29 05:07:06 +04:00
George Hotz	b12b60af20	fix binop, other tests failure (#723 ) * fix binop, other tests failure * that was a bad idea * better layernorm * inference kernel count tests * new style reshape pushing * fixup replacement * 199 kernels is okay. fix flops * push reshape through unaryops only * GRAPH=2 draws the phantom ops * found resnet issue * non working test * mul is cheaper than div * OPT inflation * SHUFFLE_PAD_OPS in OPT=2	2023-03-22 18:15:07 -07:00
Fernando Vidal	73bd0b217b	add int64 as supported dtype from numpy (#699 ) * add int64 as supported dtype from numpy Without this, examples/transformer.py didn't run. With this change it runs successfully. * Update helpers.py * Update transformer.py * Update training.py	2023-03-18 17:15:04 -07:00
George Hotz	f5467cfedc	Devicebufferless (#708 ) * runs one metal kernel * conv2d works * ops tests are passing * const folding * all ops work * pre commit always passes * torch works * working still * fix graph test * tests passing * image almost works * image conv works * most images * fix custom * fix assignment * fix compile enet * clean up comments * fix realize return value * include shapetracker in LB repr * copy should make a copy * reenable method cache * fix lna * dtypes in graph * forward only for IMAGE=2 * simple realize * getting close * fixup new api, it's good except the kernel count * back to 197 kernels * tests should pass * go to a real float * no type_on_cpu * fix the docs * put shapetracker back in it's proper place	2023-03-18 14:40:23 -07:00
Kirill	26a3888ab8	Fix llama 13B RAM usage (#710 )	2023-03-18 13:50:09 -07:00
Kirill	0fe5014b1f	Use pathlib (#711 ) * Use pathlib in llama * Use pathlib in stablediffusion	2023-03-18 13:49:21 -07:00
Kirill	0532025b04	Fix llama 13B weights loading (#700 ) * Fix llama 13B weights loading * refactor more * add test * test storage offset * fix spacing * fix strides * llama 13B working? * yolo? * better test for seeks	2023-03-15 08:59:52 -07:00
Ayushman Kumar	e28bd11ff1	Cast Tensor data to float32 (#703 ) * Cast Tensor data to float32 * astype('float32') --> Tensor.randn()	2023-03-14 23:09:41 -07:00
Jacky Lee	5e820818e9	Cast image to float32 (#702 )	2023-03-14 08:13:19 -07:00
George Hotz	fe0e8a306f	jittable llama	2023-03-12 14:15:04 -07:00
George Hotz	15e0b56e39	compile works (#688 ) * compile works * runtimes * line count * fix custom, to tg dtype * meh, that's fine with lazy import	2023-03-12 11:01:25 -07:00
Kirill	af7745073f	Add comments to SD (#686 ) * Add explanation for empty lambdas * Fix my_unpickle if pytorch_lightning is installed * oops	2023-03-12 10:56:49 -07:00
George Hotz	046b3952c3	get_state_dict	2023-03-11 23:46:53 -08:00
George Hotz	803b0aef28	track memory for numpy/torch	2023-03-11 20:39:10 -08:00
George Hotz	61071f881a	fix bug, and add unit test to catch failure	2023-03-11 16:57:25 -08:00
George Hotz	3ec457248c	failing llama test	2023-03-11 16:28:10 -08:00
George Hotz	8aa63847c7	llama: up max tokens to 1000	2023-03-11 13:39:33 -08:00
George Hotz	5ea44cefcc	llama: add lexie personality	2023-03-11 10:23:33 -08:00
George Hotz	c908f911a7	llama defaults to metal on osx	2023-03-11 09:30:13 -08:00
George Hotz	5e1380df6a	profiling llama + cache is_contiguous	2023-03-11 08:23:21 -08:00
George Hotz	f3ac52aee8	Mypyc (#680 ) * building shapetracker * default ENABLE_METHOD_CACHE * symbolic compiles * improve types * tensor compiles * oops, that's a bug * best of both worlds * find legit typing bugs * pad2d can take list or tuple * sub 200ms when compiled	2023-03-11 07:33:30 -08:00
George Hotz	b1206bcb18	third try at torch loading (#677 ) * third try at torch loading * numpy fixed * fix enet compile * load_single_weight supports empty weights * oops, CPU wasn't the default * so many bugs	2023-03-10 19:11:29 -08:00
George Hotz	8bf75a7fdd	fix stable diffusion and CI	2023-03-10 17:48:12 -08:00
George Hotz	4780f9a6df	llama runs (slowly) in master	2023-03-10 17:36:51 -08:00
jspieler	da7fb4b227	Fixed DDPG example (#667 )	2023-03-09 11:49:52 -08:00
George Hotz	c22afc52db	move the custom function example to a test	2023-03-08 10:05:04 -08:00
George Hotz	7d3b9d0e95	oops, things relied on that API. the global cache needs access to the ASTRunner class	2023-03-08 08:39:31 -08:00
George Hotz	4f957423c3	jitting custom ops + OPTLOCAL assignment bugfix	2023-03-08 08:30:37 -08:00
George Hotz	7285de41a1	tinygrad supports CUSTOM functions	2023-03-08 07:50:33 -08:00
Pankaj Doharey	9d97d97b26	Opens image in default viewer after saving. (#612 )	2023-03-03 17:28:49 -08:00
George Hotz	2e26286294	speed like you wouldn't believe (#626 ) * speed like you wouldn't believe * fix tests	2023-03-02 07:49:19 -08:00
George Hotz	bfcec234a2	Refactor ASTs (#622 ) * ugh worst branch name * compiler refactor continues * scc -> cloc * buf -> _buf * finish _buf, and program -> runtime * gpu is still working, clang isn't * clang in new style * ops_metal * something broke it * improve metal * clean up tons of cl crap * hack fix sync * cleaner gpu * gpu metal clang * cleanups * minor refactor * GPUCodegen * fix up LLVM * blind CUDA refactor * codegen / runtime * keep ops naming * linter passes * woah, llvm was allocing 4x what it needed to * bugfixes * fix openpilot compiler * fix compile_efficientnet * method cache should fix tests * deal with duped functions	2023-03-01 18:57:29 -08:00
George Hotz	c4856aa193	fix yolo webcam	2023-02-26 17:24:05 -08:00
Jacky Lee	0f58c4c648	Cleanup yolo and remove stateless classes (#604 ) * Add AvgPool2d as a layer * Clean up a bit * Remove stateless layers in yolo_nn * More cleanup * Save label for test * Add test for YOLO * Test without cv2 * Don't fail if cv2 not installed * Better import * Fix image read * Use opencv :) * Don't download the file * Fix errors * Use same version * Set higher confidence * Why is the confidence so low? * Start over * Remove stateless layers * Remove extra lines * Revert changes * Save a few more lines	2023-02-26 16:55:21 -08:00
voidz	94bec40110	moved extras/jit.py -> tinygrad/jit.py (#599 ) * moved extras/jit.py to tinygrad/jit.py * fixed indent * removed tinygrad.helpers.DEBUG from jit.py	2023-02-25 08:32:33 -08:00
Benedikt Mandelkow	7348e9a6c6	add restrict qualifier to inputs in c backend (#593 ) * add restrict qualifier for clang backend convolution inputs/ outputs see https://godbolt.org/z/Tb9jMxWfx for generated assembly * enable more checks * inline fmax to motivate the compiler to inline some more * fix if else binding power	2023-02-25 08:32:21 -08:00
George Hotz	2e56a4793e	rename log_softmax, support dim, fix onnx Softmax	2023-02-24 10:11:24 -08:00
George Hotz	94ccab941e	compile_tensorflow: no cast required	2023-02-22 21:14:21 -08:00
George Hotz	135d0ddb78	compile_tensorflow: read weights from disk	2023-02-22 21:12:35 -08:00
George Hotz	0615dcffe7	compile_tensorflow: save the weights	2023-02-22 21:05:45 -08:00
George Hotz	c537fd0614	compile_tensorflow: add initialize and tests	2023-02-22 20:50:53 -08:00
George Hotz	dc914cde50	compile_tensorflow	2023-02-22 20:08:58 -08:00
George Hotz	76b4d0577d	yolov8 works up to the MaxPool	2023-02-22 19:32:13 -08:00
Mischa Untaga	14bb2c40a2	Fix yolov3 example (#577 )	2023-02-21 09:24:00 -08:00
George Hotz	d9fa47ecc9	use the TinyJit in the efficientnet runner, 200ms -> 20ms	2023-02-20 19:58:16 -08:00
George Hotz	714bf4b108	clang backend (#572 ) * start clang backend * mostly working * no group for reduce w clang * it compiles * compiles * a11y * minor fixups * formatting * add a test * rename test	2023-02-20 18:18:18 -08:00
Jacky Lee	cb679cd051	Fix weight initialization (#566 ) * Fix weight initialization * Use scaled_uniform in serious_mnist	2023-02-19 11:25:29 -08:00
Kirill	7944cfdadc	Remove Tensor.data (#565 )	2023-02-18 16:36:12 -08:00
Jacky Lee	7e8b0305f3	Fix mnist gan example (#563 )	2023-02-18 13:45:37 -08:00
Jacky Lee	9fd41632c6	Import get_parameters from tinygrad.nn (#559 ) * get_parameter is in optim * Update all imports for get_parameters * Clean up * use optim.get_paramters	2023-02-17 15:22:26 -08:00
Jacky Lee	e172f0087a	BatchNorm2D -> BatchNorm2d (#558 ) * BatchNorm2D -> BatchNorm2d * Fix typo	2023-02-16 12:31:49 -08:00
Jacky Lee	c35fcc6964	Replace phrase for prompt (#555 )	2023-02-12 09:04:44 -08:00
George Hotz	191c76cfd7	hlb_cifar10 torch version	2023-02-11 18:04:40 -08:00
George Hotz	9057d98d36	no lr decay in cifar. test this in torch tomorrow	2023-02-11 17:42:54 -08:00
George Hotz	dd7accb9cc	decay LR, little bugfix	2023-02-11 17:34:15 -08:00
George Hotz	ba3bf5bdf7	cifar stops learning	2023-02-11 17:21:42 -08:00
George Hotz	7d33f2d659	CL.CACHE is over, GlobalCounters.cache is it	2023-02-11 12:00:14 -08:00
George Hotz	9152bb5b4a	momentum support in SGD	2023-02-11 10:22:37 -08:00
George Hotz	031edd01e6	switch openpilot compile to TinyJit	2023-02-11 09:51:44 -08:00
jspieler	8f912c3966	added deep deterministic policy gradient example (#531 )	2023-02-11 10:10:46 -06:00
George Hotz	608fd730d3	put the JIT in extra	2023-02-11 00:35:18 -06:00
George Hotz	ed8ae7522a	tinyjit	2023-02-11 00:22:36 -06:00
George Hotz	4c90a15689	make the fake data actually learnable	2023-02-10 23:35:21 -06:00
George Hotz	07629d7476	fakedata and move to new cache	2023-02-10 23:32:31 -06:00
George Hotz	63fa7daf30	wrong place for CL	2023-02-10 23:22:24 -06:00
George Hotz	fed95119dc	CL.mem_used -> GlobalCounters.mem_used	2023-02-10 23:13:29 -06:00
Kirill	27154db99a	Downloads weights in examples/stable_diffusion.py (#537 ) * Downloads weights in examples/stable_diffusion.py * use download_file_if_not_exists in fetch * make consistent with previous NOCACHE behavior	2023-02-10 14:37:04 -06:00
Jacky Lee	f08187526f	Fix examples (#540 ) * Fix examples * Remove training in parameters * Simplify a bit * Remove extra import * Fix linter errors * factor out Device * NumPy-like semantics for Tensor.__getitem__ (#506) * Rewrote Tensor.__getitem__ to fix negative indices and add support for np.newaxis/None * Fixed pad2d * mypy doesn't know about mlops methods * normal python behavior for out-of-bounds slicing * type: ignore * inlined idxfix * added comment for __getitem__ * Better comments, better tests, and fixed bug in np.newaxis * update cpu and torch to hold buffers (#542) * update cpu and torch to hold buffers * save lines, and probably faster * Mypy fun (#541) * mypy fun * things are just faster * running fast * mypy is fast * compile.sh * no gpu hack * refactor ops_cpu and ops_torch to not subclass * make weak buffer work * tensor works * fix test failing * cpu/torch cleanups * no or operator on dict in python 3.8 * that was junk * fix warnings * comment and touchup * dyn add of math ops * refactor ops_cpu and ops_torch to not share code * nn/optim.py compiles now * Reorder imports * call mkdir only if directory doesn't exist --------- Co-authored-by: George Hotz <geohot@gmail.com> Co-authored-by: Mitchell Goff <mitchellgoffpc@gmail.com> Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2023-02-10 12:09:37 -06:00
George Hotz	a5a55ac19e	GlobalCounters cache + assign in optim	2023-02-08 17:10:55 -06:00
George Hotz	3d63934995	refactor to keep cl in the runtime (#545 ) * refactor to keep cl in the runtime * fix thneed, rename cl to _cl * bugfix + _cuda * fix tests * thneed more correct	2023-02-08 16:46:09 -06:00
George Hotz	2844482a60	Mypy fun (#541 ) * mypy fun * things are just faster * running fast * mypy is fast * compile.sh * no gpu hack * refactor ops_cpu and ops_torch to not subclass * make weak buffer work * tensor works * fix test failing * cpu/torch cleanups * no or operator on dict in python 3.8 * that was junk * fix warnings * comment and touchup	2023-02-08 09:56:51 -06:00
George Hotz	f7291f6ca3	fixes big KOPT, breaks opencl (#505 ) * fixes big KOPT, breaks opencl * fix optimizer * KernelCache * oops, broke batchnorm * hack to fix it * fix llvm, less hacky gpu * disable the cache * cache just breaks things	2023-02-05 10:46:17 -08:00
James Roberts	db0a9b0a2d	Refactor CL.time_sum into GlobalCounters (#519 )	2023-02-01 20:13:56 -08:00
George Hotz	5e37f084db	stable diffusion: clean up constant folding	2023-02-01 12:53:16 -08:00
Jacky Lee	486f023e81	Rename Normalize and move to nn (#513 ) * Rename Normalize and move to nn * Match PyTorch for dim>1	2023-02-01 11:55:03 -08:00
Jacky Lee	799b3f185a	Refactor getenv into helpers (#508 ) * Refactor getenv into helpers * Remove unused os * Fix default value * Fix more defaults for CI * Fix bracket * Revert changes to openpilot/compile.py * Use getenv from helpers when possible	2023-01-31 15:09:09 -08:00
George Hotz	21f2af08d5	getenv + graphing	2023-01-30 19:15:03 -08:00
George Hotz	60ccddb58b	reenable SWAP	2023-01-30 17:32:02 -08:00
George Hotz	aea55eb196	found failing upcast	2023-01-30 16:12:56 -08:00
George Hotz	7ee0d99c70	CLCACHE	2023-01-30 14:02:06 -08:00
George Hotz	cccfea4b25	factor out KOPT code	2023-01-30 13:13:55 -08:00
George Hotz	de2c419fd4	make_pair and first attempt at hlb_cifar10	2023-01-30 11:07:23 -08:00
AllentDan	7b6b1f32b1	[Fix] fix typo: test_mnist -> datasets (#492 ) * test_mnist -> datasets * fix mnist_gan	2023-01-29 21:30:47 -08:00
George Hotz	2db272c7f7	Kernel Optimizer (#489 ) * kernel optimizer * 10x faster, but wrong. not good deal * move test -> extra * print x speedup * clcache * fix clcache + DEBUG * GFLOPS estimate * i==3	2023-01-29 17:15:00 -08:00
George Hotz	66da3bc3c0	reset the benchmark timer	2023-01-25 09:20:34 -08:00
George Hotz	487685919b	Revert "Rename Normalize and move to nn (#415 )" (#474 ) This reverts commit `d768acb6a9`.	2023-01-25 07:50:04 -08:00
Jacky Lee	d768acb6a9	Rename Normalize and move to nn (#415 ) * Rename Normalize and move to nn * Fix comparison to None error * Add test for GroupNorm * Rename test case * Flip parameters to match PyTorch * Increase error tolerance * Fix elementwise_affine on channels * Match arguments with PyTorch * Initialize weight and bias only when affine is true * Is this it? * A bit cleaner * Handle case where weight or bias is None	2023-01-25 07:47:59 -08:00
George Hotz	6d7658db12	delete opencl <celebration>	2023-01-24 14:18:35 -08:00
nogira	2e744ef2f2	confirmed (#449 ) w/ a bunch of print statements in the official model here: `ce05de2819/ldm/modules/diffusionmodules/openaimodel.py (L413)`	2023-01-07 08:41:06 -08:00
Drew Hintz	165fb4d631	remove redundant list comprehension from inside all. (#397 ) remove explicit inherit from object.	2022-10-13 09:58:35 -07:00
George Hotz	178ba50c03	some args for stable diffusion	2022-09-29 01:52:04 -04:00
George Hotz	a0d169eb59	fix efficientnet	2022-09-28 14:23:01 -07:00
George Hotz	60df954377	Fix weight init: this work? (#391 ) * this work? * glorot uniform * requies_grad broke * propagate the None correctly * so this weight init works * ahh, i think it's this * can't beat this * glorot is best for ae * remove comments	2022-09-25 16:46:33 -04:00
Jacky Lee	2c01a66265	Reshape dataset from fetch_mnist (#390 )	2022-09-24 21:16:29 -04:00
George Hotz	894a7cee79	forgot a few	2022-09-12 09:21:46 -07:00
George Hotz	801ecd4a07	cleanup clip tokenizer	2022-09-12 09:20:12 -07:00
Fernand Pajot	ff0da4c802	Added standalone CLIP tokenizer (#382 ) * Added standalone CLIP tokenizer. * Fixed empty phrase. * Truncating long prompts. * Keeping two slots for the start and end token. * Fixed empty phrase. * Using tokenizer for empty phrase. * Typo.	2022-09-12 09:12:55 -07:00
David Redmon	a1810c8617	update serious_mnist.py (#380 )	2022-09-11 13:37:40 -07:00
George Hotz	ecc1a0470d	add Linear to tinygrad.nn	2022-09-07 07:40:48 -07:00
George Hotz	896f9f74a9	hmm, need this with broadcast change	2022-09-06 16:54:01 -07:00
George Hotz	a18a6a0773	fix sd with TORCH=1	2022-09-06 16:51:16 -07:00
George Hotz	0516359af8	fix stupid OPENCL=1 OOM	2022-09-06 14:29:23 -07:00
George Hotz	f215534a64	1100 lines, but sane linter rules	2022-09-06 13:47:45 -07:00
George Hotz	682dc64430	works at work	2022-09-06 08:06:11 -07:00
George Hotz	d6f499fd69	improve opencl, why is it OOMing	2022-09-05 20:14:31 -07:00
George Hotz	0ba6179de7	stable diffusion in readme	2022-09-05 18:51:56 -07:00
George Hotz	c1d5af8b0c	stable diffusion cleanups	2022-09-05 18:34:13 -07:00
George Hotz	3728ef6d02	better alphas	2022-09-05 16:48:26 -07:00
George Hotz	0fda854b3e	other prompt example	2022-09-05 16:14:16 -07:00
George Hotz	16cb4290c4	cat horse winning ❗	2022-09-05 16:05:14 -07:00
George Hotz	1043fa067a	it renders something	2022-09-05 15:52:14 -07:00
George Hotz	5a685b93ac	brown img	2022-09-05 15:20:18 -07:00
George Hotz	98d6264987	all models match	2022-09-05 12:27:54 -07:00
George Hotz	b8bd34b5d2	fix last bug in unet probz	2022-09-05 11:32:44 -07:00
George Hotz	3df67aa0af	fix transformer bugs	2022-09-05 11:26:32 -07:00
George Hotz	2ed3bb6223	clip model is running	2022-09-05 11:26:32 -07:00
George Hotz	1a54ea2417	runs on torch cpu	2022-09-04 12:06:42 -07:00
George Hotz	9590d92750	stable diffusion compiles (add no_init)	2022-09-04 11:40:50 -07:00
George Hotz	172683c314	work	2022-09-04 11:21:09 -07:00
George Hotz	c2a030fe55	one liner that's more clear	2022-09-03 16:08:48 -07:00
George Hotz	4a3ed58edb	more readable actually	2022-09-03 16:00:35 -07:00
George Hotz	633f31dc73	easier to read	2022-09-03 15:53:58 -07:00
George Hotz	6578e08919	cleanups for Mid	2022-09-03 15:50:33 -07:00
George Hotz	852de7c66c	remove ugly parens	2022-09-03 15:41:37 -07:00
George Hotz	6b190c2fa5	stable diffusion works	2022-09-03 13:55:36 -07:00
George Hotz	947e10dab0	yolo	2022-09-03 12:39:48 -07:00
George Hotz	033a3ecccf	found tinygrad bug	2022-09-03 12:32:43 -07:00
George Hotz	114728d363	torch bs	2022-09-03 11:57:23 -07:00
George Hotz	356732515b	stable_diffusion: add attn and layernorm	2022-09-03 11:02:27 -07:00
George Hotz	4dadd95e3c	fix tests hopefully, more stable diffusion	2022-09-03 10:38:31 -07:00
George Hotz	c01a8c5c2d	stable diffusion start	2022-09-03 10:08:42 -07:00
George Hotz	b132de677d	tinygrad.nn (#367 ) * tinygrad.nn * flake8 * working on pylint * more pylint * more pylint * pylint passes * networkx * mypy can't infer that type * junk	2022-08-18 07:41:00 -07:00
George Hotz	acbeaf0ba9	adam in benchmark_train_efficientnet	2022-07-19 09:33:07 -07:00
George Hotz	d985217fa4	skip reduce noops	2022-07-16 07:47:43 -07:00
George Hotz	5e46561f7e	no_grad = NOT backward	2022-07-10 20:54:57 -07:00
George Hotz	d5d9cffe7c	training param for batchnorm	2022-07-04 13:28:03 -07:00
George Hotz	34f43ea10e	LAZY and CLCACHE are defaults	2022-07-04 13:09:15 -07:00
George Hotz	b7afd83267	track cl mem used	2022-07-04 12:19:00 -07:00
George Hotz	d5de8452c6	dashed loadops	2022-07-04 09:50:56 -07:00
George Hotz	7276f8d6bf	improve constant folding, detach before moving tensor	2022-07-02 15:29:40 -07:00
George Hotz	0cb99d72e9	NUM=-1 is a small efficientnet for small people	2022-07-02 15:11:51 -07:00
George Hotz	8cf1aed0f4	don't track_running_stats, parameters must require_grad	2022-07-02 14:38:45 -07:00
George Hotz	f607f18006	fix backward	2022-06-25 00:00:53 -07:00
George Hotz	ec30f0402f	improve benchmark_train_efficientnet	2022-06-24 23:46:38 -07:00
George Hotz	d748353ce5	err, okay, a bit more off	2022-06-24 22:44:57 -07:00

479 Commits (deepcrayon)