Kunwar Raj Singh
8391648822
Over 90% on CIFAR with examples/hlb_cifar10.py (#1073)
...
* fix eval, lr decay, best eval
* 82.27
* 82.64
* 82.79, reproducible
* add lr sched, 85.26
* 87.42
* 87.94
* 87.42
* tta with flip (TTA sketched after this list)
* training flip aug
* refactor
* using Tensor for LR is faster (see the optimizer sketch after this list)
* 89.5
* refactor, flip only train set
* 90.01
* 90.64
* eval jit
* refactor
* only JIT model
* fix eval JIT
* fix eval JIT
* 90.82
* STEPS=900 reaches 90.22
* TTA envvar
* TTA default 0
* fully jit training
* refactor optim
* fix sched
* add label smoothing (sketched after this list)
* param changes
* partial gelu
* OneCycle with pause (schedule shape sketched after this list)
* gelu maybe works
* 90.12
* remove pause lr
* maybe fix lr schedulers
* scheduler test passing
* comments
* try mixup (sketched after this list)
* shuffle!
* add back the missing last eval
* fix shuffle bugs
* add mixup prob
* fix mixup prob
* 90.19
* correct mixup
* correct mixup
* correct mixup
* 90.24
* 90.33
* refactor, add type hints
* add gradient clipping
* maybe fix test
* full JIT
* back to relu for now
* pass mixup prob as param
* add typehints
* maybe CI works
* try erf gelu
* CI, types
* remove useless import
* refactor optim
* refactor optim
* try leakyrelu
* try celu
* gelu
* 90.67
* remove grad clip
* remove grad clip tests
* revert params
* add test for OneCycleLR
* 90.62
* fix eval timing
* fix eval timing again
* so where I calculate mixup_prob matters
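
The TTA entries refer to test-time augmentation by horizontal flip, gated by the TTA envvar (default 0): eval predictions are averaged over each image and its mirror. A minimal numpy sketch; `model` is a hypothetical stand-in for the CIFAR net's logits function, not the script's actual code:

```python
import numpy as np

def tta_logits(model, X):
    # Average logits over the original batch and its horizontal mirror.
    # Assumes NCHW layout, so axis -1 is image width.
    X_flip = X[..., ::-1]
    return 0.5 * (model(X) + model(X_flip))
```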
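"using Tensor for LR is faster" because a learning rate held as a Tensor can be updated in place by the scheduler, instead of baking a new Python float into the graph each step, so the JIT'd training step stays reusable. A toy sketch of the idea, not tinygrad's actual optimizer code:

```python
from tinygrad.tensor import Tensor

class SGDSketch:
    # Illustrative only: shows why the LR lives in a Tensor.
    def __init__(self, params, lr=0.01):
        self.params = params
        # A scheduler can do opt.lr.assign(new_lr) in place; the compiled
        # train step never has to be rebuilt when the LR changes.
        self.lr = Tensor([lr], requires_grad=False)

    def step(self):
        for p in self.params:
            p.assign(p.detach() - p.grad * self.lr)
```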
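Label smoothing softens the one-hot targets before the cross-entropy loss. A minimal sketch; the smoothing value is illustrative, not the script's:

```python
def smooth_labels(Y_onehot, smoothing=0.2):
    # Correct class gets 1 - smoothing + smoothing/K, every class
    # gets smoothing/K, where K is the number of classes.
    num_classes = Y_onehot.shape[-1]
    return Y_onehot * (1.0 - smoothing) + smoothing / num_classes
```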
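The scheduler entries ("OneCycle with pause", "add test for OneCycleLR") track a one-cycle schedule: the LR ramps up to a peak over the first part of training, then decays back down. A sketch of that shape assuming linear ramps; the script's exact divisors and the "pause" variant are not reproduced here:

```python
def one_cycle_lr(step, total_steps, max_lr,
                 pct_start=0.3, initial_div=25.0, final_div=1e4):
    # Linear warmup from max_lr/initial_div to max_lr over the first
    # pct_start of training, then linear decay to max_lr/final_div.
    warmup_steps = int(total_steps * pct_start)
    if step < warmup_steps:
        t = step / max(1, warmup_steps)
        return max_lr / initial_div + t * (max_lr - max_lr / initial_div)
    t = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return max_lr + t * (max_lr / final_div - max_lr)
```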
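Mixup blends each batch with a shuffled copy of itself; the "where I calculate mixup_prob matters" fix is about sampling the skip decision per step rather than once up front. A hedged numpy sketch with illustrative defaults:

```python
import numpy as np

def mixup(X, Y, mixup_prob=0.5, alpha=0.2):
    # Sample the skip decision here, once per step: drawing it once per
    # run would freeze mixup on or off for all of training.
    if np.random.rand() >= mixup_prob:
        return X, Y
    lam = np.random.beta(alpha, alpha)    # mixing coefficient
    perm = np.random.permutation(len(X))  # partner examples
    X_mix = lam * X + (1.0 - lam) * X[perm]
    Y_mix = lam * Y + (1.0 - lam) * Y[perm]  # assumes one-hot (or smoothed) labels
    return X_mix, Y_mix
```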
---------
Co-authored-by: Kunwar Raj Singh <kunwar31@pop-os.localdomain>
2023-07-06 20:46:22 -07:00