
CI < 5 minutes (#1252)

* models matrix

* fix typo and install gpu deps

* install llvm deps if needed

* fix

* testops with cuda

* remove pip cache since it doesn't work

* cuda env

* install cuda deps

* maybe it will work now

* i can't read

* all tests in matrix

* trim down more

* opencl stuff in matrix

* opencl pip cache

* test split

* change cuda test exclusion

* test

* fix cuda maybe

* add models

* add more n=auto

* third thing

* fix bug

* cache pip more

* change name

* update tests

* try again cause why not

* balance

* try again...

* try apt cache for cuda

* try on gpu:

* try cuda again

* update packages step

* replace libz-dev with zlib1g-dev

* only cache cuda

* why error

* fix gpuocelot bug

* apt cache err

* apt cache too slow?

* opt and image in single runner

* add a couple n=autos

* remove test matrix

* try cuda apt cache again

* libz-dev -> zlib1g-dev

* remove -s since not supported by xdist

* the cache takes too long and doesn't work

* combine webgpu and metal tests

* combine imagenet to c and cpu tests

* torch tests with linters

* torch back by itself

* small windows clang test with torch tests

* fix a goofy windows bug

* im dumb

* bro

* clang with linters

* fix pylint error

* linter doesn't work on windows

* try with clang again

* clang and imagenet?

* install deps

* fix

* fix quote

* clang by itself (windows too slow)

* env vars for imagenet

* cache pip for metal and webgpu tests

* try torch with metal and webgpu

* doesn't work, too long

* remove -v

* try -n=logical

* don't use logical

* revert accidental thing

* remove some prints unless CI

* fix print unless CI

* ignore speed tests for slow tests

* clang windows in matrix (ubuntu being tested in imagenet->c test)

* try manual pip cache

* fix windows pip cache path

* all manual pip cache

* fix pip cache dir for macos

* print_ci function in helpers

* CI as variable, no print_ci

* missed one

* cuda tests with docker image

* remove setup-python action for cuda

* python->python3?

* remove -s -v

* try fix pip cache

* maybe fix

* try to fix pip cache

* is this the path?

* maybe cache pip

* try again

* create wheels dir

* ?

* cuda pip deps in dockerfile

* disable pip cache for clang

* image from ghcr instead of docker hub

* why is clang like this

* fast deps

* try use different caches

* remove the fast thing

* try with lighter image

* remove setup python for cuda

* small docker and cuda fast deps

* ignore a few more tests

* cool docker thing (maybe)

* oops

* quotes

* fix docker command

* fix bug

* ignore train efficientnet test

* remove dockerfile (docker stuff takes too long)

* remove docker stuff and normal cuda

* oops

* ignore the tests for cuda

* does this work

* ignore test_train on slow backends

* add space

* llvm ignore same tests as cuda

* nvm

* ignore lr scheduler tests

* get some stats

* fix ignore bug

* remove extra '

* remove and

* ignore test for llvm

* change ignored tests and duration on all backends

* fix

* and -> or

* ignore some more cuda tests

* finally?

* does this fix it

* remove durations=0

* add some more tests to llvm

* make last pytest more readable

* fix

* don't train efficientnet on cpu

* try w/out pip cache

* pip cache seems to be generally better

* pytest file markers

* try apt fast for cuda

* use quick install for apt-fast

* apt-fast not worth it

* apt-get to apt

* fix typo

* suppress warnings

* register markers

* disable debug on fuzz tests

* change marker names

* apt update and apt install in one command

* update marker names in test.yml

* webgpu pytest marker
pull/991/head
cheeetoo 2023-07-23 15:00:56 -05:00 committed by GitHub
parent 47f9d82722
commit a0965ee198
23 changed files with 237 additions and 226 deletions

View File

@@ -18,6 +18,11 @@ jobs:
uses: actions/setup-python@v4
with:
python-version: 3.8
- name: Cache pip
uses: actions/cache@v3
with:
path: ~/.cache/pip
key: linting
- name: Install dependencies
run: pip install -e '.[linting,testing]' --extra-index-url https://download.pytorch.org/whl/cpu
- name: Repo line count
@@ -31,12 +36,12 @@ jobs:
- name: Run mypy
run: mypy tinygrad/ --ignore-missing-imports --check-untyped-defs --explicit-package-bases --warn-unreachable
- name: Install SLOCCount
run: sudo apt-get install sloccount
run: sudo apt install sloccount
- name: Check <5000 lines
run: sloccount tinygrad test examples extra; if [ $(sloccount tinygrad | sed -n 's/.*Total Physical Source Lines of Code (SLOC)[ ]*= \([^ ]*\).*/\1/p' | tr -d ',') -gt 5000 ]; then exit 1; fi
testcpu:
name: CPU Tests
testcpuimagenet:
name: CPU and ImageNet to C Tests
runs-on: ubuntu-latest
timeout-minutes: 20
@@ -47,6 +52,11 @@ jobs:
uses: actions/setup-python@v4
with:
python-version: 3.8
- name: Cache pip
uses: actions/cache@v3
with:
path: ~/.cache/pip
key: testing
- name: Install Dependencies
run: pip install -e '.[testing]' --extra-index-url https://download.pytorch.org/whl/cpu
- name: Test Docs
@@ -54,49 +64,11 @@ jobs:
- name: Test Quickstart
run: awk '/```python/{flag=1;next}/```/{flag=0}flag' docs/quickstart.md > quickstart.py && PYTHONPATH=. python3 quickstart.py
- name: Run Pytest
run: python -m pytest -s -v -n=auto test/
run: python -m pytest -n=auto test/ -k "not (test_efficientnet and models/test_train.py)"
- name: Fuzz Test symbolic
run: DEBUG=1 python test/external/fuzz_symbolic.py
run: python test/external/fuzz_symbolic.py
- name: Fuzz Test shapetracker
run: PYTHONPATH="." DEBUG=1 python test/external/fuzz_shapetracker.py
testwebgpu:
name: WebGPU Tests
runs-on: macos-13
steps:
- name: Checkout Code
uses: actions/checkout@v3
- name: Set up Python 3.8
uses: actions/setup-python@v4
with:
python-version: 3.8
- name: Install Dependencies
run: pip install -e '.[testing,webgpu]' --extra-index-url https://download.pytorch.org/whl/cpu
# - name: Set Env
# run: printf "WEBGPU=1\nWGPU_BACKEND_TYPE=D3D12\n" >> $GITHUB_ENV
- name: Run Pytest
run: WEBGPU=1 WGPU_BACKEND_TYPE=Metal python -m pytest -s -v -n=auto test/test_ops.py test/test_speed_v_torch.py test/test_nn.py test/test_jit.py test/test_randomness.py test/test_tensor.py test/test_assign.py test/test_conv.py test/test_nn.py test/test_custom_function.py test/test_conv_shapetracker.py
- name: Build WEBGPU Efficientnet
run: WEBGPU=1 WGPU_BACKEND_TYPE=Metal python -m examples.webgpu.compile_webgpu
# - name: Install Puppeteer
# run: npm install puppeteer
# - name: Run Efficientnet
# run: node test/test_webgpu.js
testimagenet:
name: ImageNet to C Compile Test
runs-on: ubuntu-latest
timeout-minutes: 20
steps:
- name: Checkout Code
uses: actions/checkout@v3
- name: Set up Python 3.8
uses: actions/setup-python@v4
with:
python-version: 3.8
- name: Install Dependencies
run: pip install -e .
run: PYTHONPATH="." python test/external/fuzz_shapetracker.py
- name: Compile EfficientNet to C
run: PYTHONPATH="." CLANG=1 python3 examples/compile_efficientnet.py > recognize.c
- name: Compile C to native
@@ -104,44 +76,6 @@ jobs:
- name: Test EfficientNet
run: curl https://media.istockphoto.com/photos/hen-picture-id831791190 | ./recognize | grep hen
testllvm:
name: LLVM Tests
runs-on: ubuntu-latest
timeout-minutes: 20
steps:
- name: Checkout Code
uses: actions/checkout@v3
- name: Set up Python 3.8
uses: actions/setup-python@v4
with:
python-version: 3.8
- name: Install Dependencies
run: pip install -e '.[llvm,testing]' --extra-index-url https://download.pytorch.org/whl/cpu
- name: Run Pytest
run: ENABLE_METHOD_CACHE=1 LLVM=1 python -m pytest -s -v -n=auto test/
testclang:
strategy:
matrix:
os: [ubuntu-latest, windows-latest]
runs-on: ${{ matrix.os }}
name: CLANG Tests ${{ matrix.os }} (w method cache)
steps:
- name: Checkout Code
uses: actions/checkout@v3
- name: Set up Python 3.8
uses: actions/setup-python@v4
with:
python-version: 3.8
- name: Install Dependencies
run: pip install -e '.[testing]' --extra-index-url https://download.pytorch.org/whl/cpu
- name: Set env
run: printf "CI=1\nCLANG=1\nENABLE_METHOD_CACHE=1" >> $GITHUB_ENV
- name: Run Pytest
run: python -m pytest -s -v -n=auto test/
testtorch:
name: Torch Tests
runs-on: ubuntu-latest
@@ -154,79 +88,72 @@ jobs:
uses: actions/setup-python@v4
with:
python-version: 3.8
- name: Cache pip
uses: actions/cache@v3
with:
path: ~/.cache/pip
key: testing
- name: Install Dependencies
run: pip install -e '.[testing]' --extra-index-url https://download.pytorch.org/whl/cpu
- name: Run Pytest
run: TORCH=1 python -m pytest -s -v -n=auto test/
run: TORCH=1 python -m pytest -n=auto test/
- name: Run ONNX
run: TORCH=1 python -m pytest test/external/external_test_onnx_backend.py --tb=no --disable-warnings || true
testgpu:
name: GPU Tests
runs-on: ubuntu-20.04
timeout-minutes: 20
steps:
- name: Checkout Code
uses: actions/checkout@v3
- name: Update packages
run: |
wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB | gpg --dearmor | sudo tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null
echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | sudo tee /etc/apt/sources.list.d/oneAPI.list
sudo apt-get update
- name: Install OpenCL
#run: sudo apt-get install -y pocl-opencl-icd
run: sudo apt-get install -y intel-oneapi-runtime-compilers intel-oneapi-runtime-opencl
- name: Set up Python 3.8
uses: actions/setup-python@v4
with:
python-version: 3.8
- name: Install Dependencies
run: pip install -e '.[testing]' --extra-index-url https://download.pytorch.org/whl/cpu
- name: Run Optimizer Test (OPT 2 and 3)
run: |
PYTHONPATH="." OPT=2 GPU=1 python test/external/external_test_opt.py
PYTHONPATH="." OPT=3 GPU=1 python test/external/external_test_opt.py
- name: Run Pytest (default)
run: GPU=1 python -m pytest -s -v -n=auto test/
run: TORCH=1 python -m pytest -n=auto test/external/external_test_onnx_backend.py --tb=no --disable-warnings || true
testopencl:
name: openpilot (OpenCL) Test
strategy:
matrix:
task: [optimage, openpilot]
name: ${{ matrix.task=='optimage'&&'GPU OPT and IMAGE Tests'||'openpilot (OpenCL) Tests'}}
runs-on: ubuntu-20.04
timeout-minutes: 20
steps:
- name: Checkout Code
uses: actions/checkout@v3
- name: Update packages
run: |
wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB | gpg --dearmor | sudo tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null
echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | sudo tee /etc/apt/sources.list.d/oneAPI.list
sudo apt-get update
- name: Install OpenCL
#run: sudo apt-get install -y pocl-opencl-icd
run: sudo apt-get install -y intel-oneapi-runtime-compilers intel-oneapi-runtime-opencl
- name: Set up Python 3.8
uses: actions/setup-python@v4
with:
python-version: 3.8
- name: Install Dependencies
run: pip install -e '.[testing]' --extra-index-url https://download.pytorch.org/whl/cpu
- name: Test openpilot model compile and size
run: |
DEBUG=2 ALLOWED_KERNEL_COUNT=199 FLOAT16=1 DEBUGCL=1 GPU=1 IMAGE=2 python3 openpilot/compile.py
python3 -c 'import os; assert os.path.getsize("/tmp/output.thneed") < 100_000_000'
- name: Test GPU IMAGE ops
run: |
GPU=1 IMAGE=1 python3 test/test_ops.py
FORWARD_ONLY=1 GPU=1 IMAGE=2 python3 test/test_ops.py
- name: Test openpilot model correctness (float32)
run: DEBUGCL=1 GPU=1 IMAGE=2 python3 openpilot/compile.py
- name: Test tensor core ops
run: GPU=1 TC=2 python3 test/test_ops.py
- name: Checkout Code
uses: actions/checkout@v3
- name: Update packages
run: |
wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB | gpg --dearmor | sudo tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null
echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | sudo tee /etc/apt/sources.list.d/oneAPI.list
sudo apt update
- name: Install OpenCL
#run: sudo apt-get install -y pocl-opencl-icd
run: sudo apt install -y intel-oneapi-runtime-compilers intel-oneapi-runtime-opencl
- name: Set up Python 3.8
uses: actions/setup-python@v4
with:
python-version: 3.8
- name: Cache pip
uses: actions/cache@v3
with:
path: ~/.cache/pip
key: testing
- name: Install Dependencies
run: pip install -e '.[testing]' --extra-index-url https://download.pytorch.org/whl/cpu
- if: ${{ matrix.task == 'optimage' }}
name: Run Optimizer Test (OPT 2 and 3)
run: |
PYTHONPATH="." OPT=2 GPU=1 python -m pytest -n=auto test/external/external_test_opt.py
PYTHONPATH="." OPT=3 GPU=1 python -m pytest -n=auto test/external/external_test_opt.py
- if: ${{ matrix.task == 'optimage'}}
name: Test GPU IMAGE ops
run: |
GPU=1 IMAGE=1 python3 -m pytest -n=auto test/test_ops.py
FORWARD_ONLY=1 GPU=1 IMAGE=2 python3 -m pytest -n=auto test/test_ops.py
- if: ${{ matrix.task == 'openpilot' }}
name: Test openpilot model compile and size
run: |
DEBUG=2 ALLOWED_KERNEL_COUNT=199 FLOAT16=1 DEBUGCL=1 GPU=1 IMAGE=2 python3 openpilot/compile.py
python3 -c 'import os; assert os.path.getsize("/tmp/output.thneed") < 100_000_000'
- if: ${{ matrix.task == 'openpilot' }}
name: Test openpilot model correctness (float32)
run: DEBUGCL=1 GPU=1 IMAGE=2 python3 openpilot/compile.py
- if: ${{ matrix.task == 'openpilot' }}
name: Test tensor core ops
run: GPU=1 TC=2 python3 -m pytest -n=auto test/test_ops.py
testmetal:
name: Metal Tests
testmetalwebgpu:
name: Metal and WebGPU Tests
runs-on: macos-13
timeout-minutes: 20
@@ -237,19 +164,27 @@ jobs:
uses: actions/setup-python@v4
with:
python-version: 3.11
- name: Cache pip
uses: actions/cache@v3
with:
path: ~/Library/Caches/pip
key: metalwebgpu
- name: Install Dependencies
run: pip install -e '.[metal,testing]'
run: pip install -e '.[metal,webgpu,testing]' --extra-index-url https://download.pytorch.org/whl/cpu
- name: Test LLaMA compile speed
run: PYTHONPATH="." METAL=1 python3 test/external/external_test_speed_llama.py
#- name: Run dtype test
# run: DEBUG=4 METAL=1 python -m pytest test/test_dtype.py
# dtype test has issues on test_half_to_int8
- name: Run ops test
- name: Run metal ops test
run: DEBUG=2 METAL=1 python -m pytest test/test_ops.py
- name: Run JIT test
run: DEBUG=2 METAL=1 python -m pytest test/test_jit.py
# TODO: why not testing the whole test/?
- name: Run webgpu pytest
run: WEBGPU=1 WGPU_BACKEND_TYPE=Metal python -m pytest -n=auto -m 'webgpu'
- name: Build WEBGPU Efficientnet
run: WEBGPU=1 WGPU_BACKEND_TYPE=Metal python -m examples.webgpu.compile_webgpu
testdocker:
name: Docker Test
@@ -264,58 +199,73 @@ jobs:
- name: Test Docker
run: docker run --rm tinygrad /usr/bin/env python3 -c "from tinygrad.tensor import Tensor; print(Tensor.eye(3).numpy())"
tests:
strategy:
matrix:
backend: [llvm, clang, gpu, cuda]
testcuda:
name: (emulated) cuda test
runs-on: ubuntu-22.04
name: Tests on (${{ matrix.backend }})
runs-on: ${{ matrix.backend == 'gpu' && 'ubuntu-20.04' || matrix.backend=='clang'&&'windows-latest'|| 'ubuntu-latest' }}
timeout-minutes: 20
steps:
- name: Checkout Code
uses: actions/checkout@v3
- name: Update packages
run: |
export DEBIAN_FRONTEND=noninteractive
sudo apt-get update -y
- name: Install packages
run: sudo apt-get install -y --no-install-recommends git g++ cmake ninja-build llvm-15-dev libz-dev libglew-dev flex bison libfl-dev libboost-thread-dev libboost-filesystem-dev nvidia-cuda-toolkit-gcc
- name: Cache gpuocelot
id: cache-build
uses: actions/cache@v3
env:
cache-name: cache-gpuocelot-build
with:
path: ${{ github.workspace }}/gpuocelot/ocelot/
key: ubuntu22.04-gpuocelot-19626fc00b6ee321638c3111074269c69050e091
restore-keys: |
ubuntu22.04-gpuocelot-19626fc00b6ee321638c3111074269c69050e091
- if: ${{ steps.cache-build.outputs.cache-hit != 'true' }}
name: Clone gpuocelot
uses: actions/checkout@v3
with:
repository: gpuocelot/gpuocelot
ref: 19626fc00b6ee321638c3111074269c69050e091
path: ${{ github.workspace }}/gpuocelot
submodules: true
- if: ${{ steps.cache-build.outputs.cache-hit != 'true' }}
name: Compile gpuocelot
run: |
cd ${{ github.workspace }}/gpuocelot/ocelot
mkdir build
cd build
cmake .. -Wno-dev -G Ninja -DOCELOT_BUILD_TOOLS=OFF
ninja
- name: Install gpuocelot
run: |
cd ${{ github.workspace }}/gpuocelot/ocelot/build
sudo ninja install
- name: Set up Python 3.8
uses: actions/setup-python@v4
with:
python-version: 3.8
cache: 'pip'
cache-dependency-path: setup.py
- name: Install tinygrad dependencies
run: pip install -e '.[testing, cuda]' --extra-index-url https://download.pytorch.org/whl/cpu
- name: Run pytest
run: FORWARD_ONLY=1 JIT=1 OPT=2 CUDA=1 CUDACPU=1 python -m pytest -s -v -n=auto test --ignore=test/external --ignore=test/models --ignore=test/test_speed_v_torch.py --ignore=test/test_specific_conv.py --ignore=test/test_net_speed.py --ignore=test/test_nn.py -k "not half"
- name: Checkout Code
uses: actions/checkout@v3
- name: Set up Python 3.8
uses: actions/setup-python@v4
with:
python-version: 3.8
- name: Cache pip
uses: actions/cache@v3
with:
path: ${{ matrix.backend=='clang'&&'~\AppData\Local\pip\cache'||'~/.cache/pip' }}
key: ${{ matrix.backend }}
- name: Set env
run: printf "${{ matrix.backend == 'llvm' && 'ENABLE_METHOD_CACHE=1\nLLVM=1' || matrix.backend == 'clang' && 'CLANG=1\nENABLED_METHOD_CACHE=1' || matrix.backend == 'gpu' && 'GPU=1' || matrix.backend == 'cuda' && 'FORWARD_ONLY=1\nJIT=1\nOPT=2\nCUDA=1\nCUDACPU=1\n'}}" >> $GITHUB_ENV
- name: Install packages (gpu)
if: matrix.backend == 'gpu'
run: |
wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB | gpg --dearmor | sudo tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null
echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | sudo tee /etc/apt/sources.list.d/oneAPI.list
sudo apt update && \
sudo apt install -y intel-oneapi-runtime-compilers intel-oneapi-runtime-opencl
- name: Install packages (cuda)
if: matrix.backend == 'cuda'
run: |
export DEBIAN_FRONTEND=noninteractive
sudo apt update -y && \
sudo apt install -y --no-install-recommends git g++ cmake ninja-build llvm-15-dev zlib1g-dev libglew-dev flex bison libfl-dev libboost-thread-dev libboost-filesystem-dev nvidia-cuda-toolkit-gcc
- name: Cache gpuocelot
if: matrix.backend == 'cuda'
id: cache-build
uses: actions/cache@v3
env:
cache-name: cache-gpuocelot-build
with:
path: ${{ github.workspace }}/gpuocelot/ocelot/
key: ubuntu22.04-gpuocelot-19626fc00b6ee321638c3111074269c69050e091
restore-keys: |
ubuntu22.04-gpuocelot-19626fc00b6ee321638c3111074269c69050e091
- name: Clone/compile gpuocelot
if: matrix.backend == 'cuda' && steps.cache-build.outputs.cache-hit != 'true'
run: |
git clone --recurse-submodules https://github.com/gpuocelot/gpuocelot.git ${{ github.workspace }}/gpuocelot
cd ${{ github.workspace }}/gpuocelot/ocelot
git checkout 19626fc00b6ee321638c3111074269c69050e091
mkdir build
cd build
cmake .. -Wno-dev -G Ninja -DOCELOT_BUILD_TOOLS=OFF
ninja
- name: Install gpuocelot
if: matrix.backend == 'cuda'
run: |
cd ${{ github.workspace }}/gpuocelot/ocelot/build
sudo ninja install
- name: Install dependencies
run: pip install -e '.[testing${{matrix.backend=='llvm'&&',llvm'||matrix.backend=='cuda'&&',cuda'||''}}]' --extra-index-url https://download.pytorch.org/whl/cpu
- name: Run pytest (not cuda)
if: matrix.backend!='cuda'
run: python -m pytest -n=auto test/ -k '${{matrix.backend=='llvm'&&'not (test_nn.py and test_conv_transpose2d)'||'test'}}' -m 'not exclude_${{matrix.backend}}'
- name: Run pytest (cuda)
if: matrix.backend=='cuda'
run: python -m pytest -n=auto test/ -k 'not (half or test_efficientnet_safetensors) and not (test_conv2d and test_tensor.py)' -m 'not exclude_cuda' --ignore=test/external --ignore=test/models

pytest.ini 100644
View File

@@ -0,0 +1,6 @@
[pytest]
markers =
    exclude_cuda
    exclude_gpu
    exclude_clang
    webgpu
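These markers drive the backend filtering: a test module opts out of a backend by declaring a module-level pytestmark, and each CI job deselects with -m 'not exclude_<backend>'. A minimal sketch of the pattern, using a hypothetical test module (not part of this diff):

import unittest
import pytest

# applied to every test collected from this module; the cuda job runs
# pytest -m 'not exclude_cuda', so the whole module is deselected there
pytestmark = [pytest.mark.exclude_cuda, pytest.mark.webgpu]

class TestExample(unittest.TestCase):
    def test_add(self):
        self.assertEqual(1 + 1, 2)

if __name__ == '__main__':
    unittest.main()

Because -m composes with -k, a job can drop whole modules by marker and still filter individual test names in the same invocation, which is how the cuda pytest line in the workflow above combines the two.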

View File

@@ -73,5 +73,5 @@ class TestYOLOv8(unittest.TestCase):
np.testing.assert_allclose(onnx_output[0], tiny_output.cpu().numpy(), atol=5e-4, rtol=0.025)
if __name__ == '__main__':
unittest.main()
unittest.main()

View File

@@ -7,6 +7,9 @@ from tinygrad.nn.optim import Adam
from extra.lr_scheduler import MultiStepLR, ReduceLROnPlateau, CosineAnnealingLR, OneCycleLR
from extra.training import train, evaluate
from extra.datasets import fetch_mnist
import pytest
pytestmark = [pytest.mark.exclude_cuda, pytest.mark.exclude_gpu]
np.random.seed(1337)
Tensor.manual_seed(1337)

View File

@@ -6,13 +6,14 @@ from tinygrad.state import get_parameters, get_state_dict
from tinygrad.nn import optim, Linear, Conv2d, BatchNorm2d
from tinygrad.tensor import Tensor
from extra.datasets import fetch_mnist
from tinygrad.helpers import CI
def compare_tiny_torch(model, model_torch, X, Y):
Tensor.training = True
model_torch.train()
model_state_dict = get_state_dict(model)
for k,v in model_torch.named_parameters():
print(f"initting {k} from torch")
if not CI: print(f"initting {k} from torch")
model_state_dict[k].assign(Tensor(v.detach().numpy())).realize()
optimizer = optim.SGD(get_parameters(model), lr=0.01)
@@ -23,11 +24,11 @@ def compare_tiny_torch(model, model_torch, X, Y):
out = model(X)
loss = (out * Y).mean()
print(loss.realize().numpy())
if not CI: print(loss.realize().numpy())
out_torch = model_torch(torch.Tensor(X.numpy()))
loss_torch = (out_torch * torch.Tensor(Y.numpy())).mean()
print(loss_torch.detach().numpy())
if not CI: print(loss_torch.detach().numpy())
# assert losses match
np.testing.assert_allclose(loss.realize().numpy(), loss_torch.detach().numpy(), atol=1e-4)
@@ -41,7 +42,7 @@ def compare_tiny_torch(model, model_torch, X, Y):
for k,v in list(model_torch.named_parameters())[::-1]:
g = model_state_dict[k].grad.numpy()
gt = v.grad.detach().numpy()
print("testing grads", k)
if not CI: print("testing grads", k)
np.testing.assert_allclose(g, gt, atol=1e-3, err_msg=f'grad mismatch {k}')
# take the steps
@@ -50,7 +51,7 @@ def compare_tiny_torch(model, model_torch, X, Y):
# assert weights match (they don't!)
for k,v in model_torch.named_parameters():
print("testing weight", k)
if not CI: print("testing weight", k)
np.testing.assert_allclose(model_state_dict[k].numpy(), v.detach().numpy(), atol=1e-3, err_msg=f'weight mismatch {k}')
def get_mnist_data():

View File

@@ -6,6 +6,9 @@ from tinygrad.tensor import Tensor, Device
from tinygrad.nn import optim, BatchNorm2d
from extra.training import train, evaluate
from extra.datasets import fetch_mnist
import pytest
pytestmark = [pytest.mark.exclude_gpu, pytest.mark.exclude_clang]
# load the mnist dataset
X_train, Y_train, X_test, Y_test = fetch_mnist()

View File

@@ -8,6 +8,10 @@ import onnx
from extra.utils import fetch, temp
from extra.onnx import get_run_onnx
from tinygrad.tensor import Tensor
from tinygrad.helpers import CI
import pytest
pytestmark = [pytest.mark.exclude_gpu, pytest.mark.exclude_clang]
def run_onnx_torch(onnx_model, inputs):
import torch
@@ -48,22 +52,24 @@ class TestOnnxModel(unittest.TestCase):
mt2 = time.monotonic()
tinygrad_out = tinygrad_out.numpy()
et = time.monotonic()
print(f"ran openpilot model in {(et-st)*1000.0:.2f} ms, waited {(mt2-mt)*1000.0:.2f} ms for realize, {(et-mt2)*1000.0:.2f} ms for GPU queue")
if not CI: print(f"ran openpilot model in {(et-st)*1000.0:.2f} ms, waited {(mt2-mt)*1000.0:.2f} ms for realize, {(et-mt2)*1000.0:.2f} ms for GPU queue")
import cProfile
import pstats
inputs = get_inputs()
pr = cProfile.Profile(timer=time.perf_counter_ns, timeunit=1e-6)
pr.enable()
if not CI:
import cProfile
import pstats
inputs = get_inputs()
pr = cProfile.Profile(timer=time.perf_counter_ns, timeunit=1e-6)
pr.enable()
tinygrad_out = run_onnx(inputs)['outputs']
tinygrad_out.realize()
tinygrad_out = tinygrad_out.numpy()
pr.disable()
stats = pstats.Stats(pr)
stats.dump_stats(temp("net.prof"))
os.system(f"flameprof {temp('net.prof')} > {temp('prof.svg')}")
ps = stats.sort_stats(pstats.SortKey.TIME)
ps.print_stats(30)
if not CI:
pr.disable()
stats = pstats.Stats(pr)
stats.dump_stats(temp("net.prof"))
os.system(f"flameprof {temp('net.prof')} > {temp('prof.svg')}")
ps = stats.sort_stats(pstats.SortKey.TIME)
ps.print_stats(30)
def test_openpilot_model(self):
dat = fetch(OPENPILOT_MODEL)
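The profiling change above in miniature: under CI the cProfile machinery is skipped entirely and only the model runs. A hedged standalone sketch, with CI standing in for tinygrad.helpers.CI:

import cProfile
import pstats
import time

CI = False  # stand-in for tinygrad.helpers.CI

def run_profiled(fn):
    if CI: return fn()  # in CI, just run the function
    pr = cProfile.Profile(timer=time.perf_counter_ns, timeunit=1e-6)
    pr.enable()
    out = fn()
    pr.disable()
    # same reporting as the test: sort by own time, show the top entries
    pstats.Stats(pr).sort_stats(pstats.SortKey.TIME).print_stats(30)
    return out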

View File

@@ -11,6 +11,9 @@ from models.efficientnet import EfficientNet
from models.transformer import Transformer
from models.vit import ViT
from models.resnet import ResNet18
import pytest
pytestmark = pytest.mark.exclude_gpu
BS = getenv("BS", 2)

View File

@@ -5,6 +5,9 @@ from tinygrad.tensor import Tensor
from tinygrad.lazy import LAZY
from tinygrad.ops import GlobalCounters
from tinygrad.graph import nm
import pytest
pytestmark = pytest.mark.webgpu
N = 200 # has to be bigger than the cache to fail

View File

@@ -1,6 +1,9 @@
import unittest
import numpy as np
from tinygrad.tensor import Tensor
import pytest
pytestmark = [pytest.mark.exclude_cuda, pytest.mark.webgpu]
class TestConv(unittest.TestCase):
def test_simple(self):

View File

@@ -3,6 +3,9 @@ import unittest
from tinygrad.tensor import Tensor, Device
from tinygrad.nn import Conv2d
from tinygrad.ops import GlobalCounters
import pytest
pytestmark = pytest.mark.webgpu
#@unittest.skipUnless(Device.DEFAULT == "GPU", "Only GPU supports cache")
@unittest.skip("with JIT changes, you only get the raw buffer")

View File

@@ -11,6 +11,9 @@ from tinygrad.helpers import prod, dtypes
from tinygrad.lazy import LazyBuffer, create_lazybuffer, Device
from tinygrad.ops import ASTRunner
from tinygrad.shape.shapetracker import ShapeTracker
import pytest
pytestmark = pytest.mark.webgpu
# we don't always have GPU support, so the type signature is the abstract CompiledBuffer instead of GPUBuffer
def atan2_gpu(ret:LazyBuffer, a:LazyBuffer, b:LazyBuffer):

View File

@@ -1,6 +1,6 @@
import unittest
import numpy as np
from tinygrad.helpers import getenv, DType, DEBUG
from tinygrad.helpers import getenv, DType, DEBUG, CI
from tinygrad.lazy import Device
from tinygrad.tensor import Tensor, dtypes
from typing import List, Optional

View File

@@ -3,6 +3,9 @@ import unittest
import numpy as np
from tinygrad.tensor import Tensor, Device
from tinygrad.jit import TinyJit, JIT_SUPPORTED_DEVICE
import pytest
pytestmark = pytest.mark.webgpu
# NOTE: METAL fails, might be platform and optimization options dependent.
@unittest.skipUnless(Device.DEFAULT in JIT_SUPPORTED_DEVICE and Device.DEFAULT not in ["METAL", "WEBGPU"], f"no JIT on {Device.DEFAULT}")

View File

@@ -5,6 +5,9 @@ import pstats
import unittest
import torch
from tinygrad.tensor import Tensor, Device
import pytest
pytestmark = [pytest.mark.exclude_cuda, pytest.mark.exclude_gpu, pytest.mark.exclude_clang]
def start_profile():
import time

View File

@@ -7,6 +7,9 @@ from tinygrad.jit import TinyJit
from tinygrad.tensor import Tensor, Device
from tinygrad.nn import BatchNorm2d, Conv1d, ConvTranspose1d, Conv2d, ConvTranspose2d, Linear, GroupNorm, LayerNorm, LayerNorm2d, Embedding, InstanceNorm
import torch
import pytest
pytestmark = [pytest.mark.exclude_cuda, pytest.mark.webgpu]
class TestNN(unittest.TestCase):

View File

@@ -4,8 +4,15 @@ import math
import numpy as np
import unittest
from tinygrad.tensor import Tensor
from tinygrad.helpers import getenv, IMAGE, DEBUG
from tinygrad.helpers import getenv, IMAGE, DEBUG, CI
from tinygrad.lazy import Device
import pytest
pytestmark = pytest.mark.webgpu
if CI:
import warnings
warnings.filterwarnings("ignore", message="Non-empty compiler output encountered")
FORWARD_ONLY = getenv("FORWARD_ONLY", 0)
PRINT_TENSORS = getenv("PRINT_TENSORS", 0)
@@ -49,7 +56,7 @@ def helper_test_op(shps, torch_fxn, tinygrad_fxn=None, atol=1e-6, rtol=1e-3, gra
for i, (t, tt) in enumerate(zip(ts, tst)):
compare(f"backward pass tensor {i}", tt.grad.numpy(), t.grad.detach().numpy(), atol=grad_atol, rtol=grad_rtol)
print("\ntesting %40r torch/tinygrad fp: %.2f / %.2f ms bp: %.2f / %.2f ms " % (shps, torch_fp*1000, tinygrad_fp*1000, torch_fbp*1000, tinygrad_fbp*1000), end="")
if not CI: print("\ntesting %40r torch/tinygrad fp: %.2f / %.2f ms bp: %.2f / %.2f ms " % (shps, torch_fp*1000, tinygrad_fp*1000, torch_fbp*1000, tinygrad_fbp*1000), end="")
def prepare_test_op(a, b, shps, vals):
torch.manual_seed(0)
@@ -68,7 +75,7 @@ class TestOps(unittest.TestCase):
with self.assertRaises(expected) as tinygrad_cm:
tinygrad_fxn(*tst)
if exact: self.assertEqual(str(torch_cm.exception), str(tinygrad_cm.exception))
print("\ntesting %40r torch/tinygrad exception: %s / %s" % (shps, torch_cm.exception, tinygrad_cm.exception), end="")
if not CI: print("\ntesting %40r torch/tinygrad exception: %s / %s" % (shps, torch_cm.exception, tinygrad_cm.exception), end="")
def test_full_like(self):
a = Tensor([[1,2,3],[4,5,6]])

View File

@@ -5,6 +5,9 @@ import torch
import unittest
from tinygrad.tensor import Tensor
from tinygrad.nn.optim import Adam, SGD, AdamW
import pytest
pytestmark = pytest.mark.exclude_cuda
np.random.seed(1337)
x_init = np.random.randn(1,4).astype(np.float32)

View File

@@ -4,6 +4,9 @@ import numpy as np
import torch
from tinygrad.tensor import Tensor
import tinygrad.nn as nn
import pytest
pytestmark = pytest.mark.webgpu
# https://gist.github.com/devries/11405101
def ksprob(a):

View File

@@ -2,8 +2,11 @@ import unittest
from tinygrad.tensor import Tensor
from tinygrad.helpers import dtypes
from tinygrad.lazy import Device
import pytest
# similar to test/external/external_test_gpu_ast.py, but universal
pytestmark = pytest.mark.exclude_cuda
class TestSpecific(unittest.TestCase):
# from openpilot

View File

@@ -14,8 +14,11 @@ from tinygrad.lazy import Device
from tinygrad.ops import GlobalCounters
from tinygrad.tensor import Tensor
from tinygrad.nn import Conv2d
from tinygrad.helpers import colored, getenv, DEBUG
from tinygrad.helpers import colored, getenv, DEBUG, CI
from tinygrad.jit import TinyJit
import pytest
pytestmark = [pytest.mark.exclude_cuda, pytest.mark.exclude_gpu, pytest.mark.exclude_clang, pytest.mark.webgpu]
IN_CHANS = [int(x) for x in getenv("IN_CHANS", "4,16,64").split(",")]
@@ -93,7 +96,7 @@ def helper_test_generic(name, f1, f1_args, f2, f2_args):
desc = "faster" if et_torch > et_tinygrad else "slower"
flops = save_ops*1e-6
mem = save_mem*1e-6
print(f"\r{name:42s} {et_torch:7.2f} ms ({flops/et_torch:8.2f} GFLOPS {mem/et_torch:8.2f} GB/s) in torch, {et_tinygrad:7.2f} ms ({flops/et_tinygrad:8.2f} GFLOPS {mem/et_tinygrad:8.2f} GB/s) in tinygrad, {colorize_float(et_tinygrad/et_torch)} {desc} {flops:10.2f} MOPS {mem:8.2f} MB")
if not CI: print(f"\r{name:42s} {et_torch:7.2f} ms ({flops/et_torch:8.2f} GFLOPS {mem/et_torch:8.2f} GB/s) in torch, {et_tinygrad:7.2f} ms ({flops/et_tinygrad:8.2f} GFLOPS {mem/et_tinygrad:8.2f} GB/s) in tinygrad, {colorize_float(et_tinygrad/et_torch)} {desc} {flops:10.2f} MOPS {mem:8.2f} MB")
np.testing.assert_allclose(val_tinygrad, val_torch, atol=1e-4, rtol=1e-3)
def helper_test_conv(bs, in_chans, out_chans, kernel_size, img_size_y, img_size_x):

View File

@@ -6,6 +6,9 @@ import itertools
from tinygrad.tensor import Tensor, Device
from tinygrad.helpers import dtypes
from extra.gradcheck import numerical_jacobian, jacobian, gradcheck
import pytest
pytestmark = pytest.mark.webgpu
x_init = np.random.randn(1,3).astype(np.float32)
U_init = np.random.randn(3,3).astype(np.float32)

View File

@@ -2,22 +2,22 @@ import unittest
import numpy as np
from tinygrad.lazy import Device
from tinygrad.tensor import Tensor
from tinygrad.helpers import getenv
from tinygrad.helpers import getenv, CI
def multidevice_test(fxn):
exclude_devices = getenv("EXCLUDE_DEVICES", "").split(",")
def ret(self):
for device in Device._buffers:
if device in ["DISK", "FAKE"]: continue
print(device)
if not CI: print(device)
if device in exclude_devices:
print(f"WARNING: {device} test is excluded")
if not CI: print(f"WARNING: {device} test is excluded")
continue
with self.subTest(device=device):
try:
Device[device]
except Exception:
print(f"WARNING: {device} test isn't running")
if not CI: print(f"WARNING: {device} test isn't running")
continue
fxn(self, device)
return ret
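multidevice_test is a decorator: ret replaces the test method, and fxn(self, device) runs once per backend in Device._buffers, with excluded or unavailable devices skipped. A hypothetical test using the helper above (not part of this diff):

import unittest
import numpy as np
from tinygrad.tensor import Tensor

class TestExample(unittest.TestCase):
    @multidevice_test  # defined above; loops the body over every usable device
    def test_ones(self, device):
        t = Tensor.ones(2, 2, device=device)
        np.testing.assert_allclose(t.numpy(), np.ones((2, 2)))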