# PyTorch

A forklet of PyTorch for my own quirky needs.
- Build notes and scripts for different machines I admin.
- CPU builds.
- Meh ROCm AMD GPU builds.
- Proprietary Nvidia builds.
- Flailing at getting larger GPUs + PyTorch on ppc64le going.
- Kludges to work around disabled IPv6.
## Install

Thusly, on Debian stable (bookworm/12).
### Dependencies

Perhaps this and more:

#### OS
```sh
sudo apt install git build-essential libssl-dev zlib1g-dev \
  libbz2-dev libreadline-dev libsqlite3-dev curl ccache \
  libncursesw5-dev xz-utils tk-dev libxml2-dev libxmlsec1-dev \
  libffi-dev liblzma-dev gcc-11 g++-11 libblis64-dev ninja-build \
  libblis-dev libfftw3-dev libmpfr-dev protobuf-compiler protobuf-c-compiler \
  libasmjit-dev python3-virtualenv python3-pip
```
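As a quick sanity check that the toolchain above landed on the PATH (a minimal sketch; versions will vary with your Debian point release):

```sh
# Confirm the compilers and build helpers installed above are usable.
gcc-11 --version | head -n1
g++-11 --version | head -n1
ccache --version | head -n1
ninja --version
protoc --version
```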
#### Python

At present, the latest Python that works happily with most PyTorch applications seems to be Python 3.10.

Use pyenv to manage versions. Install it with something like:
```sh
# :)
curl https://pyenv.run | bash
```
Add this to `~/.bashrc`, then re-source it (log out/in or whatever):
```sh
export PYENV_ROOT="$HOME/.pyenv"
command -v pyenv >/dev/null || export PATH="$PYENV_ROOT/bin:$PATH"
eval "$(pyenv init -)"
eval "$(pyenv virtualenv-init -)"
```
Get Python version 3.10:

```sh
pyenv install 3.10
```
Perhaps other versions too, a la:

```sh
pyenv install 3.9
pyenv install 3.11.6
pyenv install 3.12
pyenv install 3.13-dev
```
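To confirm pyenv can serve the interpreter the build steps below expect (a minimal sketch; swap in whichever version you installed):

```sh
# List installed interpreters and select 3.10 for the current shell.
pyenv versions
pyenv shell 3.10
python --version   # should print Python 3.10.x
```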
### Setup Pip Download Cache

So as to not download the same files from the Internet on each machine in the cluster, a pip download cache can be set up thusly, assuming server IP 192.168.100.101 and username debian:
```sh
mkdir -p ~/devel/devpi
cd ~/devel/devpi
virtualenv env
source env/bin/activate
pip install -U setuptools wheel pip
pip install devpi-server devpi-web

sudo mkdir /srv/devpi
sudo chown debian:debian /srv/devpi
devpi-init \
  --serverdir /srv/devpi
devpi-gen-config \
  --host=0.0.0.0 \
  --port 4040 \
  --serverdir /srv/devpi \
  --absolute-urls

sudo apt install nginx
sudo cp ~debian/devel/devpi/gen-config/nginx-devpi.conf /etc/nginx/sites-available/
cd /etc/nginx/sites-enabled
sudo ln -s ../sites-available/nginx-devpi.conf .

sudo apt install supervisor
sudo cp ~debian/devel/devpi/gen-config/supervisor-devpi.conf /etc/supervisor/conf.d/

# Add this entry via `crontab -e`:
# @reboot /usr/local/sbin/supervisord -c /home/debian/etc/supervisor-devpi.conf

supervisord -c gen-config/supervisord.conf
sudo reboot

# After reboot, configure devpi:
devpi use http://192.168.100.101:4040
devpi login root --password ''
devpi user -m root password=FOO
devpi user -l
devpi logoff
devpi user -c debian password=BAR email=devpi@localhost
devpi login debian --password=BAR
devpi index -c dev bases=root/pypi
devpi use debian/dev
devpi install pytest
```
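A quick check from a client that the mirror is answering (a sketch, assuming the server IP and port used above):

```sh
# The simple index for a known package should return HTTP 200.
curl -sI http://192.168.100.101:4040/root/pypi/+simple/pip/ | head -n1
```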
### Add Pip Download Cache

Add this to clients to use the cache:
```sh
mkdir -p ~/.config/pip
cat > ~/.config/pip/pip.conf <<EOF
[global]
trusted-host = 192.168.100.101
index-url = http://192.168.100.101:4040/root/pypi/+simple/

[search]
index = http://192.168.100.101:4040/root/pypi/
EOF
```
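To confirm pip on a client actually picked up the config (a minimal sketch; the downloaded package is just an example):

```sh
# Show the effective pip configuration, then fetch one package through the mirror.
pip config list
pip download --no-deps --dest /tmp/pip-cache-test pip
```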
### Other caches

Also set up a ccache cluster, with Redis for remote storage.
Note, Redis needs `systemctl edit redis-server` to set its timeouts to infinity or it may just keep restarting itself (see the override sketch below). Thx systemd.
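A sketch of that override, assuming the Debian `redis-server` unit name; this writes the same drop-in that `systemctl edit` would open an editor for:

```sh
# Drop-in that disables the start/stop timeouts so systemd stops cycling redis.
sudo mkdir -p /etc/systemd/system/redis-server.service.d
sudo tee /etc/systemd/system/redis-server.service.d/override.conf >/dev/null <<EOF
[Service]
TimeoutStartSec=infinity
TimeoutStopSec=infinity
EOF
sudo systemctl daemon-reload
sudo systemctl restart redis-server
```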
And an npm cache (verdaccio), a Rust crate cache (panamax), an apt cache (apt-cacher-ng). And sccache for Rust compiles.
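For the ccache side, something like this points clients at the shared Redis instance (a sketch; on the ccache shipped in bookworm the key is `secondary_storage`, newer releases call it `remote_storage`, and the host/port are the assumed cluster values):

```sh
# Point ccache at the shared Redis cache.
ccache --set-config secondary_storage=redis://192.168.100.101:6379
# Newer ccache releases use:
# ccache --set-config remote_storage=redis://192.168.100.101:6379

# Route Rust compiles through sccache.
export RUSTC_WRAPPER=sccache
```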
### Get Deepcrayon PyTorch Repo

```sh
git clone https://spacecruft.org/deepcrayon/pytorch
cd pytorch/
```
## Compile

Now actually build.

### git make more

From System76, but came with an A5000. Has non-free Debian nvidia junk installed.

This rebuilds from scratch.
```sh
export PYTHONVER=3.10
export TORCHVER=deepcrayon-v2.1
export GCCVER=11
export CMAKE_C_COMPILER=/usr/lib/ccache/gcc-${GCCVER}
export CMAKE_CXX_COMPILER=/usr/lib/ccache/g++-${GCCVER}

cd ~/devel/deepcrayon/pytorch # or wherever repo is
deactivate
rm -rf venv
rm -rf build
mkdir -p build
git checkout ${TORCHVER}
git clean -ff
git reset --hard HEAD
git clean -ff
git pull
git submodule update --init --recursive

virtualenv -p ${PYTHONVER} venv
source venv/bin/activate
pip install -U setuptools wheel pip
pip install -r requirements.txt

# huh
cd third_party/python-peachpy
python setup.py generate
cd ../..

# will barf, but sets up some dirs:
python setup.py build --cmake-only

# Use `ccmake` instead of `cmake` if you want to configure further.
#
# For amd64 CPU:
cmake build -DBLAS=BLIS -DTP_BUILD_PYTHON=ON

# For amd64 nvidia A6000 GPU (`sm_86`) XXX NON-FREE:
cmake build -DCUDAToolkit_INCLUDE_DIR=/usr/include -DBLAS=BLIS \
  -DCUDA_SDK_ROOT_DIR=/usr -DENABLE_CUDA=ON -DTP_BUILD_PYTHON=ON

# For ppc64le CPU:
cmake build -DUSE_NCCL=OFF -DBLAS=BLIS -DTP_BUILD_PYTHON=ON \
  -DUSE_FBGEMM=OFF

# For ppc64le testing nvidia A5000 GPU (`sm_86`):
cmake build -DCUDAToolkit_INCLUDE_DIR=/usr/include -DBLAS=BLIS \
  -DCUDA_SDK_ROOT_DIR=/usr -DENABLE_CUDA=ON -DTP_BUILD_PYTHON=ON \
  -DUSE_FBGEMM=OFF

# Make a wheel:
python setup.py bdist_wheel
# or:
python setup.py install
```
Also consider options such as:

```sh
# -DNNL_GPU_VENDOR -DUSE_NATIVE_ARCH -DBUILD_CAFFE2=ON
# -DUSE_OPENCL=ON -DUSE_REDIS=ON -DUSE_ROCKSDB=ON -DUSE_ZMQ=ON
# -DUSE_LMDB=ON -DUSE_GLOG=ON
# -DUSE_FFMPEG=ON -DCUPTI_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu
# -DUSE_NVRTC=ON -DUSE_OPENCV=ON -DUSE_ZSTD=ON
# -DCMAKE_CUDA_ARCHITECTURES=native
```
The resulting wheel will be named something like:

```
./dist/torch-2.1.0a0+git83f7fe3-cp310-cp310-linux_x86_64.whl
```
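To install the wheel and sanity-check it (a sketch; the exact filename depends on the git hash and Python version):

```sh
pip install ./dist/torch-*.whl
# Verify the import and, for GPU builds, that CUDA is visible.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```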
## Upstream

Main upstream: https://github.com/pytorch/pytorch

See also: README-upstream.md.
## Disclaimer

I am not a programmer.
## Copyright

Unofficial project, not related to upstream projects.

Upstream sources under their respective copyrights.
## License

MIT.

Copyright © 2023, Jeff Moe.