# Pytorch
A forklet of PyTorch for my own quirky needs:
* Build notes and scripts for different machines I admin.
* CPU builds.
* Meh ROCm AMD GPU builds.
* Proprietary Nvidia builds.
* Flailing at getting larger GPUs + PyTorch on ppc64le going.
* Kludges to work around disabled IPv6.
# Install
Thusly, on Debian stable (bookworm/12).
## Dependencies
Perhaps this and more:
### OS
```
sudo apt install git build-essential libssl-dev zlib1g-dev \
libbz2-dev libreadline-dev libsqlite3-dev curl ccache \
libncursesw5-dev xz-utils tk-dev libxml2-dev libxmlsec1-dev \
libffi-dev liblzma-dev gcc-11 g++-11 libblis64-dev ninja-build \
libblis-dev libfftw3-dev libmpfr-dev protobuf-compiler protobuf-c-compiler \
libasmjit-dev python3-virtualenv python3-pip
```
### Python
At present, the latest Python that works happily with most PyTorch
applications appears to be Python 3.10.
Use pyenv to manage versions; install it something like:
```
# :)
curl https://pyenv.run | bash
```
Add to `~/.bashrc`, then re-source it (logout/in or whatever):
```
export PYENV_ROOT="$HOME/.pyenv"
command -v pyenv >/dev/null || export PATH="$PYENV_ROOT/bin:$PATH"
eval "$(pyenv init -)"
eval "$(pyenv virtualenv-init -)"
```
Get Python version 3.10:
```
pyenv install 3.10
```
Perhaps install other versions too, a la:
```
pyenv install 3.9
pyenv install 3.11.6
pyenv install 3.12
pyenv install 3.13-dev
```
#### Setup Pip Download Cache
To avoid downloading the same files from the Internet once per
machine in the cluster, a pip download cache can be set up
thusly, assuming server IP 192.168.100.101 and username debian:
```
mkdir -p ~/devel/devpi
cd ~/devel/devpi
virtualenv env
source env/bin/activate
pip install -U setuptools wheel pip
pip install devpi-server devpi-web
sudo mkdir /srv/devpi
sudo chown debian:debian /srv/devpi
devpi-init \
--serverdir /srv/devpi
devpi-gen-config \
--host=0.0.0.0 \
--port 4040 \
--serverdir /srv/devpi \
--absolute-urls
sudo apt install nginx
sudo cp ~debian/devel/devpi/gen-config/nginx-devpi.conf /etc/nginx/sites-available/
cd /etc/nginx/sites-enabled
sudo ln -s ../sites-available/nginx-devpi.conf .
sudo apt install supervisor
sudo cp ~debian/devel/devpi/gen-config/supervisor-devpi.conf /etc/supervisor/conf.d/
crontab -e
# add this line to the crontab:
# @reboot /usr/local/sbin/supervisord -c /home/debian/etc/supervisor-devpi.conf
supervisord -c gen-config/supervisord.conf
sudo reboot
devpi use http://192.168.100.101:4040
devpi login root --password ''
devpi user -m root password=FOO
devpi user -l
devpi logoff
devpi user -c debian password=BAR email=devpi@localhost
devpi login debian --password=BAR
devpi index -c dev bases=root/pypi
devpi use debian/dev
devpi install pytest
```
#### Add Pip Download Cache
Add this on each client to use the cache:
```
mkdir -p ~/.config/pip
cat > ~/.config/pip/pip.conf <<EOF
[global]
trusted-host = 192.168.100.101
index-url = http://192.168.100.101:4040/root/pypi/+simple/
[search]
index = http://192.168.100.101:4040/root/pypi/
EOF
```
## Other caches
Also set up a `ccache` cluster with Redis as the remote storage.
Note, Redis needs `systemctl edit redis-server` to set its timeouts
to infinity or it may just keep restarting itself. Thx systemd.
Also handy: an npm cache (verdaccio), a Rust crate mirror (panamax),
an apt cache (apt-cacher-ng), and `sccache` for Rust
compiles.
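As a sketch, assuming the same cache-server IP as above and ccache >= 4.7 (as shipped in bookworm), the client-side ccache config and the Redis timeout override might look like:
```
# ~/.config/ccache/ccache.conf on each build machine:
remote_storage = redis://192.168.100.101:6379

# On the server, via `sudo systemctl edit redis-server`:
[Service]
TimeoutStartSec=infinity
TimeoutStopSec=infinity
```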
## Get Deepcrayon Pytorch Repo
```
git clone https://spacecruft.org/deepcrayon/pytorch
cd pytorch/
```
## Compile
Now actually build.
### git make more
A machine from System76 that came with an A5000 GPU.
It has the non-free Debian nvidia junk installed.
This rebuilds from scratch.
```
export PYTHONVER=3.10
export TORCHVER=deepcrayon-v2.1
export GCCVER=11
export CMAKE_C_COMPILER=/usr/lib/ccache/gcc-${GCCVER}
export CMAKE_CXX_COMPILER=/usr/lib/ccache/g++-${GCCVER}
cd ~/devel/deepcrayon/pytorch # or wherever repo is
deactivate 2>/dev/null || true # leave any active virtualenv
rm -rf venv
rm -rf build
mkdir -p build
git checkout deepcrayon-v2.1
git clean -ff
git reset --hard HEAD
git clean -ff
git pull
git submodule update --init --recursive
virtualenv -p ${PYTHONVER} venv
source venv/bin/activate
pip install -U setuptools wheel pip
pip install -r requirements.txt
# huh
cd third_party/python-peachpy
python setup.py generate
cd ../..
# will barf, but sets up some dirs:
python setup.py build --cmake-only
# Use `ccmake` instead of `cmake` if you want to configure further.
#
# For amd64 CPU:
cmake build -DBLAS=BLIS -DTP_BUILD_PYTHON=ON
# For amd64 nvidia A6000 GPU (`sm_86`) XXX NON-FREE:
cmake build -DCUDAToolkit_INCLUDE_DIR=/usr/include -DBLAS=BLIS \
-DCUDA_SDK_ROOT_DIR=/usr -DENABLE_CUDA=ON -DTP_BUILD_PYTHON=ON
# For ppc64le CPU:
cmake build -DUSE_NCCL=OFF -DBLAS=BLIS -DTP_BUILD_PYTHON=ON \
-DUSE_FBGEMM=OFF
# For ppc64le testing nvidia A5000 GPU (`sm_86`):
cmake build -DCUDAToolkit_INCLUDE_DIR=/usr/include -DBLAS=BLIS \
-DCUDA_SDK_ROOT_DIR=/usr -DENABLE_CUDA=ON -DTP_BUILD_PYTHON=ON \
-DUSE_FBGEMM=OFF
# Make a wheel:
python setup.py bdist_wheel
# or:
python setup.py install
```
Also consider additional flags, such as:
```
# -DNNL_GPU_VENDOR -DUSE_NATIVE_ARCH -DBUILD_CAFFE2=ON
# -DUSE_OPENCL=ON -DUSE_REDIS=ON -DUSE_ROCKSDB=ON -DUSE_ZMQ=ON
# -DUSE_LMDB=ON -DUSE_GLOG=ON
# -DUSE_FFMPEG=ON -DCUPTI_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu
# -DUSE_NVRTC=ON -DUSE_OPENCV=ON -DUSE_ZSTD=ON
# -DCMAKE_CUDA_ARCHITECTURES=native
```
The resulting wheel will be named something like:
```
./dist/torch-2.1.0a0+git83f7fe3-cp310-cp310-linux_x86_64.whl
```
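To install the wheel into the build virtualenv and smoke-test it (the exact filename varies per build; `cuda.is_available()` will print `False` on CPU-only builds):
```
# Install whatever wheel the build produced:
pip install ./dist/torch-*.whl
# Quick sanity check of version and GPU visibility:
python -c 'import torch; print(torch.__version__, torch.cuda.is_available())'
```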
# Upstream
Main upstream:
* https://github.com/pytorch/pytorch
See also: `README-upstream.md`.
# Disclaimer
I am not a programmer.
# Copyright
Unofficial project, not related to upstream projects.
Upstream sources under their respective copyrights.
# License
MIT.
*Copyright &copy; 2023, Jeff Moe.*