Distributed Computing
HOWTO: Set up and run Tensorflow on multiple nodes. This is specific to this particular configuration.
Software
Main software in use:
- Debian
- Proxmox
- Ceph
- Python 3
- Tensorflow
- Jupyter
- clusterssh
Installation
Major steps:
- Install Proxmox on bare metal.
- Clone Debian KVM Nodes.
- Set up nodes.
- Install Tensorflow.
- Set up Ceph.
Proxmox
Setting up Proxmox is outside the scope of this document. All you really need is some virtual machines, however they are created.
Set up nodes
# On the main workstation, or on the node where you built Tensorflow,
# copy the wheel to the home directory of each worker node:
NODES="ml1 ml2 ml3 ml4 ml5"
for i in $NODES
do scp -p tensorflow-2.1.0-cp37-cp37m-linux_x86_64.whl "$i":
done
# On worker nodes:
sudo apt update
sudo apt install python3-pip sshfs
# XXX deps...
pip3 install --upgrade setuptools
pip3 install --user tensorflow-2.1.0-cp37-cp37m-linux_x86_64.whl
pip3 install --user simplejson
pip3 install --user pillow
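Once the worker nodes are set up, it can be useful to confirm that each one can actually import Tensorflow. The following is a minimal sketch, not part of the original setup. It assumes passwordless SSH from the workstation to every node, and that python3 on the nodes picks up the --user install. The function name check_nodes is hypothetical.

```shell
# Hypothetical helper: print the Tensorflow version that python3 reports
# on each node, or FAILED if the ssh call or the import does not succeed.
# Assumes the same node names as the scp loop above.
NODES="ml1 ml2 ml3 ml4 ml5"

check_nodes() {
    for i in $NODES
    do
        # Prefix each result with the node name; no newline yet.
        printf '%s: ' "$i"
        # The remote command prints the version on success.
        ssh "$i" "python3 -c 'import tensorflow as tf; print(tf.__version__)'" \
            || echo "FAILED"
    done
}
```

Running check_nodes on the workstation prints one line per node. A node that reports FAILED most likely needs the worker-node install steps run again.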
Usage
top