# Distributed Computing HOWTO

Set up and run Tensorflow on multiple nodes. This HOWTO is specific to this particular configuration.

# Software

Main software in use:

* Debian
* Proxmox
* Ceph
* Python 3
* Tensorflow
* Jupyter
* `clusterssh`

# Installation

Major steps:

1. Install Proxmox on bare metal.
1. Clone Debian KVM nodes.
1. Set up nodes.
1. Install Tensorflow.
1. Set up Ceph.

## Proxmox

Setting up Proxmox is outside the scope of this document. All you really need are some virtual machines, however they are created.

* https://www.proxmox.com/en/proxmox-ve

## Set up nodes

```
# On the main workstation, or the node where you built Tensorflow:
NODES="ml1 ml2 ml3 ml4 ml5"
```

```
# On worker nodes:
sudo apt update
sudo apt install python3-pip sshfs jq
pip3 install --upgrade --user pip
# Make sure the new `pip3` at `~/.local/bin/pip3` comes first in `$PATH`:
export PATH="$HOME/.local/bin:$PATH"
pip3 install --upgrade --user -r requirements-node.txt
# If you have cloned the tensorflow repo, test with:
# python3 ~/devel/tensorflow/tensorflow/tensorflow/python/distribute/multi_worker_continuous_run_test.py
```
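Once the nodes are set up, each TensorFlow worker has to know the addresses of the others. One common approach is TensorFlow's `TF_CONFIG` environment variable, which `tf.distribute.MultiWorkerMirroredStrategy` reads. This is not necessarily how this particular cluster is wired. The sketch below builds the `TF_CONFIG` JSON for each host in `NODES`. It makes two assumptions. First, the port `12345` is hypothetical. Second, the helper name `make_tf_config` is made up for illustration.

```python
import json

# Hostnames from the NODES variable on the main workstation.
NODES = "ml1 ml2 ml3 ml4 ml5".split()
# Hypothetical port (assumption): any free port reachable between the nodes.
PORT = 12345


def make_tf_config(nodes, index, port=PORT):
    """Return the TF_CONFIG JSON string for the worker at position `index`.

    Every worker shares the same "cluster" section; only "task.index" differs.
    """
    if not 0 <= index < len(nodes):
        raise ValueError(f"index {index} out of range for {len(nodes)} nodes")
    config = {
        "cluster": {"worker": [f"{host}:{port}" for host in nodes]},
        "task": {"type": "worker", "index": index},
    }
    return json.dumps(config)


if __name__ == "__main__":
    # Print one export line per node; run the matching line on that node.
    for i, host in enumerate(NODES):
        print(f"{host}: export TF_CONFIG='{make_tf_config(NODES, i)}'")
```

On each node, export the line printed for that host before starting the training script. You can inspect the value with `jq`, which is already installed, using `echo "$TF_CONFIG" | jq .`. With no explicit `chief` entry, the worker at index 0 takes on the chief role.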