# Distributed Computing HOWTO

Set up and run Tensorflow on multiple nodes. This is specific to this particular configuration.

# Software

Main software in use:

* Debian
* Proxmox
* Ceph
* Python 3
* Tensorflow
* Jupyter
* `clusterssh`

# Installation

Major steps:

1. Install Proxmox on bare metal.
1. Clone Debian KVM nodes.
1. Set up nodes.
1. Install Tensorflow.
1. Set up Ceph.

## Proxmox

Setting up Proxmox is outside the scope of this document. All you really need is some virtual machines, however they are created.

* https://www.proxmox.com/en/proxmox-ve

## Set up nodes

```
# On main workstation or node where you built tensorflow:
# copy the locally built wheel to each node's home directory.
NODES="ml1 ml2 ml3 ml4 ml5"
for i in $NODES
do
    scp -p tensorflow-2.1.0-cp37-cp37m-linux_x86_64.whl $i:
done
```

```
# On worker nodes:
sudo apt update
sudo apt install python3-pip sshfs
# XXX deps...
pip3 install --upgrade setuptools
pip3 install --user tensorflow-2.1.0-cp37-cp37m-linux_x86_64.whl
pip3 install --user simplejson
pip3 install --user pillow
```

Alternatively, install the upstream Tensorflow packages. You must first upgrade `pip` using `pip` itself, or you'll get `tensorflow 1.x`.

```
pip3 install pip
pip3 install --upgrade pip
# make sure new `pip3` at `~/.local/bin/pip3` is in front in `$PATH`.

# install tensorflow
pip3 install --user tensorflow
# If that fails due to the PATH, run like:
~/.local/bin/pip3 install --user tensorflow

# check which version was installed
pip3 list | grep tensorflow

# There's a bunch of tests that can be run, such as:
python3 ~/devel/tensorflow/tensorflow/tensorflow/python/distribute/multi_worker_continuous_run_test.py
```

# Usage

`top`

# Meh

```
# for running some tensorflow tests:
pip3 install --user portpicker

# For other examples/tests:
#pip3 install --user opencv-python
sudo apt install python3-opencv
pip3 install --user pandas
```
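
Once Tensorflow is installed on every node, Tensorflow's multi-worker strategies read the cluster layout from a `TF_CONFIG` environment variable set on each node. This HOWTO does not say how the nodes are wired together, so the following is only a minimal sketch: it reuses the `ml1` to `ml5` hostnames from the `scp` loop, while the port number and the helper name `make_tf_config` are assumptions made for illustration.

```python
import json

# Hostnames taken from the scp loop in "Set up nodes".
NODES = ["ml1", "ml2", "ml3", "ml4", "ml5"]
# Assumption: any free port that the nodes can reach on each other.
PORT = 12345


def make_tf_config(nodes, index, port=PORT):
    """Return the TF_CONFIG JSON string for the worker at position `index`.

    Hypothetical helper, not part of Tensorflow.
    """
    if not 0 <= index < len(nodes):
        raise ValueError("index out of range")
    config = {
        # Every node gets the same cluster description ...
        "cluster": {"worker": ["%s:%d" % (host, port) for host in nodes]},
        # ... and only the task index differs from node to node.
        "task": {"type": "worker", "index": index},
    }
    return json.dumps(config)


if __name__ == "__main__":
    # Print one export line per node, to paste into that node's shell.
    for i, host in enumerate(NODES):
        print("# on %s:" % host)
        print("export TF_CONFIG='%s'" % make_tf_config(NODES, i))
```

Running the script on the workstation prints an `export` line for each node; with `clusterssh` open on all nodes, each line would be pasted into the matching node before starting the training script.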