Distributed Computing

HOWTO: Set up and run Tensorflow on multiple nodes. These instructions are specific to this particular configuration.

Software

Main software in use:

  • Debian
  • Proxmox
  • Ceph
  • Python 3
  • Tensorflow
  • Jupyter
  • clusterssh

Installation

Major steps:

  1. Install Proxmox on bare metal.
  2. Clone Debian KVM nodes.
  3. Set up nodes.
  4. Install Tensorflow.
  5. Set up Ceph.

Proxmox

Setting up Proxmox is outside the scope of this document. All you really need is some virtual machines, however they are created.
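
However the virtual machines are created, the later steps copy files to them with scp, so each one needs to be reachable over SSH from the main workstation. The following is a minimal sketch of a reachability check, assuming key-based SSH login is already configured. The function name `check_nodes` and the `SSH` variable are hypothetical names introduced here for illustration; they are not part of the original setup.

```shell
#!/bin/sh
# Hypothetical helper, not part of the original setup.
# SSH can be overridden to use a different client or wrapper.
SSH=${SSH:-ssh}

# Print "<node>: ok" or "<node>: unreachable" for each node given,
# and return non-zero if at least one node could not be reached.
check_nodes() {
    status=0
    for n in "$@"; do
        # BatchMode=yes fails instead of prompting for a password;
        # ConnectTimeout=5 gives up after five seconds.
        if $SSH -o BatchMode=yes -o ConnectTimeout=5 "$n" true \
            </dev/null >/dev/null 2>&1; then
            echo "$n: ok"
        else
            echo "$n: unreachable"
            status=1
        fi
    done
    return $status
}
```

For example, `check_nodes ml1 ml2 ml3 ml4 ml5` prints one line per node, using the node names from the "Set up nodes" section.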

Set up nodes

# On the main workstation, or the node where you built Tensorflow:
NODES="ml1 ml2 ml3 ml4 ml5"
for i in $NODES; do
	# Copy the wheel to the home directory on each node, preserving timestamps.
	scp -p tensorflow-2.1.0-cp37-cp37m-linux_x86_64.whl "$i":
done
# On worker nodes:
sudo apt update
sudo apt install python3-pip sshfs
# XXX deps...
pip3 install --upgrade setuptools
pip3 install --user tensorflow-2.1.0-cp37-cp37m-linux_x86_64.whl
pip3 install --user simplejson
pip3 install --user pillow
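
The wheel filename above carries a `cp37` tag, meaning it was built for CPython 3.7, and pip will refuse to install it under a different Python version. The following is a minimal sketch of a check that can be run on a worker node before installing. The helper names `wheel_pytag` and `local_pytag` are hypothetical and introduced here only for illustration; the parsing assumes the standard wheel filename layout ending in python tag, ABI tag, and platform tag.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the original setup.

# Print the Python tag (e.g. cp37) from a wheel filename.
wheel_pytag() {
    name=${1%.whl}      # strip the extension
    name=${name%-*}     # drop the platform tag (linux_x86_64)
    name=${name%-*}     # drop the ABI tag (cp37m)
    printf '%s\n' "${name##*-}"   # the last remaining field is the Python tag
}

# Print the matching tag for the local interpreter, e.g. cp37 for Python 3.7.
local_pytag() {
    python3 -c 'import sys; print("cp%d%d" % sys.version_info[:2])'
}

WHEEL=tensorflow-2.1.0-cp37-cp37m-linux_x86_64.whl
if [ "$(wheel_pytag "$WHEEL")" = "$(local_pytag)" ]; then
    echo "wheel matches local Python"
else
    echo "wheel is built for $(wheel_pytag "$WHEEL"), which differs from the local python3" >&2
fi
```

If the tags differ, the wheel would need to be rebuilt for the Python version installed on the nodes.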

Usage
