# Distributed Computing
HOWTO: set up and run TensorFlow on multiple nodes.
These notes are specific to this particular configuration.

# Software
Main software in use:

* Debian
* Proxmox
* Ceph
* Python 3
* TensorFlow
* Jupyter
* `clusterssh`

# Installation
Major steps:

1. Install Proxmox on bare metal.
1. Clone Debian KVM nodes.
1. Set up nodes.
1. Install TensorFlow.
1. Set up Ceph.

## Proxmox
Setting up Proxmox is outside the scope of this document.
All you really need is a set of virtual machines, however
they are created.

* https://www.proxmox.com/en/proxmox-ve

## Set up nodes
```
# On the main workstation or the node where you built TensorFlow:
NODES="ml1 ml2 ml3 ml4 ml5"
for i in $NODES
do
    # Copy the locally built wheel to each node's home directory
    scp -p tensorflow-2.1.0-cp37-cp37m-linux_x86_64.whl "$i":
done
```

```
# On worker nodes:
sudo apt update
sudo apt install python3-pip sshfs
# XXX deps...
pip3 install --upgrade setuptools
pip3 install --user tensorflow-2.1.0-cp37-cp37m-linux_x86_64.whl
pip3 install --user simplejson
pip3 install --user pillow
```

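Once the wheel is installed, it can help to confirm that every node imports TensorFlow and reports the expected version. The following is a minimal sketch and not part of the original setup: `check_nodes` and the `SSH` override are hypothetical names, and it assumes key-based SSH access from the workstation to each node.

```shell
# Same node list as in the copy step above
NODES="ml1 ml2 ml3 ml4 ml5"

# check_nodes: print the TensorFlow version reported by each node.
# SSH can be overridden (e.g. SSH=echo) to dry-run the loop without connecting.
check_nodes() {
    for i in $NODES
    do
        # The remote command is passed as one string so its inner quotes
        # survive the remote shell.
        ${SSH:-ssh} "$i" "python3 -c 'import tensorflow as tf; print(tf.__version__)'" \
            || echo "$i: FAILED"
    done
}
```

Run `check_nodes` from the workstation; `SSH=echo check_nodes` only prints the commands it would run.
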
# Usage
`top`
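
This document does not say how training jobs are launched across the nodes. One common approach in TensorFlow 2.x is multi-worker training, where the `tf.distribute` multi-worker strategies read the cluster layout from the `TF_CONFIG` environment variable. The sketch below assumes that approach: the helper name `tf_config` and the port `2222` are illustrative assumptions, not taken from this setup.

```shell
# Same node list as in the setup steps; PORT is an assumed free port.
NODES="ml1 ml2 ml3 ml4 ml5"
PORT=2222

# tf_config INDEX: print the TF_CONFIG JSON for the worker at
# position INDEX (0-based) in NODES.
tf_config() {
    workers=""
    for n in $NODES
    do
        # Build a comma-separated list of quoted "host:port" entries
        workers="${workers:+$workers, }\"${n}:${PORT}\""
    done
    printf '{"cluster": {"worker": [%s]}, "task": {"type": "worker", "index": %d}}\n' \
        "$workers" "$1"
}
```

On each node, export the value with that node's own index before starting the training script, for example `export TF_CONFIG="$(tf_config 0)"` on `ml1` and `export TF_CONFIG="$(tf_config 1)"` on `ml2`.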