# Distributed Computing
HOWTO: set up and run Tensorflow on multiple nodes.
These notes are specific to this particular configuration.
# Software
Main software in use:
* Debian
* Proxmox
* Ceph
* Python 3
* Tensorflow
* Jupyter
* `clusterssh`
# Installation
Major steps.
1. Install Proxmox on bare metal.
1. Clone Debian KVM Nodes.
1. Set up nodes.
1. Install Tensorflow.
1. Set up Ceph.
## Proxmox
Setting up Proxmox is outside the scope of this document.
All you really need is some virtual machines, however
they are created.
* https://www.proxmox.com/en/proxmox-ve
## Set up nodes
```
# On main workstation or node where you built tensorflow:
NODES="ml1 ml2 ml3 ml4 ml5"
for i in $NODES
do scp -p tensorflow-2.1.0-cp37-cp37m-linux_x86_64.whl $i:
done
```
```
# On worker nodes:
sudo apt update
sudo apt install python3-pip sshfs
# XXX deps...
pip3 install --upgrade setuptools
pip3 install --user tensorflow-2.1.0-cp37-cp37m-linux_x86_64.whl
pip3 install --user simplejson
pip3 install --user pillow
```
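To confirm that each worker ended up with a usable install, a loop like the `scp` one above can run a check over SSH. This is a minimal sketch, not part of the original setup: the `for_each_node` helper and the `SSH` override variable are hypothetical names, and it assumes passwordless SSH to the same `ml1` to `ml5` hosts.

```shell
# Hypothetical helper: run one command on every node listed in $NODES.
NODES="ml1 ml2 ml3 ml4 ml5"
SSH="${SSH:-ssh}"   # assumption: set SSH=echo for a dry run that only prints

for_each_node() {
    for n in $NODES
    do
        echo "== $n"
        # $SSH is left unquoted on purpose so it may carry extra options.
        $SSH "$n" "$@"
    done
}

# Usage (not run here): print the Tensorflow version each node imports.
# for_each_node "python3 -c 'import tensorflow as tf; print(tf.__version__)'"
```

Setting `SSH=echo` before calling `for_each_node` prints the commands instead of running them, which is a cheap way to check the node list first.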
Another way is to use the upstream tensorflow packages.
You must first upgrade `pip` itself using `pip`,
or you'll get `tensorflow 1.x`.
```
pip3 install pip
pip3 install --upgrade pip
# Make sure the new `pip3` at `~/.local/bin/pip3` comes first in `$PATH`.
# Install tensorflow:
pip3 install --user tensorflow
# If that fails due to the PATH, run it like this:
~/.local/bin/pip3 install --user tensorflow
# Check which version was installed:
pip3 list | grep tensorflow
# There are a number of tests that can be run, such as:
python3 ~/devel/tensorflow/tensorflow/tensorflow/python/distribute/multi_worker_continuous_run_test.py
```
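The `$PATH` requirement mentioned in the comments above can be met for the current shell with a one-line prepend. This is a minimal sketch, assuming `pip3 install --user` placed its scripts in `~/.local/bin`; making it persistent (for example in `~/.profile`) depends on your shell setup and is not covered here.

```shell
# Prepend ~/.local/bin so the upgraded pip3 is found before the system one.
export PATH="$HOME/.local/bin:$PATH"

# Show which pip3 the shell resolves now (prints a notice if none is installed).
command -v pip3 || echo "pip3 not found on PATH"
```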
# Usage
`top`
# Meh
```
# For running some tensorflow tests:
pip3 install --user portpicker
# For other examples/tests:
#pip3 install --user opencv-python
sudo apt install python3-opencv
pip3 install --user pandas
```