357 lines
12 KiB
Markdown
357 lines
12 KiB
Markdown
# wut?
|
|
`wut` --- What U Think? SatNOGS Observation AI.
|
|
|
|
# satnogs-wut
|
|
|
|
The goal of satnogs-wut is to have a script that will take an
|
|
observation ID and return an answer whether the observation is
|
|
"good", "bad", or "failed".
|
|
|
|
## Good Observation
|
|
<div>
|
|
<img src="satnogs-wut/media/branch/master/pics/waterfall-good.png" width="300"/>
|
|
</div>
|
|
|
|
## Bad Observation
|
|
<div>
|
|
<img src="satnogs-wut/media/branch/master/pics/waterfall-bad.png" width="300"/>
|
|
</div>
|
|
|
|
## Failed Observation
|
|
<div>
|
|
<img src="satnogs-wut/media/branch/master/pics/waterfall-failed.png" width="300"/>
|
|
</div>
|
|
|
|
## Observations
|
|
See also:
|
|
|
|
* https://wiki.satnogs.org/Operation
|
|
* https://wiki.satnogs.org/Rating_Observations
|
|
* https://wiki.satnogs.org/Taxonomy_of_Observations
|
|
* Sample observation: https://network.satnogs.org/observations/1456893/
|
|
|
|
# Machine Learning
|
|
The system at present is built upon the following:
|
|
|
|
* Debian Buster.
|
|
* Tensorflow 2.1 with built-in Keras.
|
|
* Jupyter Lab.
|
|
|
|
Learning/testing, results are ~~inaccurate~~ getting closer.
|
|
The main AI/ML development is now being done in Jupyter.
|
|
|
|
# Jupyter
|
|
There is a Jupyter Lab Notebook file.
|
|
This is producing real results at present, but has a long ways to go still...
|
|
|
|
* `wut-ml.ipynb` --- Machine learning Python script using Tensorflow and Keras in a Jupyter Notebook.
|
|
* `wut-predict.ipynb` --- Make prediction (rating) of observation, using `data/wut.h5`.
|
|
* `wut-train.ipynb` --- ML Training file saved to `data/wut.h5`.
|
|
|
|
# wut scripts
|
|
The following scripts are in the repo:
|
|
|
|
* `wut` --- Feed it an observation ID and it returns if it is a "good", "bad", or "failed" observation.
|
|
* `wut-audio-archive` --- Downloads audio files from archive.org.
|
|
* `wut-compare` --- Compare an observations' current presumably human vetting with a `wut` vetting.
|
|
* `wut-compare-all` --- Compare all the observations in `download/` with `wut` vettings.
|
|
* `wut-compare-tx` --- Compare all the observations in `download/` with `wut` vettings using selected transmitter UUID.
|
|
* `wut-compare-txmode` --- Compare all the observations in `download/` with `wut` vettings using selected encoding.
|
|
* `wut-compare-txmode-csv` --- Compare all the observations in `download/` with `wut` vettings using selected encoding, CSV output.
|
|
* `wut-dl-sort` --- Populate `data/` dir with waterfalls from `download/`.
|
|
* `wut-dl-sort-tx` --- Populate `data/` dir with waterfalls from `download/` using selected transmitter UUID.
|
|
* `wut-dl-sort-txmode` --- Populate `data/` dir with waterfalls from `download/` using selected encoding.
|
|
* `wut-files` --- Tells you about what files you have in `downloads/` and `data/`.
|
|
* `wut-ml` --- Main machine learning Python script using Tensorflow and Keras.
|
|
* `wut-ml-load` --- Machine learning Python script using Tensorflow and Keras, load `data/wut.h5`.
|
|
* `wut-ml-save` --- Machine learning Python script using Tensorflow and Keras, save `data/wut.h5`.
|
|
* `wut-obs` --- Download the JSON for an observation ID.
|
|
* `wut-ogg2wav` --- Convert `.ogg` files in `downloads/` to `.wav` files.
|
|
* `wut-review-staging` --- Review all images in `data/staging`.
|
|
* `wut-water` --- Download waterfall for an observation ID to `download/[ID]`.
|
|
* `wut-water-range` --- Download waterfalls for a range of observation IDs to `download/[ID]`.
|
|
|
|
|
|
# Installation
|
|
Most of the scripts are simple shell scripts with few dependencies.
|
|
|
|
## Setup
|
|
The scripts use files that are ignored in the git repo.
|
|
So you need to create those directories:
|
|
|
|
```
|
|
mkdir -p download
|
|
mkdir -p data/train/good
|
|
mkdir -p data/train/bad
|
|
mkdir -p data/train/failed
|
|
mkdir -p data/val/good
|
|
mkdir -p data/val/bad
|
|
mkdir -p data/val/failed
|
|
mkdir -p data/staging
|
|
mkdir -p data/test/unvetted
|
|
```
|
|
|
|
## Debian Packages
|
|
You'll need `curl` and `jq`, both in Debian's repos.
|
|
|
|
```
|
|
apt update
|
|
apt install curl jq
|
|
```
|
|
|
|
## Install Tensorflow
|
|
For the machine learning scripts, like `wut-ml`, Tensorflow
|
|
needs to be installed.
|
|
As of version 2 of Tensorflow, Keras no longer needs to be
|
|
installed separately.
|
|
|
|
|
|
The verions of Tensorflow installed with `pip3` on Debian
|
|
Buster crashes. It is perhaps best to do a custom install,
|
|
best preferred build options, of the most preferred version.
|
|
At this point, the `remotes/origin/r2.1` branch is preferred.
|
|
|
|
|
|
To install Tensorflow:
|
|
|
|
* https://www.tensorflow.org/install/source
|
|
|
|
1. Install dependencies in Debian.
|
|
|
|
1. Install Bazel to build Tensorflow.
|
|
|
|
1. Build Tensorflow pip package.
|
|
|
|
1. Install Tensorflow from custom pip package.
|
|
|
|
|
|
```
|
|
# Install deps
|
|
apt update
|
|
apt install python3-pip
|
|
# Install bazel .deb from releases here:
|
|
firefox https://github.com/bazelbuild/bazel/releases
|
|
# Install Tensorflow
|
|
git clone tensorflow...
|
|
cd tensorflow
|
|
git checkout v2.1.0
|
|
bazel clean
|
|
# Get flags to pass:
|
|
grep flags -m1 /proc/cpuinfo | cut -d ":" -f 2 | tr '[:upper:]' '[:lower:]' | { read FLAGS; OPT="-march=native"; for flag in $FLAGS; do case "$flag" in "sse4_1" | "sse4_2" | "ssse3" | "fma" | "cx16" | "popcnt" | "avx" | "avx2") OPT+=" -m$flag";; esac; done; MODOPT=${OPT//_/\.}; echo "$MODOPT"; }
|
|
./configure
|
|
# Run Bazel to build pip package. Takes nearly 2 hours to build.
|
|
bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package
|
|
./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
|
|
pip3 install --user /tmp/tensorflow_pkg/tensorflow-2.1.0-cp37-cp37m-linux_x86_64.whl
|
|
```
|
|
|
|
### Tensorflow KVM Notes
|
|
Recent versions of Tensorflow can handle many more CPU build options
|
|
to optimize for speed, such as
|
|
[AVX](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions).
|
|
By default, Proxmox and likely other virtual machine systems pass
|
|
kvm/qemu "type=kvm" for CPU type. To use all possible CPU options
|
|
available on the bare metal server, use "type=host".
|
|
For more info about this in Proxmox, see
|
|
[CPU Type](https://pve.proxmox.com/pve-docs/chapter-qm.html#qm_cpu)
|
|
If you don't have this enabled, CPU instructions will fail or
|
|
Tensorflow will run slower than it could.
|
|
|
|
### Tensor Configuration
|
|
```
|
|
$ ./configure
|
|
WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
|
|
You have bazel 0.29.1 installed.
|
|
Please specify the location of python. [Default is /usr/bin/python3]:
|
|
|
|
|
|
Found possible Python library paths:
|
|
/usr/lib/python3/dist-packages
|
|
/usr/local/lib/python3.7/dist-packages
|
|
Please input the desired Python library path to use. Default is [/usr/lib/python3/dist-packages]
|
|
|
|
Do you wish to build TensorFlow with XLA JIT support? [Y/n]:
|
|
XLA JIT support will be enabled for TensorFlow.
|
|
|
|
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]:
|
|
No OpenCL SYCL support will be enabled for TensorFlow.
|
|
|
|
Do you wish to build TensorFlow with ROCm support? [y/N]:
|
|
No ROCm support will be enabled for TensorFlow.
|
|
|
|
Do you wish to build TensorFlow with CUDA support? [y/N]:
|
|
No CUDA support will be enabled for TensorFlow.
|
|
|
|
Do you wish to download a fresh release of clang? (Experimental) [y/N]:
|
|
Clang will not be downloaded.
|
|
|
|
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native -Wno-sign-compare]: -march=native -mssse3 -mcx16 -msse4.1 -msse4.2 -mpopcnt -mavx
|
|
|
|
|
|
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]:
|
|
Not configuring the WORKSPACE for Android builds.
|
|
|
|
Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
|
|
--config=mkl # Build with MKL support.
|
|
--config=monolithic # Config for mostly static monolithic build.
|
|
--config=ngraph # Build with Intel nGraph support.
|
|
--config=numa # Build with NUMA support.
|
|
--config=dynamic_kernels # (Experimental) Build kernels into separate shared objects.
|
|
--config=v2 # Build TensorFlow 2.x instead of 1.x.
|
|
Preconfigured Bazel build configs to DISABLE default on features:
|
|
--config=noaws # Disable AWS S3 filesystem support.
|
|
--config=nogcp # Disable GCP support.
|
|
--config=nohdfs # Disable HDFS support.
|
|
--config=nonccl # Disable NVIDIA NCCL support.
|
|
Configuration finished
|
|
```
|
|
|
|
## KVM
|
|
Note, for KVM, pass cpu=host if host has "avx" in `/proc/cpuinfo`.
|
|
|
|
## Install Jupyter
|
|
Jupyter is a cute little web interface that makes Python programming
|
|
easy. It works well for machine learning because you can step through
|
|
just parts of the code, changing variables and immediately seeing
|
|
output in the web browser.
|
|
|
|
Probably installed like this:
|
|
|
|
```
|
|
pip3 install --user jupyterlab
|
|
# Also other good packages, maybe like:
|
|
pip3 install --user jupyter-tensorboard
|
|
pip3 list | grep jupyter
|
|
# returns:
|
|
jupyter 1.0.0
|
|
jupyter-client 5.3.4
|
|
jupyter-console 6.0.0
|
|
jupyter-core 4.6.1
|
|
jupyter-tensorboard 0.1.10
|
|
jupyterlab 1.2.4
|
|
jupyterlab-server 1.0.6
|
|
```
|
|
|
|
|
|
# Usage
|
|
The main purpose of the script is to evaluate an observation,
|
|
but to do that, it needs to build a corpus of observations to
|
|
learn from. So many of the scripts in this repo are just for
|
|
downloading and managing observations.
|
|
|
|
|
|
The following steps need to be performed:
|
|
|
|
1. Download waterfalls and JSON descriptions with `wut-water-range`.
|
|
These get put in the `downloads/[ID]/` directories.
|
|
|
|
1. Organize downloaded waterfalls into categories (e.g. "good", "bad", "failed").
|
|
Use `wut-dl-sort` script.
|
|
The script will sort them into their respective directories under:
|
|
* `data/train/good/`
|
|
* `data/train/bad/`
|
|
* `data/train/failed/`
|
|
* `data/val/good/`
|
|
* `data/val/bad/`
|
|
* `data/val/failed/`
|
|
|
|
1. Use machine learning script `wut-ml` to build a model based on
|
|
the files in the `data/train` and `data/val` directories.
|
|
|
|
1. Rate an observation using the `wut` script.
|
|
|
|
# ml.spacecruft.org
|
|
This server is processing the data and has directories available
|
|
to sync.
|
|
|
|
* https://ml.spacecruft.org/
|
|
|
|
## Data Caching Downloads
|
|
The scripts are designed to not download a waterfall or make a JSON request
|
|
for an observation it has already requested. The first time an observation
|
|
is requested, it is downloaded from the SatNOGS network to the `download/`
|
|
directory. That `download/` directory is the download cache.
|
|
|
|
|
|
The `data/` directory is just temporary files, mostly linked from the
|
|
`downloads/` directory. Files in the `data/` directory are deleted by many
|
|
scripts, so don't put anything you want to keep in there.
|
|
|
|
|
|
## Preprocessed Files
|
|
Files in the `preprocess/` directory have been preprocessed to be used
|
|
further in the pipeline. This contains `.wav` files that have been
|
|
decoded from `.ogg` files.
|
|
|
|
|
|
## SatNOGS Observation Data Mirror
|
|
The downloaded waterfalls are available below via `http` and `rsync`.
|
|
Use this instead of downloading from SatNOGS to save their bandwidth.
|
|
|
|
```
|
|
# Something like:
|
|
wget --mirror https://ml.spacecruft.org/download
|
|
# Or with rsync:
|
|
mkdir download
|
|
rsync -ultav rsync://ml.spacecruft.org/download/ download/
|
|
```
|
|
|
|
# TODO / Brainstorms
|
|
This is a first draft of how to do this. The actual machine learning
|
|
process hasn't been looked at at all, except to get it to generate
|
|
an answer. It has a long ways to go. There are also many ways to do
|
|
this besides using Tensorflow and Keras. Originally, I considered
|
|
using OpenCV. Ideas in no particular order below.
|
|
|
|
## General
|
|
General considerations.
|
|
|
|
* Use Open CV.
|
|
|
|
* Use something other than Tensorflow / Keras.
|
|
|
|
* Do mirror of `network.satnogs.org` and do API calls to it for data.
|
|
|
|
* Issues are now available here:
|
|
* https://spacecruft.org/spacecruft/satnogs-wut/issues
|
|
|
|
## Tensorflow / Keras
|
|
At present Tensorflow and Keras are used.
|
|
|
|
* Learn Keras / Tensorflow...
|
|
|
|
* What part of image is being evaluated?
|
|
|
|
* Re-evaluate each step.
|
|
|
|
* Right now the prediction output is just "good" or "bad", needs
|
|
"failed" too.
|
|
|
|
* Give confidence score in each prediction.
|
|
|
|
* Visualize what ML is looking at.
|
|
|
|
* Separate out good/bad/failed by satellite, transmitter, or encoding.
|
|
This way "good" isn't considering a "good" vetting to be a totally
|
|
different encoding. Right now, it is considering as good observations
|
|
that should be bad...
|
|
|
|
* If it has a low confidence, return "unknown" instead of "good" or "bad".
|
|
|
|
|
|
# Caveats
|
|
This is nearly the first machine learning script I've done,
|
|
I know little about radio and less about satellites,
|
|
and I'm not a programmer.
|
|
|
|
|
|
# Source License / Copying
|
|
Main repository is available here:
|
|
|
|
* https://spacecruft.org/spacecruft/satnogs-wut
|
|
|
|
|
|
License: CC By SA 4.0 International and/or GPLv3+ at your discretion. Other code licensed under their own respective licenses.
|
|
|
|
Copyright (C) 2019, 2020, Jeff Moe
|