# satnogs-wut
The goal of satnogs-wut is to have a script that takes an
observation ID and returns whether the observation is
"good", "bad", or "failed".
## Good Observation
![Good Observation](pics/waterfall-good.png)
## Bad Observation
![Bad Observation](pics/waterfall-bad.png)
## Failed Observation
![Failed Observation](pics/waterfall-failed.png)
# Machine Learning
The system at present is built upon the following:
* Debian
* Tensorflow
* Keras
The project is still in a learning/testing phase, and results are inaccurate.
# wut?
The following scripts are in the repo:
* `wut` --- Feed it an observation ID and it returns if it is a "good", "bad", or "failed" observation.
* `wut-audio-archive` --- Downloads audio files from archive.org.
* `wut-compare` --- Compare an observation's current (presumably human) vetting with a `wut` vetting.
* `wut-compare-all` --- Compare all the observations in `download/` with `wut` vettings.
* `wut-compare-tx` --- Compare all the observations in `download/` with `wut` vettings using selected transmitter UUID.
* `wut-compare-txmode` --- Compare all the observations in `download/` with `wut` vettings using selected encoding.
* `wut-compare-txmode-csv` --- Compare all the observations in `download/` with `wut` vettings using selected encoding, CSV output.
* `wut-dl-sort` --- Populate `data/` dir with waterfalls from `download/`.
* `wut-dl-sort-tx` --- Populate `data/` dir with waterfalls from `download/` using selected transmitter UUID.
* `wut-dl-sort-txmode` --- Populate `data/` dir with waterfalls from `download/` using selected encoding.
* `wut-files` --- Tells you about what files you have in `download/` and `data/`.
* `wut-ml` --- Main machine learning Python script using Tensorflow and Keras.
* `wut-ml-load` --- Machine learning Python script using Tensorflow and Keras, load `data/wut.h5`.
* `wut-ml-save` --- Machine learning Python script using Tensorflow and Keras, save `data/wut.h5`.
* `wut-obs` --- Download the JSON for an observation ID.
* `wut-ogg2wav` --- Convert `.ogg` files in `download/` to `.wav` files.
* `wut-review-staging` --- Review all images in `data/staging`.
* `wut-water` --- Download waterfall for an observation ID to `download/[ID]`.
* `wut-water-range` --- Download waterfalls for a range of observation IDs to `download/[ID]`.
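For example, vetting a single observation might look like the following
(the observation ID is arbitrary, and the exact invocation may differ slightly):
```
./wut 1292461
```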
# Installation
Most of the scripts are simple shell scripts with few dependencies.
## Setup
The scripts use directories that are ignored in the git repo,
so you need to create them first:
```
mkdir -p download
mkdir -p data/train/good
mkdir -p data/train/bad
mkdir -p data/train/failed
mkdir -p data/val/good
mkdir -p data/val/bad
mkdir -p data/val/failed
mkdir -p data/staging
mkdir -p data/test/unvetted
```
## Debian Packages
You'll need `curl` and `jq`, both in Debian's repos.
```
apt update
apt install curl jq
```
## Machine Learning
For the machine learning scripts, like `wut-ml`, both Tensorflow
and Keras need to be installed. The versions packaged in Debian
didn't work for me. IIRC, for Tensorflow I built a `pip` package of
version 2.0.0 from git and installed that, and I installed Keras
with `pip`. Something like:
```
# XXX These aren't the exact commands used; roughly something like:
apt update
# Build dependencies
apt install python3-pip python3-dev git
pip3 install --user -U pip numpy wheel
# Install Bazel, Tensorflow's build system (see bazel.build)
# Build Tensorflow from git
git clone https://github.com/tensorflow/tensorflow.git
cd tensorflow
git checkout v2.0.0
./configure
bazel build //tensorflow/tools/pip_package:build_pip_package
./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
pip3 install --user /tmp/tensorflow_pkg/tensorflow-*.whl
# Install Keras
pip3 install --user keras
```
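To sanity-check the installs from Python (not part of the original notes,
just a quick way to confirm the imports work):
```
python3 -c "import tensorflow as tf; print(tf.__version__)"
python3 -c "import keras; print(keras.__version__)"
```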
# Usage
The main purpose of the script is to evaluate an observation,
but to do that, it needs to build a corpus of observations to
learn from. So many of the scripts in this repo are just for
downloading and managing observations.
The following steps need to be performed:
1. Download waterfalls and JSON descriptions with `wut-water-range`.
These get put in the `download/[ID]/` directories.
1. Organize downloaded waterfalls into categories (e.g. "good", "bad", "failed")
using the `wut-dl-sort` script.
The script will sort them into their respective directories under:
* `data/train/good/`
* `data/train/bad/`
* `data/train/failed/`
* `data/val/good/`
* `data/val/bad/`
* `data/val/failed/`
1. Use machine learning script `wut-ml` to build a model based on
the files in the `data/train` and `data/val` directories.
1. Rate an observation using the `wut` script.
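Putting the steps together, a full run might look something like this
(the observation ID range is arbitrary, and the exact arguments each
script takes may differ; check the scripts themselves):
```
# Download waterfalls and JSON for a range of observation IDs
./wut-water-range 1292461 1292471
# Sort the downloads into data/train and data/val categories
./wut-dl-sort
# Train the model
./wut-ml
# Vet a single observation
./wut 1292461
```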
# ml.spacecruft.org
This server is processing the data and has directories available
to sync.
* https://ml.spacecruft.org/
## Data Caching Downloads
The scripts are designed not to re-download a waterfall or re-request JSON
for an observation that has already been fetched. The first time an observation
is requested, it is downloaded from the SatNOGS network to the `download/`
directory. That `download/` directory is the download cache.
The `data/` directory contains just temporary files, mostly linked from the
`download/` directory. Files in the `data/` directory are deleted by many
scripts, so don't put anything you want to keep in there.
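The caching pattern is roughly the following (a hypothetical sketch, not the
literal contents of any script; the real scripts may check for specific files
rather than the directory):
```
# If the observation is already in the download/ cache, skip the network;
# otherwise fetch the waterfall and JSON from the SatNOGS network.
OBSID=1292461
if [ -d "download/$OBSID" ]; then
    echo "Observation $OBSID already cached in download/$OBSID"
else
    ./wut-water "$OBSID"
    ./wut-obs "$OBSID"
fi
```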
## Preprocessed Files
Files in the `preprocess/` directory have been preprocessed for use
further down the pipeline. The directory contains `.wav` files that have
been decoded from `.ogg` files.
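The conversion itself is along these lines (a hypothetical sketch of what
`wut-ogg2wav` does, assuming `ffmpeg` is available and the `.ogg` files sit
under `download/`; the actual script may use a different tool or layout):
```
# Decode each downloaded .ogg audio file to a .wav in preprocess/
mkdir -p preprocess
for ogg in download/*/*.ogg; do
    wav="preprocess/$(basename "${ogg%.ogg}").wav"
    [ -f "$wav" ] || ffmpeg -i "$ogg" "$wav"
done
```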
## SatNOGS Observation Data Mirror
The downloaded waterfalls are available below via `http` and `rsync`.
Use this instead of downloading from SatNOGS to save their bandwidth.
```
# Something like:
wget --mirror https://ml.spacecruft.org/download
# Or with rsync:
mkdir download
rsync -ultav rsync://ml.spacecruft.org/download/ download/
```
# TODO / Brainstorms
This is a first draft of how to do this. The actual machine learning
process hasn't really been examined, except to get it to generate
an answer. It has a long way to go. There are also many ways to do
this besides using Tensorflow and Keras. Originally, I considered
using OpenCV. Ideas in no particular order below.
## General
General considerations.
* Use OpenCV.
* Use something other than Tensorflow / Keras.
* Mirror `network.satnogs.org` and make API calls to it for data.
* Issues are now available here:
* https://spacecruft.org/spacecruft/satnogs-wut/issues
## Tensorflow / Keras
At present Tensorflow and Keras are used.
* Learn Keras / Tensorflow...
* What part of image is being evaluated?
* Re-evaluate each step.
* Right now the prediction output is just "good" or "bad", needs
"failed" too.
* Give confidence score in each prediction.
* Visualize what ML is looking at.
* Separate out good/bad/failed by satellite, transmitter, or encoding,
so that a "good" vetting from a totally different encoding isn't used
when training. Right now, it marks as good observations that
should be bad...
* If it has a low confidence, return "unknown" instead of "good" or "bad".
# Caveats
This is nearly the first machine learning script I've written;
I know little about radio and less about satellites,
and I'm not a programmer.
# Source License / Copying
Main repository is available here:
* https://spacecruft.org/spacecruft/satnogs-wut
License: CC-BY-SA 4.0 International and/or GPLv3+, at your discretion. Other code is licensed under its own respective licenses.
Copyright (C) 2019, 2020, Jeff Moe