# satnogs-wut
The goal of satnogs-wut is to have a script that takes an
observation ID and returns whether the observation is
"good", "bad", or "failed".
## Good Observation
![Good Observation](pics/waterfall-good.png)
## Bad Observation
![Bad Observation](pics/waterfall-bad.png)
## Failed Observation
![Failed Observation](pics/waterfall-failed.png)
# Machine Learning
The system at present is built upon the following:
* Debian
* Tensorflow
* Keras
Still in the learning/testing phase; results are inaccurate.
# wut?
The following scripts are in the repo:
* `wut` --- Feed it an observation ID and it returns whether it is a "good", "bad", or "failed" observation.
* `wut-compare` --- Compare an observation's current (presumably human) vetting with a `wut` vetting.
* `wut-compare-all` --- Compare all the observations in `download/` with `wut` vettings.
* `wut-dl-sort` --- Populate `data/` dir with waterfalls from `download/`.
* `wut-dl-sort-txmode` --- Populate `data/` dir with waterfalls from `download/` using the selected encoding.
* `wut-ml` --- Main machine learning Python script using Tensorflow and Keras.
* `wut-obs` --- Download the JSON for an observation ID.
* `wut-review-staging` --- Review all images in `data/staging`.
* `wut-water` --- Download waterfall for an observation ID to `download/[ID]`.
* `wut-water-range` --- Download waterfalls for a range of observation IDs to `download/[ID]`.
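
As a quick orientation, the two most common entry points might be invoked
roughly as follows. The observation ID and argument forms are hypothetical
examples, not verified invocations; check each script for its actual arguments.
```
# Hypothetical invocations; the real scripts may take different arguments.
./wut 1292765            # report whether observation 1292765 looks "good", "bad", or "failed"
./wut-compare 1292765    # compare wut's vetting with the existing (presumably human) vetting
```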
# Installation
Most of the scripts are simple shell scripts with few dependencies.
## Setup
The scripts write to directories that are ignored by the git repo,
so you need to create them first:
```
mkdir -p download
mkdir -p data/train/good
mkdir -p data/train/bad
mkdir -p data/train/failed
mkdir -p data/val/good
mkdir -p data/val/bad
mkdir -p data/val/failed
mkdir -p data/staging
mkdir -p data/test/unvetted
```
## Debian Packages
You'll need `curl` and `jq`, both in Debian's repos.
```
apt update
apt install curl jq
```
## Machine Learning
For the machine learning scripts, like `wut-ml`, both Tensorflow
and Keras need to be installed. The versions of those in Debian
didn't work for me. IIRC, for Tensorflow I built a `pip` of
version 2.0.0 from git and installed that. I installed Keras
2020-01-02 22:10:19 -07:00
with `pip`. Something like:
```
# XXX These aren't the exact commands, need to check...
apt update
# deps...
apt install python3-pip ...
# Install Bazel, Tensorflow's build system
# Install Tensorflow
git clone tensorflow...
cd tensorflow
./configure
# run some bazel command
dpkg -i /tmp/pkg_foo/*.deb
apt update
apt -f install
# Install Keras
pip3 install --user keras
# A million other commands....
```
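
If building from source turns out to be unnecessary on your system, installing
the prebuilt wheels with `pip` may be simpler. This is an untested alternative,
assuming a Python 3 version that the Tensorflow 2.0.0 wheel supports:
```
# Untested alternative: prebuilt wheels instead of a source build
apt update
apt install python3-pip
pip3 install --user tensorflow==2.0.0 keras
```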
# Usage
The main purpose of the script is to evaluate an observation,
but to do that, it needs to build a corpus of observations to
learn from. Many of the scripts in this repo are therefore just for
downloading and managing observations.
The following steps need to be performed:
1. Download waterfalls and JSON descriptions with `wut-water-range`.
These get put in the `download/[ID]/` directories.
1. Organize downloaded waterfalls into categories (e.g. "good", "bad", "failed").
Use the `wut-dl-sort` script.
The script will sort them into their respective directories under:
* `data/train/good/`
* `data/train/bad/`
* `data/train/failed/`
* `data/val/good/`
* `data/val/bad/`
* `data/val/failed/`
1. Use the machine learning script `wut-ml` to build a model based on
the files in the `data/train` and `data/val` directories.
1. Rate an observation using the `wut` script.
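
Putting the steps together, a typical session might look like the sketch
below. The argument forms (a start and end observation ID for the range
scripts, a single ID for `wut`) are assumptions based on the descriptions
above, and the IDs are made up.
```
# Hypothetical end-to-end run; exact arguments may differ from the real scripts.
./wut-water-range 1292000 1292100   # download waterfalls and JSON into download/[ID]/
./wut-dl-sort 1292000 1292100       # sort into data/train/ and data/val/ by existing vetting
./wut-ml                            # train on data/train/, validate on data/val/
./wut 1292765                       # rate a single observation: good, bad, or failed
```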
# Download Caching
The scripts are designed not to download a waterfall or make a JSON request
for an observation they have already requested. The first time an observation
is requested, it is downloaded from the SatNOGS network to the `download`
directory. That `download` directory is the download cache.
The `data` directory holds only temporary files, mostly linked from the
`download` directory. Files in the `data` directory are deleted by many
scripts, so don't put anything you want to keep in there.
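
The cache check amounts to something like the following sketch; this is
hypothetical, and the real scripts may test for specific filenames rather
than a non-empty directory:
```
# Hypothetical sketch of the download-cache check, not the actual script logic.
OBS_ID=1292765
if [ -z "$(ls -A "download/${OBS_ID}" 2>/dev/null)" ]; then
  ./wut-water "${OBS_ID}"   # only hit the SatNOGS network when nothing is cached
fi
```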
# SatNOGS Observation Data Mirror
The downloaded waterfalls are available from the mirror below via `http` and `rsync`.
Use the mirror instead of downloading from SatNOGS to save their bandwidth.
```
# Something like:
wget --mirror https://ml.spacecruft.org/download
# Or with rsync:
mkdir download
rsync -ultav rsync://ml.spacecruft.org/download/ download/
```
# TODO / Brainstorms
This is a first draft of how to do this. The actual machine learning
process hasn't really been looked at yet, except to get it to generate
an answer. It has a long way to go. There are also many ways to do
this besides using Tensorflow and Keras; originally, I considered
using OpenCV. Ideas, in no particular order, below.
## General
General considerations.
* Use OpenCV.
* Use something other than Tensorflow / Keras.
* Mirror `network.satnogs.org` and make API calls to it for data.
* Issues are now available here:
* https://spacecruft.org/spacecruft/satnogs-wut/issues
## Tensorflow / Keras
At present Tensorflow and Keras are used.
* Learn Keras / Tensorflow...
* What part of image is being evaluated?
* Re-evaluate each step.
* Right now the prediction output is just "good" or "bad", needs
"failed" too.
* Give confidence score in each prediction.
* Visualize what ML is looking at.
* Separate out good/bad/failed by satellite, transmitter, or encoding.
That way a "good" vetting learned on one encoding isn't applied to a
totally different encoding. Right now it considers observations good
that should be bad...
* If it has a low confidence, return "unknown" instead of "good" or "bad".
# Caveats
This is nearly the first machine learning script I've done,
I know little about radio and less about satellites,
and I'm not a programmer.
# Source License / Copying
Main repository is available here:
* https://spacecruft.org/spacecruft/satnogs-wut
License: CC BY-SA 4.0 International and/or GPLv3+ at your discretion. Other code is licensed under its respective license.
Copyright (C) 2019, 2020, Jeff Moe