# satnogs-wut
The goal of satnogs-wut is to have a script that takes an
observation ID and reports whether the observation is
"good", "bad", or "failed".
## Good Observation
![Good Observation](pics/waterfall-good.png)
## Bad Observation
![Bad Observation](pics/waterfall-bad.png)
## Failed Observation
![Failed Observation](pics/waterfall-failed.png)
# Machine Learning
The system is currently built on the following:
* Debian
* Tensorflow
* Keras

It is still at the learning/testing stage, so results are inaccurate.
# wut?
The following scripts are in the repo:
* `wut` --- Feed it an observation ID and it returns whether it is a "good", "bad", or "failed" observation.
* `wut-audio-archive` --- Downloads audio files from archive.org.
* `wut-compare` --- Compare an observation's current (presumably human) vetting with a `wut` vetting.
* `wut-compare-all` --- Compare all the observations in `download/` with `wut` vettings.
* `wut-compare-tx` --- Compare all the observations in `download/` with `wut` vettings using a selected transmitter UUID.
* `wut-compare-txmode` --- Compare all the observations in `download/` with `wut` vettings using a selected encoding.
* `wut-compare-txmode-csv` --- Compare all the observations in `download/` with `wut` vettings using a selected encoding, with CSV output.
* `wut-dl-sort` --- Populate the `data/` dir with waterfalls from `download/`.
* `wut-dl-sort-tx` --- Populate the `data/` dir with waterfalls from `download/` using a selected transmitter UUID.
* `wut-dl-sort-txmode` --- Populate the `data/` dir with waterfalls from `download/` using a selected encoding.
* `wut-files` --- Tells you what files you have in `download/` and `data/`.
* `wut-ml` --- Main machine learning Python script using Tensorflow and Keras.
* `wut-ml-load` --- Machine learning Python script using Tensorflow and Keras; loads `data/wut.h5`.
* `wut-ml-save` --- Machine learning Python script using Tensorflow and Keras; saves `data/wut.h5`.
* `wut-obs` --- Download the JSON for an observation ID.
* `wut-ogg2wav` --- Convert `.ogg` files in `download/` to `.wav` files.
* `wut-review-staging` --- Review all images in `data/staging`.
* `wut-water` --- Download the waterfall for an observation ID to `download/[ID]`.
* `wut-water-range` --- Download waterfalls for a range of observation IDs to `download/[ID]`.
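
The `wut-compare` family boils down to counting how often two vettings of the same observation agree. As a rough illustration only, here is a Python sketch of that tally. The function `compare_vettings` and its dict-of-labels input are hypothetical, not how the shell scripts actually store vettings:

```python
from collections import Counter


def compare_vettings(human, wut):
    """Compare human vettings with wut vettings (hypothetical helper).

    Both arguments are dicts mapping observation ID -> label
    ("good", "bad" or "failed"). Only IDs present in both are compared.
    Returns (agreement_ratio, Counter of (human_label, wut_label) pairs).
    """
    shared = sorted(set(human) & set(wut))
    pairs = Counter((human[obs_id], wut[obs_id]) for obs_id in shared)
    matches = sum(n for (h, w), n in pairs.items() if h == w)
    ratio = matches / len(shared) if shared else 0.0
    return ratio, pairs


if __name__ == "__main__":
    # Made-up example vettings, for illustration only.
    human = {1001: "good", 1002: "bad", 1003: "failed", 1004: "good"}
    wut = {1001: "good", 1002: "good", 1003: "failed"}
    ratio, pairs = compare_vettings(human, wut)
    print(f"agreement: {ratio:.0%}")
    for (h, w), n in sorted(pairs.items()):
        print(f"human={h:7} wut={w:7} count={n}")
```

Counting (human, wut) pairs rather than just matches also shows *which* way it is wrong, for example how many "bad" observations it calls "good".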
# Jupyter
There is also a JupyterLab notebook file:
* `wut-ml.ipynb` --- Machine learning Python script using Tensorflow and Keras in a Jupyter Notebook.
# Installation
Most of the scripts are simple shell scripts with few dependencies.
## Setup
The scripts use directories that are ignored by git,
so you need to create them:
```
mkdir -p download
mkdir -p data/train/good
mkdir -p data/train/bad
mkdir -p data/train/failed
mkdir -p data/val/good
mkdir -p data/val/bad
mkdir -p data/val/failed
mkdir -p data/staging
mkdir -p data/test/unvetted
```
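
The same directories can also be created from Python, for example at the top of a notebook. A minimal sketch; `make_dirs` is a hypothetical helper, not one of the repo's scripts:

```python
from pathlib import Path

# Working directories the scripts expect but git ignores
# (same list as the mkdir commands above).
DIRS = [
    "download",
    "data/train/good",
    "data/train/bad",
    "data/train/failed",
    "data/val/good",
    "data/val/bad",
    "data/val/failed",
    "data/staging",
    "data/test/unvetted",
]


def make_dirs(base="."):
    """Create every directory in DIRS under base, like `mkdir -p`."""
    for d in DIRS:
        Path(base, d).mkdir(parents=True, exist_ok=True)
```

Run `make_dirs()` from the top of the repo checkout. Like `mkdir -p`, it is safe to run again.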
## Debian Packages
You'll need `curl` and `jq`, both in Debian's repos.
```
apt update
apt install curl jq
```
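
To check whether the tools are already installed before running the scripts, here is a small Python sketch using `shutil.which`. The helper name `missing_tools` is made up for this example:

```python
import shutil

# Command-line tools the shell scripts call (per this README).
REQUIRED_TOOLS = ["curl", "jq"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing:", ", ".join(missing))
        print("Try: apt install", " ".join(missing))
    else:
        print("curl and jq are installed.")
```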
## Install Tensorflow
For the machine learning scripts, like `wut-ml`, Tensorflow
needs to be installed.
As of version 2 of Tensorflow, Keras no longer needs to be
installed separately.
The version of Tensorflow installed with `pip3` on Debian
Buster crashes. It is perhaps best to do a custom build of the
preferred version, with your preferred build options.
At this point, the `remotes/origin/r2.1` branch is preferred.
To install Tensorflow, follow the build-from-source guide:

* https://www.tensorflow.org/install/source

1. Install dependencies in Debian.
1. Install Bazel to build Tensorflow.
1. Build the Tensorflow pip package.
1. Install Tensorflow from the custom pip package.
```
# Install deps
apt update
apt install python3-pip
# Install bazel .deb from releases here:
firefox https://github.com/bazelbuild/bazel/releases
# Install Tensorflow
git clone tensorflow...
cd tensorflow
git checkout remotes/origin/r2.1
./configure
# Run Bazel to build pip package. Takes nearly 2 hours to build.
bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package
./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
pip3 install --user /tmp/tensorflow_pkg/tensorflow-2.1.0-cp37-cp37m-linux_x86_64.whl
```
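
Since the stock `pip3` build crashes on Buster, it is worth checking that the custom build actually imports. A minimal Python sketch that imports Tensorflow in a child process, so a crash there doesn't take down the checking script. The helper `tensorflow_version` is hypothetical:

```python
import subprocess
import sys


def tensorflow_version(python=sys.executable):
    """Import Tensorflow in a child process and return its version string.

    Returns None if the import fails or the child interpreter crashes,
    so the calling process survives either way.
    """
    code = "import tensorflow as tf; print(tf.__version__)"
    result = subprocess.run([python, "-c", code], capture_output=True, text=True)
    if result.returncode != 0:
        # Non-zero covers both import errors and death by signal (negative code).
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    version = tensorflow_version()
    if version is None:
        print("Tensorflow failed to import (not installed, or it crashed).")
    else:
        print("Tensorflow", version)
```

Run it with the same `python3` that `pip3 install --user` installed into.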
## Install Jupyter
Jupyter is a cute little web interface that makes Python programming
easy. It works well for machine learning because you can step through
just parts of the code, changing variables and immediately seeing
the output in the web browser.
It can probably be installed like this:
```
pip3 install --user jupyterlab
# Also other good packages, maybe like:
pip3 install --user jupyter-tensorboard
pip3 list | grep jupyter
# returns:
# jupyter 1.0.0
# jupyter-client 5.3.4
# jupyter-console 6.0.0
# jupyter-core 4.6.1
# jupyter-tensorboard 0.1.10
# jupyterlab 1.2.4
# jupyterlab-server 1.0.6
```
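
The `pip3 list | grep jupyter` check can also be done from inside Python, which is handy in a notebook cell. This sketch uses `importlib.metadata`, which needs Python 3.8 or newer (Buster's Python 3.7 would need the `importlib_metadata` backport from PyPI instead). `installed_matching` is a hypothetical name:

```python
from importlib.metadata import distributions


def installed_matching(substring):
    """Return sorted (name, version) pairs for installed distributions whose
    name contains `substring`, similar to `pip3 list | grep substring`."""
    found = {}
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name and substring.lower() in name.lower():
            found[name] = dist.version
    return sorted(found.items(), key=lambda item: item[0].lower())


if __name__ == "__main__":
    for name, version in installed_matching("jupyter"):
        print(f"{name:24} {version}")
```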
# Usage
The main purpose of the script is to evaluate an observation,
but to do that, it needs to build a corpus of observations to
learn from. So many of the scripts in this repo are just for
downloading and managing observations.
The following steps need to be performed:
1. Download waterfalls and JSON descriptions with `wut-water-range`.
   These are put in the `download/[ID]/` directories.
1. Organize the downloaded waterfalls into categories ("good", "bad",
   "failed") using the `wut-dl-sort` script. It sorts them into their
   respective directories under:
   * `data/train/good/`
   * `data/train/bad/`
   * `data/train/failed/`
   * `data/val/good/`
   * `data/val/bad/`
   * `data/val/failed/`
1. Use the machine learning script `wut-ml` to build a model based on
   the files in the `data/train` and `data/val` directories.
1. Rate an observation using the `wut` script.
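
How `wut-dl-sort` decides between `data/train/` and `data/val/` isn't described here. One common approach, shown as a hypothetical Python sketch rather than the script's actual logic, is a deterministic split keyed on the observation ID. `split_for`, `destination`, and the 20% validation share are all assumptions:

```python
import hashlib

LABELS = ("good", "bad", "failed")


def split_for(obs_id, val_percent=20):
    """Deterministically assign an observation ID to "train" or "val".

    Hashing the ID (instead of shuffling randomly) keeps each observation
    in the same split every time the sort is re-run.
    """
    digest = hashlib.sha256(str(obs_id).encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # 0..99, roughly uniform
    return "val" if bucket < val_percent else "train"


def destination(obs_id, label, val_percent=20):
    """Return the data/ subdirectory for a vetted observation."""
    if label not in LABELS:
        raise ValueError(f"unknown label: {label!r}")
    return f"data/{split_for(obs_id, val_percent)}/{label}/"


if __name__ == "__main__":
    for obs_id, label in [(1001, "good"), (1002, "bad"), (1003, "failed")]:
        print(obs_id, "->", destination(obs_id, label))
```

A stable split matters because re-sorting with a random shuffle would let observations drift from `val` into `train` between runs, which inflates validation accuracy.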
# ml.spacecruft.org
This server is processing the data and has directories available
to sync.
* https://ml.spacecruft.org/
## Data Caching Downloads
The scripts are designed not to download a waterfall or make a JSON request
for an observation they have already requested. The first time an observation
is requested, it is downloaded from the SatNOGS network to the `download/`
directory. That `download/` directory is the download cache.

The `data/` directory holds only temporary files, mostly linked from the
`download/` directory. Files in the `data/` directory are deleted by many
scripts, so don't put anything you want to keep in there.
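
The caching rule, "only fetch it if it isn't already in `download/`", can be sketched in Python like this. It is an illustration under assumptions, not the scripts' code (they are shell): `cached_fetch`, the `fetch` callable, and the file name are hypothetical:

```python
from pathlib import Path


def cached_fetch(obs_id, filename, fetch, cache_dir="download"):
    """Return the path to download/[ID]/filename, fetching it only if missing.

    `fetch(obs_id)` is any callable returning the file contents as bytes;
    it is only called on a cache miss.
    """
    path = Path(cache_dir, str(obs_id), filename)
    if path.exists():
        return path  # cache hit: no network request
    path.parent.mkdir(parents=True, exist_ok=True)
    data = fetch(obs_id)
    tmp = path.with_name(path.name + ".part")
    tmp.write_bytes(data)
    tmp.replace(path)  # rename into place only after the write completes
    return path
```

Writing to a `.part` file first and renaming means an interrupted download is retried next time rather than being mistaken for a cached copy.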
## Preprocessed Files
Files in the `preprocess/` directory have been preprocessed to be used
further in the pipeline. This contains `.wav` files that have been
decoded from `.ogg` files.
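
To sanity-check the decoded `.wav` files, Python's standard `wave` module can read their headers. This sketch assumes the files are plain integer PCM, which is all `wave` reads; `describe_wav` is a hypothetical helper:

```python
import wave
from pathlib import Path


def describe_wav(path):
    """Return basic header parameters of an integer-PCM .wav file."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "frames": frames,
            "duration_s": frames / rate,
        }


if __name__ == "__main__":
    # Assumes the preprocessed files live somewhere under preprocess/.
    for wav_path in sorted(Path("preprocess").glob("**/*.wav")):
        print(wav_path, describe_wav(wav_path))
```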
## SatNOGS Observation Data Mirror
The downloaded waterfalls are available via HTTP and `rsync`, as shown below.
Use this mirror instead of downloading from SatNOGS, to save their bandwidth.
```
# Something like:
wget --mirror https://ml.spacecruft.org/download
# Or with rsync:
mkdir -p download
rsync -ultav rsync://ml.spacecruft.org/download/ download/
```
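
If you only need a few observations rather than the whole mirror, the same `rsync` call can be limited to one `download/[ID]/` directory. A Python sketch that builds and runs the command. It assumes the mirror uses the same `[ID]/` layout as the local cache, and `rsync_command` and `sync` are hypothetical names:

```python
import subprocess

MIRROR = "rsync://ml.spacecruft.org/download/"


def rsync_command(obs_id=None, dest="download/"):
    """Build the rsync command, optionally limited to one observation.

    With obs_id, only [ID]/ on the mirror is fetched into dest/[ID]/
    (assumes the mirror uses the same layout as the local download/ cache).
    """
    if obs_id is None:
        return ["rsync", "-ultav", MIRROR, dest]
    return ["rsync", "-ultav", f"{MIRROR}{obs_id}/", f"{dest.rstrip('/')}/{obs_id}/"]


def sync(obs_id=None, dest="download/"):
    """Run rsync; raises subprocess.CalledProcessError if it fails."""
    subprocess.run(rsync_command(obs_id, dest), check=True)
```

As with the shell version, run `mkdir -p download` first. rsync creates the final `[ID]/` directory itself, but not missing parents.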
# TODO / Brainstorms
This is a first draft of how to do this. The actual machine learning
process hasn't been looked at at all, except to get it to generate
an answer. It has a long way to go. There are also many ways to do
this besides using Tensorflow and Keras. Originally, I considered
using OpenCV. Ideas are listed below in no particular order.
## General
General considerations.
* Use OpenCV.
* Use something other than Tensorflow / Keras.
* Mirror `network.satnogs.org` and make API calls to the mirror for data.
* Issues are now available here:
  * https://spacecruft.org/spacecruft/satnogs-wut/issues
## Tensorflow / Keras
At present Tensorflow and Keras are used.
* Learn Keras / Tensorflow...
* What part of image is being evaluated?
* Re-evaluate each step.
* Right now the prediction output is just "good" or "bad"; it needs
  "failed" too.
* Give a confidence score with each prediction.
* Visualize what the ML is looking at.
* Separate out good/bad/failed by satellite, transmitter, or encoding.
  That way a "good" vetting for one encoding isn't treated as a "good"
  example for a totally different encoding. Right now, it rates some
  observations as good that should be bad...
* If it has a low confidence, return "unknown" instead of "good" or "bad".
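
The last two ideas, a confidence score and an "unknown" answer, could look something like this Python sketch. It assumes the model outputs one probability per class (for example a softmax over "good", "bad", "failed"); the `vet` function and the 0.6 threshold are hypothetical and would need tuning:

```python
LABELS = ("good", "bad", "failed")


def vet(probabilities, threshold=0.6, labels=LABELS):
    """Turn per-class probabilities into a vetting with a confidence score.

    probabilities: sequence of floats in the same order as `labels`, e.g. the
    softmax output of a classifier for one waterfall. Returns (label, confidence);
    label is "unknown" when the top probability is below `threshold`.
    """
    if len(probabilities) != len(labels):
        raise ValueError("need one probability per label")
    best = max(range(len(labels)), key=lambda i: probabilities[i])
    confidence = float(probabilities[best])
    if confidence < threshold:
        return "unknown", confidence
    return labels[best], confidence


if __name__ == "__main__":
    print(vet([0.85, 0.10, 0.05]))  # confident, so labelled "good"
    print(vet([0.40, 0.35, 0.25]))  # too close to call, so "unknown"
```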
# Caveats
This is nearly the first machine learning script I've done.
I know little about radio and less about satellites,
and I'm not a programmer.
# Source License / Copying
The main repository is available here:
* https://spacecruft.org/spacecruft/satnogs-wut

License: CC BY-SA 4.0 International and/or GPLv3+ at your discretion. Other code is licensed under its own respective license.

Copyright (C) 2019, 2020, Jeff Moe