# satnogs-wut

The goal of satnogs-wut is to have a script that takes an
observation ID and returns whether the observation is
"good", "bad", or "failed".

## Good Observation

![Good Observation](pics/waterfall-good.png)

## Bad Observation

![Bad Observation](pics/waterfall-bad.png)

## Failed Observation

![Failed Observation](pics/waterfall-failed.png)

# Machine Learning

The system at present is built upon the following:

* Debian
* Tensorflow
* Keras

The model is still in the learning/testing phase; its results are inaccurate.

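Below is a minimal sketch of the kind of three-class waterfall
classifier this setup implies. The layer sizes, image dimensions, and
epoch count are illustrative assumptions, not `wut-ml`'s actual
settings; it assumes the `data/train` and `data/val` layout described
under Setup below.

```
# Sketch only -- not wut-ml itself. Assumes Tensorflow 2.x.
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Read images from the class subdirectories (bad/, failed/, good/).
datagen = ImageDataGenerator(rescale=1.0 / 255)
train = datagen.flow_from_directory('data/train', target_size=(256, 256),
                                    class_mode='categorical')
val = datagen.flow_from_directory('data/val', target_size=(256, 256),
                                  class_mode='categorical')

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu',
                  input_shape=(256, 256, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(3, activation='softmax'),  # good / bad / failed
])
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(train, validation_data=val, epochs=5)
model.save('data/wut.h5')  # the file wut-ml-save writes
```
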
# wut?

The following scripts are in the repo:

* `wut` --- Feed it an observation ID and it returns whether the observation is "good", "bad", or "failed".
* `wut-audio-archive` --- Downloads audio files from archive.org.
* `wut-compare` --- Compare an observation's current (presumably human) vetting with a `wut` vetting.
* `wut-compare-all` --- Compare all the observations in `download/` with `wut` vettings.
* `wut-compare-tx` --- Compare all the observations in `download/` with `wut` vettings, filtered by transmitter UUID.
* `wut-compare-txmode` --- Compare all the observations in `download/` with `wut` vettings, filtered by encoding.
* `wut-compare-txmode-csv` --- Compare all the observations in `download/` with `wut` vettings, filtered by encoding, with CSV output.
* `wut-dl-sort` --- Populate the `data/` directory with waterfalls from `download/`.
* `wut-dl-sort-tx` --- Populate the `data/` directory with waterfalls from `download/`, filtered by transmitter UUID.
* `wut-dl-sort-txmode` --- Populate the `data/` directory with waterfalls from `download/`, filtered by encoding.
* `wut-files` --- Report which files you have in `download/` and `data/`.
* `wut-ml` --- Main machine learning Python script using Tensorflow and Keras.
* `wut-ml-load` --- Machine learning Python script using Tensorflow and Keras; loads `data/wut.h5`.
* `wut-ml-save` --- Machine learning Python script using Tensorflow and Keras; saves `data/wut.h5`.
* `wut-obs` --- Download the JSON for an observation ID.
* `wut-ogg2wav` --- Convert `.ogg` files in `download/` to `.wav` files.
* `wut-review-staging` --- Review all images in `data/staging`.
* `wut-water` --- Download the waterfall for an observation ID to `download/[ID]`.
* `wut-water-range` --- Download waterfalls for a range of observation IDs to `download/[ID]`.

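For example, rating a single observation might look like this; the ID
is made up and the output format is not specified here:

```
# Illustrative only; output format may vary.
./wut 1234567
```
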
# Installation

Most of the scripts are simple shell scripts with few dependencies.

## Setup

The scripts use files that are ignored in the git repo,
so you need to create those directories:

```
mkdir -p download
mkdir -p data/train/good
mkdir -p data/train/bad
mkdir -p data/train/failed
mkdir -p data/val/good
mkdir -p data/val/bad
mkdir -p data/val/failed
mkdir -p data/staging
mkdir -p data/test/unvetted
```

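Equivalently, in one line with bash brace expansion:

```
mkdir -p download data/{train,val}/{good,bad,failed} data/staging data/test/unvetted
```
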
## Debian Packages

You'll need `curl` and `jq`, both in Debian's repos.

```
apt update
apt install curl jq
```

## Machine Learning

For the machine learning scripts, like `wut-ml`, both Tensorflow
and Keras need to be installed. The versions packaged in Debian
didn't work for me. IIRC, for Tensorflow I built a `pip` package of
version 2.0.0 from git and installed that. I installed Keras
with `pip`. Something like:

```
# XXX These aren't the exact commands, need to check...
apt update
# deps...
apt install python3-pip ...
# Install bazel or whatever their build system is
# Install Tensorflow
git clone tensorflow...
cd tensorflow
./configure
# run some bazel command
dpkg -i /tmp/pkg_foo/*.deb
apt update
apt -f install
# Install Keras
pip3 install --user keras
# A million other commands....
```

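Depending on your Python version, the prebuilt wheels from PyPI may be
enough and are much simpler, though I haven't verified this with the
repo:

```
pip3 install --user tensorflow==2.0.0 keras
```
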
# Usage

The main purpose of the script is to evaluate an observation,
but to do that, it needs to build a corpus of observations to
learn from. So many of the scripts in this repo are just for
downloading and managing observations.

The following steps need to be performed (a combined sketch follows
the list):

1. Download waterfalls and JSON descriptions with `wut-water-range`.
   These get put in the `download/[ID]/` directories.

1. Organize downloaded waterfalls into categories (e.g. "good", "bad",
   "failed") using the `wut-dl-sort` script, which sorts them into
   their respective directories under:
   * `data/train/good/`
   * `data/train/bad/`
   * `data/train/failed/`
   * `data/val/good/`
   * `data/val/bad/`
   * `data/val/failed/`

1. Use the machine learning script `wut-ml` to build a model based on
   the files in the `data/train` and `data/val` directories.

1. Rate an observation using the `wut` script.

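Putting it together, a run might look something like the sketch below.
The observation IDs and the exact argument forms are assumptions, not
taken from the scripts:

```
# Sketch only; IDs and argument forms are assumptions.
./wut-water-range 1292461 1292470   # download waterfalls and JSON
./wut-dl-sort                       # sort download/ into data/
./wut-ml                            # train on data/train and data/val
./wut 1292471                       # rate an observation
```
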
# ml.spacecruft.org

This server is processing the data and has directories available
to sync.

* https://ml.spacecruft.org/

## Data Caching Downloads

The scripts are designed to not download a waterfall or make a JSON
request for an observation they have already requested. The first time
an observation is requested, it is downloaded from the SatNOGS network
to the `download/` directory. That `download/` directory is the
download cache.

The `data/` directory holds just temporary files, mostly linked from
the `download/` directory. Files in the `data/` directory are deleted
by many scripts, so don't put anything you want to keep in there.

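The caching pattern amounts to something like this sketch; the file
layout and API URL here are assumptions, not copied from the scripts:

```
# Sketch only; paths and API URL are assumptions.
ID=1234567
if [ ! -f "download/$ID/$ID.json" ]; then
  mkdir -p "download/$ID"
  curl -s "https://network.satnogs.org/api/observations/$ID/" \
    > "download/$ID/$ID.json"
fi
```
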
## Preprocessed Files

Files in the `preprocess/` directory have been preprocessed for use
further down the pipeline. This includes `.wav` files that have been
decoded from `.ogg` files.

## SatNOGS Observation Data Mirror

The downloaded waterfalls are available below via `http` and `rsync`.
Use this instead of downloading from SatNOGS to save their bandwidth.

```
# Something like:
wget --mirror https://ml.spacecruft.org/download
# Or with rsync:
mkdir download
rsync -ultav rsync://ml.spacecruft.org/download/ download/
```

# TODO / Brainstorms

This is a first draft of how to do this. The actual machine learning
process hasn't been looked at at all, except to get it to generate
an answer. It has a long way to go. There are also many ways to do
this besides using Tensorflow and Keras. Originally, I considered
using OpenCV. Ideas in no particular order below.

## General

General considerations.

* Use OpenCV.
* Use something other than Tensorflow / Keras.
* Mirror `network.satnogs.org` and make API calls to it for data.
* Issues are now available here:
  * https://spacecruft.org/spacecruft/satnogs-wut/issues

## Tensorflow / Keras

At present Tensorflow and Keras are used.

* Learn Keras / Tensorflow...
* What part of the image is being evaluated?
* Re-evaluate each step.
* Right now the prediction output is just "good" or "bad"; it needs
  "failed" too.
* Give a confidence score with each prediction (see the sketch after
  this list).
* Visualize what the ML is looking at.
* Separate out good/bad/failed by satellite, transmitter, or encoding.
  That way the model isn't lumping "good" vettings of totally different
  encodings into one class. Right now it marks as good some observations
  that should be bad...
* If it has low confidence, return "unknown" instead of "good" or "bad".

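For the confidence-score and "unknown" ideas, the softmax output
already gives a usable number. A sketch, assuming a model like the one
under Machine Learning above; the threshold and `image_batch` are
assumptions:

```
# Sketch: derive a label and a confidence from the softmax output.
# flow_from_directory assigns class indices alphabetically.
labels = ['bad', 'failed', 'good']
probs = model.predict(image_batch)[0]  # image_batch: preprocessed waterfall(s)
confidence = probs.max()
answer = labels[probs.argmax()] if confidence >= 0.8 else 'unknown'
print(answer, confidence)
```
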
# Caveats

This is nearly the first machine learning script I've done,
I know little about radio and less about satellites,
and I'm not a programmer.

# Source License / Copying

Main repository is available here:

* https://spacecruft.org/spacecruft/satnogs-wut

License: CC BY-SA 4.0 International and/or GPLv3+ at your discretion.
Other code is licensed under its own respective license.

Copyright (C) 2019, 2020, Jeff Moe