# satnogs-wut
The goal of satnogs-wut is to have a script that takes an
observation ID and returns whether the observation is
"good", "bad", or "failed".
## Good Observation
![Good Observation](pics/waterfall-good.png)
## Bad Observation
![Bad Observation](pics/waterfall-bad.png)
## Failed Observation
![Failed Observation](pics/waterfall-failed.png)
# Machine Learning
The system at present is built upon the following:
* Debian
* Tensorflow
* Keras
Still in the learning/testing phase; results are inaccurate.
# wut?
The following scripts are in the repo:
* `wut` --- Feed it an observation ID and it returns whether it is a "good", "bad", or "failed" observation.
* `wut-compare` --- Compare an observation's current (presumably human) vetting with a `wut` vetting.
* `wut-compare-all` --- Compare all the observations in `download/` with `wut` vettings.
* `wut-dl-sort` --- Populate `data/` dir with waterfalls from `download/`.
* `wut-dl-sort-txmode` --- Populate `data/` dir with waterfalls from `download/` using the selected encoding.
* `wut-ml` --- Main machine learning Python script using Tensorflow and Keras.
* `wut-obs` --- Download the JSON for an observation ID.
* `wut-review-staging` --- Review all images in `data/staging`.
* `wut-water` --- Download waterfall for an observation ID to `download/[ID]`.
* `wut-water-range` --- Download waterfalls for a range of observation IDs to `download/[ID]`.
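
As a quick orientation, the two most common entry points might be invoked
roughly as follows. The observation ID and argument forms are hypothetical
examples, not verified invocations; check each script for its actual arguments.
```
# Hypothetical invocations; the real scripts may take different arguments.
./wut 1292765            # report whether observation 1292765 looks "good", "bad", or "failed"
./wut-compare 1292765    # compare wut's vetting with the existing (presumably human) vetting
```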
# Installation
Most of the scripts are simple shell scripts with few dependencies.
## Setup
The scripts write to directories that are ignored by the git repo,
so you need to create them first:
```
mkdir -p download
mkdir -p data/train/good
mkdir -p data/train/bad
mkdir -p data/train/failed
mkdir -p data/val/good
mkdir -p data/val/bad
mkdir -p data/val/failed
mkdir -p data/staging
mkdir -p data/test/unvetted
```
## Debian Packages
You'll need `curl` and `jq`, both in Debian's repos.
```
apt update
apt install curl jq
```
## Machine Learning
For the machine learning scripts, like `wut-ml`, both Tensorflow
and Keras need to be installed. The versions of those in Debian
didn't work for me. IIRC, for Tensorflow I built a `pip` of
version 2.0.0 from git and installed that. I installed Keras
2020-01-02 22:10:19 -07:00
with `pip`. Something like:
```
# XXX These aren't the exact commands, need to check...
apt update
# deps...
apt install python3-pip ...
# Install Bazel, Tensorflow's build system
# Install Tensorflow
git clone tensorflow...
cd tensorflow
./configure
# run some bazel command
dpkg -i /tmp/pkg_foo/*.deb
apt update
apt -f install
# Install Keras
pip3 install --user keras
# A million other commands....
```
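
If building from source turns out to be unnecessary on your system, installing
the prebuilt wheels with `pip` may be simpler. This is an untested alternative,
assuming a Python 3 version that the Tensorflow 2.0.0 wheel supports:
```
# Untested alternative: prebuilt wheels instead of a source build
apt update
apt install python3-pip
pip3 install --user tensorflow==2.0.0 keras
```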
# Usage
The main purpose of the script is to evaluate an observation,
but to do that, it needs to build a corpus of observations to
learn from. Many of the scripts in this repo are therefore just for
downloading and managing observations.
The following steps need to be performed:
1. Download waterfalls and JSON descriptions with `wut-water-range`.
These get put in the `download/[ID]/` directories.
1. Organize downloaded waterfalls into categories (e.g. "good", "bad", "failed").
Use the `wut-dl-sort` script.
The script will sort them into their respective directories under:
* `data/train/good/`
* `data/train/bad/`
* `data/train/failed/`
* `data/val/good/`
* `data/val/bad/`
* `data/val/failed/`
1. Use the machine learning script `wut-ml` to build a model based on
the files in the `data/train` and `data/val` directories.
1. Rate an observation using the `wut` script.
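
Putting the steps together, a typical session might look like the sketch
below. The argument forms (a start and end observation ID for the range
scripts, a single ID for `wut`) are assumptions based on the descriptions
above, and the IDs are made up.
```
# Hypothetical end-to-end run; exact arguments may differ from the real scripts.
./wut-water-range 1292000 1292100   # download waterfalls and JSON into download/[ID]/
./wut-dl-sort 1292000 1292100       # sort into data/train/ and data/val/ by existing vetting
./wut-ml                            # train on data/train/, validate on data/val/
./wut 1292765                       # rate a single observation: good, bad, or failed
```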
# Download Caching
The scripts are designed not to download a waterfall or make a JSON request
for an observation they have already requested. The first time an observation
is requested, it is downloaded from the SatNOGS network to the `download`
directory. That `download` directory is the download cache.
The `data` directory holds only temporary files, mostly linked from the
`download` directory. Files in the `data` directory are deleted by many
scripts, so don't put anything you want to keep in there.
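
The cache check amounts to something like the following sketch; this is
hypothetical, and the real scripts may test for specific filenames rather
than a non-empty directory:
```
# Hypothetical sketch of the download-cache check, not the actual script logic.
OBS_ID=1292765
if [ -z "$(ls -A "download/${OBS_ID}" 2>/dev/null)" ]; then
  ./wut-water "${OBS_ID}"   # only hit the SatNOGS network when nothing is cached
fi
```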
# SatNOGS Observation Data Mirror
The downloaded waterfalls are available from the mirror below via `http` and `rsync`.
Use the mirror instead of downloading from SatNOGS to save their bandwidth.
```
# Something like:
wget --mirror https://ml.spacecruft.org/download
# Or with rsync:
mkdir download
rsync -ultav rsync://ml.spacecruft.org/download/ download/
```
# TODO / Brainstorms
This is a first draft of how to do this. The actual machine learning
process hasn't really been looked at yet, except to get it to generate
an answer. It has a long way to go. There are also many ways to do
this besides using Tensorflow and Keras; originally, I considered
using OpenCV. Ideas, in no particular order, below.
## General
General considerations.
* Use OpenCV.
* Use something other than Tensorflow / Keras.
* Mirror `network.satnogs.org` and make API calls to it for data.
* Issues are now available here:
* https://spacecruft.org/spacecruft/satnogs-wut/issues
## Tensorflow / Keras
At present Tensorflow and Keras are used.
* Learn Keras / Tensorflow...
* What part of image is being evaluated?
* Re-evaluate each step.
* Right now the prediction output is just "good" or "bad", needs
"failed" too.
* Give confidence score in each prediction.
* Visualize what ML is looking at.
* Separate out good/bad/failed by satellite, transmitter, or encoding.
That way a "good" vetting learned on one encoding isn't applied to a
totally different encoding. Right now it considers observations good
that should be bad...
* If it has a low confidence, return "unknown" instead of "good" or "bad".
# Caveats
This is nearly the first machine learning script I've done,
I know little about radio and less about satellites,
and I'm not a programmer.
# Source License / Copying
Main repository is available here:
* https://spacecruft.org/spacecruft/satnogs-wut
License: CC BY-SA 4.0 International and/or GPLv3+ at your discretion. Other code is licensed under its respective license.
Copyright (C) 2019, 2020, Jeff Moe