# satnogs-wut

The goal of satnogs-wut is to have a script that takes an observation ID and returns an answer as to whether the observation is "good", "bad", or "failed".

## Good Observation
![Good Observation](pics/waterfall-good.png)

## Bad Observation
![Bad Observation](pics/waterfall-bad.png)

## Failed Observation
![Failed Observation](pics/waterfall-failed.png)

# Machine Learning
The system at present is built upon the following:

* Debian
* Tensorflow
* Keras

It is still at the learning/testing stage, and results are inaccurate.

# wut?
The following scripts are in the repo:

* `wut` --- Feed it an observation ID and it returns whether it is a "good", "bad", or "failed" observation.
* `wut-audio-archive` --- Download audio files from archive.org.
* `wut-compare` --- Compare an observation's current (presumably human) vetting with a `wut` vetting.
* `wut-compare-all` --- Compare all the observations in `download/` with `wut` vettings.
* `wut-compare-tx` --- Compare all the observations in `download/` with `wut` vettings, for a selected transmitter UUID.
* `wut-compare-txmode` --- Compare all the observations in `download/` with `wut` vettings, for a selected encoding.
* `wut-compare-txmode-csv` --- Compare all the observations in `download/` with `wut` vettings, for a selected encoding, with CSV output.
* `wut-dl-sort` --- Populate the `data/` directory with waterfalls from `download/`.
* `wut-dl-sort-tx` --- Populate the `data/` directory with waterfalls from `download/`, for a selected transmitter UUID.
* `wut-dl-sort-txmode` --- Populate the `data/` directory with waterfalls from `download/`, for a selected encoding.
* `wut-files` --- Tells you about the files you have in `download/` and `data/`.
* `wut-ml` --- Main machine learning Python script using Tensorflow and Keras.
* `wut-ml-load` --- Machine learning Python script using Tensorflow and Keras; loads `data/wut.h5`.
* `wut-ml-save` --- Machine learning Python script using Tensorflow and Keras; saves `data/wut.h5`.
* `wut-obs` --- Download the JSON for an observation ID.
* `wut-ogg2wav` --- Convert `.ogg` files in `download/` to `.wav` files.
* `wut-review-staging` --- Review all images in `data/staging`.
* `wut-water` --- Download the waterfall for an observation ID to `download/[ID]`.
* `wut-water-range` --- Download waterfalls for a range of observation IDs to `download/[ID]`.

# Installation
Most of the scripts are simple shell scripts with few dependencies.

## Setup
The scripts use directories that are ignored by the git repo, so you need to create them first:

```
mkdir -p download
mkdir -p data/train/good
mkdir -p data/train/bad
mkdir -p data/train/failed
mkdir -p data/val/good
mkdir -p data/val/bad
mkdir -p data/val/failed
mkdir -p data/staging
mkdir -p data/test/unvetted
```

## Debian Packages
You'll need `curl` and `jq`, both in Debian's repos.

```
apt update
apt install curl jq
```

## Machine Learning
For the machine learning scripts, like `wut-ml`, both Tensorflow and Keras need to be installed. The versions of those in Debian didn't work for me. IIRC, for Tensorflow I built a `pip` package of version 2.0.0 from git and installed that; Keras I installed with `pip`. Something like:

```
# XXX These aren't the exact commands, need to check...
apt update
# deps...
apt install python3-pip ...
# Install bazel or whatever their build system is
# Install Tensorflow
git clone tensorflow...
cd tensorflow
./configure
# run some bazel command
dpkg -i /tmp/pkg_foo/*.deb
apt update
apt -f install
# Install Keras
pip3 install --user keras
# A million other commands....
```
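Once Tensorflow and Keras are installed, the training side of `wut-ml` can be thought of as a standard Keras image classifier over the `data/` layout created above. The snippet below is only a hedged sketch of that pattern under a recent TensorFlow 2.x, not the actual script: the image size, batch size, epoch count, and network layers are assumptions; only the `data/train/`, `data/val/`, and `data/wut.h5` paths come from this repo.

```
# Hedged sketch only: a minimal Keras image classifier over the
# data/train and data/val directories. Image size, batch size, epochs,
# and layers are assumptions, not the settings wut-ml actually uses.
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Waterfalls sorted by wut-dl-sort into good/bad/failed subdirectories.
train_gen = ImageDataGenerator(rescale=1.0 / 255).flow_from_directory(
    "data/train", target_size=(256, 256), batch_size=32, class_mode="categorical")
val_gen = ImageDataGenerator(rescale=1.0 / 255).flow_from_directory(
    "data/val", target_size=(256, 256), batch_size=32, class_mode="categorical")

model = models.Sequential([
    layers.Conv2D(16, 3, activation="relu", input_shape=(256, 256, 3)),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(3, activation="softmax"),  # bad / failed / good
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_gen, validation_data=val_gen, epochs=5)
model.save("data/wut.h5")  # the path wut-ml-save / wut-ml-load use
```

Running something like this end to end also doubles as a quick check that the Tensorflow and Keras installation works.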
# Usage
The main purpose of the script is to evaluate an observation, but to do that it needs to build a corpus of observations to learn from. So many of the scripts in this repo are just for downloading and managing observations. The following steps need to be performed:

1. Download waterfalls and JSON descriptions with `wut-water-range`. These get put in the `download/[ID]/` directories.
1. Organize the downloaded waterfalls into categories (e.g. "good", "bad", "failed") using the `wut-dl-sort` script. The script will sort them into their respective directories under:
   * `data/train/good/`
   * `data/train/bad/`
   * `data/train/failed/`
   * `data/val/good/`
   * `data/val/bad/`
   * `data/val/failed/`
1. Use the machine learning script `wut-ml` to build a model based on the files in the `data/train` and `data/val` directories.
1. Rate an observation using the `wut` script.

# ml.spacecruft.org
This server is processing the data and has directories available to sync.

* https://ml.spacecruft.org/

## Data Caching Downloads
The scripts are designed not to download a waterfall or make a JSON request for an observation that has already been requested. The first time an observation is requested, it is downloaded from the SatNOGS network to the `download/` directory. That `download/` directory is the download cache.

The `data/` directory holds just temporary files, mostly linked from the `download/` directory. Files in the `data/` directory are deleted by many scripts, so don't put anything you want to keep in there.

## Preprocessed Files
Files in the `preprocess/` directory have been preprocessed for use further down the pipeline. It contains `.wav` files that have been decoded from `.ogg` files.

## SatNOGS Observation Data Mirror
The downloaded waterfalls are available below via `http` and `rsync`. Use this instead of downloading from SatNOGS to save their bandwidth.

```
# Something like:
wget --mirror https://ml.spacecruft.org/download
# Or with rsync:
mkdir download
rsync -ultav rsync://ml.spacecruft.org/download/ download/
```

# TODO / Brainstorms
This is a first draft of how to do this. The actual machine learning process hasn't really been examined, beyond getting it to generate an answer, and it has a long way to go. There are also many ways to do this besides using Tensorflow and Keras; originally, I considered using OpenCV. Ideas in no particular order below.

## General
General considerations.

* Use OpenCV.
* Use something other than Tensorflow / Keras.
* Mirror `network.satnogs.org` and make API calls to it for data.
* Issues are now available here:
  * https://spacecruft.org/spacecruft/satnogs-wut/issues

## Tensorflow / Keras
At present Tensorflow and Keras are used.

* Learn Keras / Tensorflow...
* What part of the image is being evaluated?
* Re-evaluate each step.
* Right now the prediction output is just "good" or "bad"; it needs "failed" too.
* Give a confidence score for each prediction.
* Visualize what the ML is looking at.
* Separate out good/bad/failed by satellite, transmitter, or encoding, so that a "good" vetting of one encoding isn't applied to a totally different encoding. Right now it marks as good some observations that should be bad...
* If the confidence is low, return "unknown" instead of "good" or "bad" (see the sketch below).
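One way to approach that low-confidence idea is to threshold the softmax output when rating a single waterfall. The snippet below is only a sketch under assumed values: the 0.7 threshold, the image size, and the example waterfall path are placeholders, and the label order just follows how Keras sorts the class directories alphabetically.

```
# Hedged sketch only: rate one waterfall and fall back to "unknown"
# when the model's confidence is low. Threshold, image size, and the
# example path are placeholders, not values wut uses.
import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model("data/wut.h5")
labels = ["bad", "failed", "good"]  # alphabetical class directory order

img = tf.keras.preprocessing.image.load_img(
    "download/1234567/waterfall.png",  # placeholder path
    target_size=(256, 256))
x = tf.keras.preprocessing.image.img_to_array(img)[np.newaxis] / 255.0
probs = model.predict(x)[0]

best = int(np.argmax(probs))
rating = labels[best] if probs[best] >= 0.7 else "unknown"
print(rating, float(probs[best]))
```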
# Caveats
This is nearly the first machine learning script I've done; I know little about radio, less about satellites, and I'm not a programmer.

# Source License / Copying
The main repository is available here:

* https://spacecruft.org/spacecruft/satnogs-wut

License: CC BY-SA 4.0 International and/or GPLv3+, at your discretion.

Other code is licensed under its own respective license.

Copyright (C) 2019, 2020, Jeff Moe