spacecruft

wut? --- What U Think? SatNOGS Observation AI. https://spacecruft.org/spacecruft/satnogs-wut

Go to file

ml server 72f7a68a56 web explanation stub, 3 ratings		2020-01-21 19:44:57 -07:00
notebooks	web explanation stub, 3 ratings	2020-01-21 19:44:57 -07:00
pics	examples	2020-01-02 16:51:29 -07:00
scripts	uh, voila on wut, jupyter on ml0 duh	2020-01-21 19:08:09 -07:00
.gitignore	ignore jupyter tmp files	2020-01-21 13:03:57 -07:00
LICENSE-CC	license	2020-01-02 16:55:08 -07:00
LICENSE-GPL	license	2020-01-02 16:55:08 -07:00
README-distributed.md	dist deps	2020-01-21 11:45:31 -07:00
README-voila.md	some deps used for voila	2020-01-21 11:44:04 -07:00
README.md	cruft	2020-01-20 19:09:22 -07:00
wut	new output	2020-01-03 23:42:58 -07:00
wut-audio-archive	clean on wut-predict jupyter draft	2020-01-16 19:25:44 -07:00
wut-audio-sha1	wut-ogg2wav, misc	2020-01-06 15:35:22 -07:00
wut-compare	wut-compare scripts	2020-01-02 20:41:56 -07:00
wut-compare-all	Flip scoring, rename variables	2020-01-02 20:56:16 -07:00
wut-compare-tx	wut-ml-load wut-ml-save	2020-01-04 01:06:03 -07:00
wut-compare-txmode	new output	2020-01-03 23:42:58 -07:00
wut-compare-txmode-csv	update docs	2020-01-10 18:05:57 -07:00
wut-dl-sort	jupyter	2020-01-10 23:14:03 -07:00
wut-dl-sort-tx	info on data fiels	2020-01-21 16:06:47 -07:00
wut-dl-sort-txmode	tweaklets	2020-01-03 18:13:37 -07:00
wut-files	add df to -files	2020-01-10 20:38:15 -07:00
wut-files-data	info on data fiels	2020-01-21 16:06:47 -07:00
wut-ml	wut-ogg2wav, misc	2020-01-06 15:35:22 -07:00
wut-ml-auto	not really used...	2020-01-21 16:07:48 -07:00
wut-ml-load	typo in py not used anymore	2020-01-16 19:47:52 -07:00
wut-ml-save	update docs	2020-01-10 18:05:57 -07:00
wut-obs	rename	2020-01-02 19:13:58 -07:00
wut-ogg2wav	update docs	2020-01-10 18:05:57 -07:00
wut-review-staging	renames	2020-01-02 11:53:15 -07:00
wut-rm-random	clean on wut-predict jupyter draft	2020-01-16 19:25:44 -07:00
wut-tf	wtf scripts to check tensorflow setup	2020-01-20 12:26:00 -07:00
wut-tf.py	wtf scripts to check tensorflow setup	2020-01-20 12:26:00 -07:00
wut-train-cluster-fn.py	wut-train-cluster-fn.py from jupyter export...	2020-01-21 15:55:13 -07:00
wut-water	rename	2020-01-02 19:13:58 -07:00
wut-water-range	info on data fiels	2020-01-21 16:06:47 -07:00
wut-worker	distributed getting closer...	2020-01-18 15:02:34 -07:00
wut-worker-mas	cruft	2020-01-20 19:09:22 -07:00
wut-worker-mas.py	cruft	2020-01-20 19:09:22 -07:00
wut-worker-train-cluster-fn	ml-int	2020-01-20 19:03:02 -07:00
wut-worker.py	distributed working except fit()	2020-01-18 15:43:33 -07:00

README.md

wut?

wut --- What U Think? SatNOGS Observation AI.

satnogs-wut

The goal of satnogs-wut is to have a script that will take an observation ID and return an answer whether the observation is "good", "bad", or "failed".

Good Observation

Bad Observation

Failed Observation

Observations

Machine Learning

The system at present is built upon the following:

Debian Buster.
Tensorflow 2.1 with built-in Keras.
Jupyter Lab.

Learning/testing, results are ~~inaccurate~~ getting closer. The main AI/ML development is now being done in Jupyter.

Jupyter

There is a Jupyter Lab Notebook file. This is producing real results at present, but has a long ways to go still...

wut-ml.ipynb --- Machine learning Python script using Tensorflow and Keras in a Jupyter Notebook.
wut-predict.ipynb --- Make prediction (rating) of observation, using data/wut.h5.
wut-train.ipynb --- ML Training file saved to data/wut.h5.

wut scripts

The following scripts are in the repo:

wut --- Feed it an observation ID and it returns if it is a "good", "bad", or "failed" observation.
wut-audio-archive --- Downloads audio files from archive.org.
wut-compare --- Compare an observations' current presumably human vetting with a wut vetting.
wut-compare-all --- Compare all the observations in download/ with wut vettings.
wut-compare-tx --- Compare all the observations in download/ with wut vettings using selected transmitter UUID.
wut-compare-txmode --- Compare all the observations in download/ with wut vettings using selected encoding.
wut-compare-txmode-csv --- Compare all the observations in download/ with wut vettings using selected encoding, CSV output.
wut-dl-sort --- Populate data/ dir with waterfalls from download/.
wut-dl-sort-tx --- Populate data/ dir with waterfalls from download/ using selected transmitter UUID.
wut-dl-sort-txmode --- Populate data/ dir with waterfalls from download/ using selected encoding.
wut-files --- Tells you about what files you have in downloads/ and data/.
wut-ml --- Main machine learning Python script using Tensorflow and Keras.
wut-ml-load --- Machine learning Python script using Tensorflow and Keras, load data/wut.h5.
wut-ml-save --- Machine learning Python script using Tensorflow and Keras, save data/wut.h5.
wut-obs --- Download the JSON for an observation ID.
wut-ogg2wav --- Convert .ogg files in downloads/ to .wav files.
wut-review-staging --- Review all images in data/staging.
wut-water --- Download waterfall for an observation ID to download/[ID].
wut-water-range --- Download waterfalls for a range of observation IDs to download/[ID].

Installation

Most of the scripts are simple shell scripts with few dependencies.

Setup

The scripts use files that are ignored in the git repo. So you need to create those directories:

mkdir -p download
mkdir -p data/train/good
mkdir -p data/train/bad
mkdir -p data/train/failed
mkdir -p data/val/good
mkdir -p data/val/bad
mkdir -p data/val/failed
mkdir -p data/staging
mkdir -p data/test/unvetted

Debian Packages

You'll need curl and jq, both in Debian's repos.

apt update
apt install curl jq

Install Tensorflow

For the machine learning scripts, like wut-ml, Tensorflow needs to be installed. As of version 2 of Tensorflow, Keras no longer needs to be installed separately.

The verions of Tensorflow installed with pip3 on Debian Buster crashes. It is perhaps best to do a custom install, best preferred build options, of the most preferred version. At this point, the remotes/origin/r2.1 branch is preferred.

To install Tensorflow:

https://www.tensorflow.org/install/source

Install dependencies in Debian.
Install Bazel to build Tensorflow.
Build Tensorflow pip package.
Install Tensorflow from custom pip package.

# Install deps
apt update
apt install python3-pip
# Install bazel .deb from releases here:
firefox https://github.com/bazelbuild/bazel/releases
# Install Tensorflow
git clone tensorflow...
cd tensorflow
git checkout v2.1.0
bazel clean
# Get flags to pass:
grep flags -m1 /proc/cpuinfo | cut -d ":" -f 2 | tr '[:upper:]' '[:lower:]' | { read FLAGS; OPT="-march=native"; for flag in $FLAGS; do case "$flag" in "sse4_1" | "sse4_2" | "ssse3" | "fma" | "cx16" | "popcnt" | "avx" | "avx2") OPT+=" -m$flag";; esac; done; MODOPT=${OPT//_/\.}; echo "$MODOPT"; }
./configure
# Run Bazel to build pip package. Takes nearly 2 hours to build.
bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package
./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
pip3 install --user /tmp/tensorflow_pkg/tensorflow-2.1.0-cp37-cp37m-linux_x86_64.whl

Tensorflow KVM Notes

Recent versions of Tensorflow can handle many more CPU build options to optimize for speed, such as AVX. By default, Proxmox and likely other virtual machine systems pass kvm/qemu "type=kvm" for CPU type. To use all possible CPU options available on the bare metal server, use "type=host". For more info about this in Proxmox, see CPU Type If you don't have this enabled, CPU instructions will fail or Tensorflow will run slower than it could.

Tensor Configuration

$ ./configure 
WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
You have bazel 0.29.1 installed.
Please specify the location of python. [Default is /usr/bin/python3]: 


Found possible Python library paths:
  /usr/lib/python3/dist-packages
  /usr/local/lib/python3.7/dist-packages
Please input the desired Python library path to use.  Default is [/usr/lib/python3/dist-packages]

Do you wish to build TensorFlow with XLA JIT support? [Y/n]: 
XLA JIT support will be enabled for TensorFlow.

Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: 
No OpenCL SYCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with ROCm support? [y/N]: 
No ROCm support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: 
No CUDA support will be enabled for TensorFlow.

Do you wish to download a fresh release of clang? (Experimental) [y/N]: 
Clang will not be downloaded.

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native -Wno-sign-compare]: -march=native -mssse3 -mcx16 -msse4.1 -msse4.2 -mpopcnt -mavx


Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: 
Not configuring the WORKSPACE for Android builds.

Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
	--config=mkl         	# Build with MKL support.
	--config=monolithic  	# Config for mostly static monolithic build.
	--config=ngraph      	# Build with Intel nGraph support.
	--config=numa        	# Build with NUMA support.
	--config=dynamic_kernels	# (Experimental) Build kernels into separate shared objects.
	--config=v2          	# Build TensorFlow 2.x instead of 1.x.
Preconfigured Bazel build configs to DISABLE default on features:
	--config=noaws       	# Disable AWS S3 filesystem support.
	--config=nogcp       	# Disable GCP support.
	--config=nohdfs      	# Disable HDFS support.
	--config=nonccl      	# Disable NVIDIA NCCL support.
Configuration finished

KVM

Note, for KVM, pass cpu=host if host has "avx" in /proc/cpuinfo.

Install Jupyter

Jupyter is a cute little web interface that makes Python programming easy. It works well for machine learning because you can step through just parts of the code, changing variables and immediately seeing output in the web browser.

Probably installed like this:

pip3 install --user jupyterlab
# Also other good packages, maybe like:
pip3 install --user jupyter-tensorboard
pip3 list | grep jupyter
# returns:
jupyter              1.0.0                                    
jupyter-client       5.3.4                                    
jupyter-console      6.0.0                                    
jupyter-core         4.6.1                                    
jupyter-tensorboard  0.1.10                                   
jupyterlab           1.2.4                                    
jupyterlab-server    1.0.6

Usage

The main purpose of the script is to evaluate an observation, but to do that, it needs to build a corpus of observations to learn from. So many of the scripts in this repo are just for downloading and managing observations.

The following steps need to be performed:

Download waterfalls and JSON descriptions with wut-water-range. These get put in the downloads/[ID]/ directories.
Organize downloaded waterfalls into categories (e.g. "good", "bad", "failed"). Use wut-dl-sort script. The script will sort them into their respective directories under:
- data/train/good/
- data/train/bad/
- data/train/failed/
- data/val/good/
- data/val/bad/
- data/val/failed/
Use machine learning script wut-ml to build a model based on the files in the data/train and data/val directories.
Rate an observation using the wut script.

ml.spacecruft.org

This server is processing the data and has directories available to sync.

https://ml.spacecruft.org/

Data Caching Downloads

The scripts are designed to not download a waterfall or make a JSON request for an observation it has already requested. The first time an observation is requested, it is downloaded from the SatNOGS network to the download/ directory. That download/ directory is the download cache.

The data/ directory is just temporary files, mostly linked from the downloads/ directory. Files in the data/ directory are deleted by many scripts, so don't put anything you want to keep in there.

Preprocessed Files

Files in the preprocess/ directory have been preprocessed to be used further in the pipeline. This contains .wav files that have been decoded from .ogg files.

SatNOGS Observation Data Mirror

The downloaded waterfalls are available below via http and rsync. Use this instead of downloading from SatNOGS to save their bandwidth.

# Something like:
wget --mirror https://ml.spacecruft.org/download
# Or with rsync:
mkdir download
rsync -ultav rsync://ml.spacecruft.org/download/ download/

TODO / Brainstorms

This is a first draft of how to do this. The actual machine learning process hasn't been looked at at all, except to get it to generate an answer. It has a long ways to go. There are also many ways to do this besides using Tensorflow and Keras. Originally, I considered using OpenCV. Ideas in no particular order below.

General

General considerations.

Use Open CV.
Use something other than Tensorflow / Keras.
Do mirror of network.satnogs.org and do API calls to it for data.
Issues are now available here:
- https://spacecruft.org/spacecruft/satnogs-wut/issues

Tensorflow / Keras

At present Tensorflow and Keras are used.

Learn Keras / Tensorflow...
What part of image is being evaluated?
Re-evaluate each step.
Right now the prediction output is just "good" or "bad", needs "failed" too.
Give confidence score in each prediction.
Visualize what ML is looking at.
Separate out good/bad/failed by satellite, transmitter, or encoding. This way "good" isn't considering a "good" vetting to be a totally different encoding. Right now, it is considering as good observations that should be bad...
If it has a low confidence, return "unknown" instead of "good" or "bad".

Caveats

This is nearly the first machine learning script I've done, I know little about radio and less about satellites, and I'm not a programmer.

Source License / Copying

Main repository is available here:

https://spacecruft.org/spacecruft/satnogs-wut

License: CC By SA 4.0 International and/or GPLv3+ at your discretion. Other code licensed under their own respective licenses.