GBIF Cruft

Crufty scripts for working with Global Biodiversity Information Facility (GBIF) data.

Install

Install thusly, using Debian Bookworm (stable/12) as a base.

Dependencies

Dependencies that may be needed:

apt install git python3-pip python3-virtualenv python3-venv python-is-python3

Python

Get the code and set up a Python virtual environment, adjusting to taste, such as:

git clone https://spacecruft.org/deepcrayon/gbif-cruft
cd gbif-cruft
python -m venv venv
source venv/bin/activate
pip install --upgrade pip setuptools wheel
pip install -e .
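
After activating the environment, it can be handy to confirm that Python is really running from the venv before installing anything. A minimal sketch, using only the standard library; the helper name `in_virtualenv` is made up for illustration and is not part of gbif-cruft:

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # Inside an environment created with `python -m venv`, sys.prefix points
    # at the environment directory while sys.base_prefix points at the base
    # Python installation, so the two differ.
    return sys.prefix != sys.base_prefix


print("virtualenv active:", in_virtualenv())
```

If this reports False after `source venv/bin/activate`, the shell is probably still picking up the system Python.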

Run

Thusly:

gbif-cruft

Usage

Help:

$ gbif-cruft 
Usage: gbif-cruft [OPTIONS] COMMAND [ARGS]...

Options:
  --version   Show the version and exit.
  -h, --help  Show this message and exit.

Commands:
  parquet
  search

Search help:

$ gbif-cruft search --help
Usage: gbif-cruft search [OPTIONS] [NAME]

Options:
  -h, --help  Show this message and exit.

Parquet help:

$ gbif-cruft parquet --help
Usage: gbif-cruft parquet [OPTIONS] [FILENAME]

Options:
  -h, --help  Show this message and exit.
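
The command layout in the help output above (a top-level command with `search` and `parquet` subcommands, each taking one optional argument) can be sketched with the standard library's argparse. This is only an illustration of the structure, not how gbif-cruft is actually implemented; the version string is a placeholder:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the subcommands shown in the help output."""
    parser = argparse.ArgumentParser(prog="gbif-cruft")
    # Placeholder version string, not the real project version.
    parser.add_argument("--version", action="version", version="%(prog)s 0.0.0")
    subparsers = parser.add_subparsers(dest="command")

    # `search [NAME]`: the name is optional, as in the usage line.
    search = subparsers.add_parser("search")
    search.add_argument("name", nargs="?")

    # `parquet [FILENAME]`: the filename is optional, as in the usage line.
    parquet = subparsers.add_parser("parquet")
    parquet.add_argument("filename", nargs="?")
    return parser


args = build_parser().parse_args(["search", "foo"])
print(args.command, args.name)  # prints "search foo"
```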

Example

Such as:

gbif-cruft search foo
{"snark": "foo"}

Development

Run black on the Python files for nice formatting:

black gbif-cruft*

TODO

Perhaps:

  • rclone GBIF snapshot, approximately 210 gigs of parquet files.
  • Read parquet files with local tools. Perhaps pyarrow, pqv, or dask.
  • Parquet with GraphQL? graphique.
  • Perhaps import parquet files into sota db.
  • Test on ppc64le.
  • Type hinted Python dataclass for GBIF API.
  • Mypyc or similar.
  • sota media storage.
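
For the type hinted dataclass item above, a rough sketch of what such a class might look like. The class name `Occurrence`, the choice of fields, and the sample record are assumptions for illustration; only the camelCase keys (`key`, `scientificName`, `decimalLatitude`, `decimalLongitude`, `countryCode`) follow names used in GBIF occurrence records:

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Occurrence:
    """A small, typed subset of one GBIF occurrence record (hypothetical)."""

    key: int
    scientific_name: Optional[str] = None
    decimal_latitude: Optional[float] = None
    decimal_longitude: Optional[float] = None
    country_code: Optional[str] = None

    @classmethod
    def from_api(cls, record: dict[str, Any]) -> "Occurrence":
        """Map camelCase API keys onto snake_case fields; missing keys become None."""
        return cls(
            key=int(record["key"]),
            scientific_name=record.get("scientificName"),
            decimal_latitude=record.get("decimalLatitude"),
            decimal_longitude=record.get("decimalLongitude"),
            country_code=record.get("countryCode"),
        )


# Made-up sample record, not real GBIF data.
sample = {"key": 1, "scientificName": "Puma concolor", "countryCode": "US"}
occurrence = Occurrence.from_api(sample)
print(occurrence.scientific_name, occurrence.decimal_latitude)
```

Freezing the dataclass keeps records immutable once parsed, which suits read-only snapshot data, though that is a design choice rather than a requirement.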

Upstream

GBIF

The main upstream project is the Global Biodiversity Information Facility:

pygbif:

Status

Alpha, under development.

Disclaimer

I am not a programmer; I'm learning Python.

Copyright

Unofficial project, not related to the Global Biodiversity Information Facility.

Upstream sources under their respective copyrights.

License

Data: CC BY-SA 4.0 International.

Source Code: AGPLv3+.

Copyright © 2023, Jeff Moe.

gbif-cruft is distributed under the terms of the AGPL-3.0-or-later license.