Scripts for working with the Global Biodiversity Information Facility (GBIF). https://spacecruft.org/deepcrayon/gbif-cruft
Jeff Moe a1ddbdf93d sample is upstream license 2023-10-07 19:31:50 -06:00
sample sample is upstream license 2023-10-07 19:31:50 -06:00
src Parquet schema 2023-10-07 17:56:47 -06:00
tests Set up as new hatch project 2023-10-07 10:50:18 -06:00
.gitignore Ignore more py temp files 2023-10-07 10:54:48 -06:00
LICENSE.txt Set up as new hatch project 2023-10-07 10:50:18 -06:00
README.md rm extra lazy group. Rename to command to parquet 2023-10-07 16:31:41 -06:00
pyproject.toml black line length 80 formatting. parquet metadata 2023-10-07 17:03:15 -06:00

GBIF Cruft

Crufty scripts for working with Global Biodiversity Information Facility (GBIF) data.

Install

Install thusly.

Using Debian Bookworm (stable/12) as a base.

Dependencies

Dependencies that may be needed:

apt install git python3-pip python3-virtualenv

Python

Get code and set up Python, suit to taste, such as:

git clone https://spacecruft.org/deepcrayon/gbif-cruft
cd gbif-cruft
virtualenv -p python3 env
source env/bin/activate
pip install --upgrade pip setuptools wheel
pip install -e .

Run

Thusly:

gbif-cruft

Usage

Help:

$ gbif-cruft 
Usage: gbif-cruft [OPTIONS] COMMAND [ARGS]...

Options:
  --version   Show the version and exit.
  -h, --help  Show this message and exit.

Commands:
  parquet
  search

Search help:

$ gbif-cruft search --help
Usage: gbif-cruft search [OPTIONS] [NAME]

Options:
  -h, --help  Show this message and exit.

Parquet help:

$ gbif-cruft parquet --help
Usage: gbif-cruft parquet [OPTIONS] [FILENAME]

Options:
  -h, --help  Show this message and exit.

Example

Such as:

gbif-cruft search foo
{"snark": "foo"}
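
The subcommand layout shown in the help output can be sketched with the standard library. This is a minimal illustration using argparse, not the project's actual implementation, which may well use a different CLI library. The names `build_parser`, `run_search`, and `main` are hypothetical, and the JSON shape simply mirrors the sample output above.

```python
import argparse
import json


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with `search` and `parquet` subcommands (hypothetical sketch)."""
    parser = argparse.ArgumentParser(prog="gbif-cruft")
    subcommands = parser.add_subparsers(dest="command")

    # `search [NAME]`: NAME is optional, as in the usage line.
    search = subcommands.add_parser("search")
    search.add_argument("name", nargs="?", default=None)

    # `parquet [FILENAME]`: FILENAME is optional as well.
    parquet = subcommands.add_parser("parquet")
    parquet.add_argument("filename", nargs="?", default=None)
    return parser


def run_search(name):
    """Return a JSON string shaped like the sample output."""
    return json.dumps({"snark": name})


def main(argv=None) -> int:
    """Dispatch to a subcommand; print help when none is given."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.command == "search":
        print(run_search(args.name))
    elif args.command == "parquet":
        # Placeholder only: the real command's behavior is not documented here.
        print(f"parquet: {args.filename}")
    else:
        parser.print_help()
    return 0
```

Calling `main(["search", "foo"])` prints the same JSON line as the example above. Passing `argv` explicitly keeps the sketch easy to exercise from a test.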

Development

Run black on the Python files for nice formatting:

black src tests

TODO

Perhaps:

  • rclone the GBIF snapshot, approximately 210 GB of Parquet files.
  • Read Parquet files with local tools. Perhaps pyarrow, pqv, dask.
  • Parquet with GraphQL? graphique.
  • Perhaps import parquet files into sota db.
  • Test on ppc64le.
  • Type hinted Python dataclass for GBIF API.
  • Mypyc or similar.
  • sota media storage.
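
The type-hinted dataclass item in the list above might look something like the following. This is a rough sketch only: the class name `Occurrence` and the `from_api` helper are hypothetical, the fields are a small illustrative subset, and the camelCase keys are assumed to match those used in GBIF occurrence JSON.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class Occurrence:
    """Illustrative subset of a GBIF occurrence record (hypothetical class)."""

    key: int
    scientific_name: Optional[str] = None
    country_code: Optional[str] = None
    decimal_latitude: Optional[float] = None
    decimal_longitude: Optional[float] = None

    @classmethod
    def from_api(cls, record: Mapping[str, Any]) -> "Occurrence":
        """Build an Occurrence from a camelCase dict; missing fields become None."""
        return cls(
            key=int(record["key"]),
            scientific_name=record.get("scientificName"),
            country_code=record.get("countryCode"),
            decimal_latitude=record.get("decimalLatitude"),
            decimal_longitude=record.get("decimalLongitude"),
        )


# Made-up record for illustration, not real GBIF data.
example = Occurrence.from_api(
    {"key": 1, "scientificName": "Puma concolor", "decimalLatitude": 40.0}
)
```

Making the dataclass frozen keeps records immutable once parsed, and the optional fields let partially populated records load without errors.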

Upstream

GBIF

The main upstream project is the Global Biodiversity Information Facility:

pygbif:

Status

Alpha, under development.

Disclaimer

I am not a programmer; I'm learning Python.

Copyright

Unofficial project, not related to the Global Biodiversity Information Facility.

Upstream sources under their respective copyrights.

License

Data: CC BY-SA 4.0 International.

Source Code: AGPLv3+.

Copyright © 2023, Jeff Moe.

gbif-cruft is distributed under the terms of the AGPL-3.0-or-later license.