Libre dataset scripts for Parrot. https://parrot.codes/
Jeff Moe 96b07adc61 dataset table img 2023-11-23 11:56:44 -07:00
img dataset table img 2023-11-23 11:56:44 -07:00
.gitattributes Spreadsheets with LFS 2023-11-17 08:45:32 -07:00
.gitignore ignore libreoffice temp files 2023-11-17 08:45:54 -07:00
CHANGELOG.txt v0.0.1 2023-11-16 14:40:16 -07:00
LICENSE-CC Creative Commons Attribution-ShareAlike 4.0 International 2023-11-16 11:18:44 -07:00
README.md Datasets, perhaps 2023-11-17 08:45:16 -07:00
datasets.ods More datasets in table... 2023-11-17 22:56:39 -07:00


Parrot Datasets

Datasets for the Parrot Libre AI IDE.

https://parrot.codes

Libre Datasets

Libre datasets suitable for training a libre instruct model will be listed here.

Other well-known datasets are also noted, along with the suitability of their licenses.

Parrot Dataset Licensing

The model may use data released under a license that appears on one of these three lists of acceptable free/open licenses:

Datasets to Evaluate

These datasets are freely available to download, but may not have a suitable license. Each needs to be evaluated to determine which, if any, are acceptable.

  • StackOverflow.
  • Rust cargo.
  • Debian source code, issues.
  • GBIF.
  • Gutenberg.
  • arXiv.

Suitable Datasets

Datasets from the following may be suitable to use for training.

  • Wikipedia.

Unsuitable Licenses

Licenses that are not free, libre, or open, even if they claim to be "open source".

These are not "Wikimedia Commons compatible", for example:

  • Creative Commons Non-commercial (NC).
  • Proprietary licenses.
  • Any "custom" license that hasn't been reviewed by the general community.
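
The three rules above could be applied mechanically when screening a candidate dataset. The following is only a minimal sketch: the function name, the use of SPDX-style identifiers, the "Proprietary" marker, and the contents of the reviewed set are all assumptions made for illustration, not part of this repository.

```python
from typing import Optional

# Hypothetical placeholder for licenses the community has already reviewed.
# Only this repository's own license is listed; the real lists are not
# reproduced here.
REVIEWED_LICENSES = {"CC-BY-SA-4.0"}


def unsuitable_reason(license_id: str) -> Optional[str]:
    """Return why a license is unsuitable, or None if no rule rejects it.

    Assumes SPDX-style identifiers such as "CC-BY-NC-4.0".
    """
    normalized = license_id.strip().upper()
    # Creative Commons Non-commercial variants carry an "NC" component.
    if normalized.startswith("CC-") and "NC" in normalized.split("-"):
        return "non-commercial"
    # Assumed marker for proprietary terms.
    if normalized == "PROPRIETARY":
        return "proprietary"
    # Anything not yet reviewed is treated as a custom license.
    if normalized not in {name.upper() for name in REVIEWED_LICENSES}:
        return "custom or unreviewed"
    return None
```

A check like this can only flag obvious cases; the actual license text still has to be read by a person.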

Unsuitable Datasets

Datasets that are not free, libre, or open, even if they claim to be "open source".

Unsuitable Model License

The following models are unsuitable because they use an unsuitable license.

Internet Scrapes

The following are just scrapes of the Internet:

  • Common Crawl.
  • RefinedWeb.

Non-commercial

Non-commercial licenses are not open source and are not suitable.

License

Creative Commons Attribution-ShareAlike 4.0 International

Copyright © 2023, Jeff Moe.