Parrot Datasets
Datasets for Parrot Libre AI IDE.
Libre Datasets
This repository lists libre datasets suitable for training a libre instruct model.
It also notes other well-known datasets and whether their licenses are suitable.
Parrot Dataset Licensing
The model may use data that is under a license that appears on one of these three lists as an acceptable free/open license:
Datasets to Evaluate
These datasets are freely available to download but may not have a suitable license. Determine which, if any, are acceptable.
- StackOverflow.
- Rust cargo.
- Debian source code, issues.
- GBIF.
- Gutenberg.
- arXiv.
Suitable Datasets
Datasets from the following sources may be suitable for training.
- Wikipedia.
Unsuitable Licenses
Licenses that are not free, libre, or open, even if they claim to be "open source".
These are not "Wikimedia Commons compatible". For example:
- Creative Commons Non-commercial (NC).
- Proprietary licenses.
- Any "custom" license that hasn't been reviewed by the general community.
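The three rules above could be applied mechanically when screening candidate datasets. The following is a minimal sketch, not part of this repository: it assumes each dataset's license is recorded as an SPDX-style identifier, and the names `REVIEWED_LICENSES`, `PROPRIETARY_LABELS`, and `unsuitable_reason` are hypothetical. The allowlist contents are placeholders, not a reviewed list.

```python
# Hypothetical allowlist standing in for licenses the community has reviewed.
# The single entry is a placeholder, not an authoritative list.
REVIEWED_LICENSES = {"CC-BY-SA-4.0"}

# Hypothetical labels a dataset catalogue might use for proprietary terms.
PROPRIETARY_LABELS = {"PROPRIETARY", "ALL-RIGHTS-RESERVED"}


def unsuitable_reason(license_id):
    """Return why a license fails the rules above, or None if it passes."""
    normalized = license_id.strip().upper()
    # Creative Commons non-commercial variants carry an "NC" element,
    # e.g. CC-BY-NC-4.0 or CC-BY-NC-SA-4.0.
    if normalized.startswith("CC-") and "NC" in normalized.split("-"):
        return "non-commercial"
    if normalized in PROPRIETARY_LABELS:
        return "proprietary"
    # Anything not on the reviewed list is treated as an unreviewed custom license.
    if normalized not in {name.upper() for name in REVIEWED_LICENSES}:
        return "not reviewed"
    return None
```

A dataset would be kept only when `unsuitable_reason` returns `None`. Treating every unknown identifier as "not reviewed" is a deliberately conservative choice: a license has to be added to the list before data under it is used.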
Unsuitable Datasets
Datasets that are not free, libre, or open, even if they claim to be "open source".
Unsuitable Model License
The following models are unsuitable because they use an unsuitable license.
Internet Scrapes
The following are just scrapes of the Internet:
- Common Crawl.
- RefinedWeb.
Non-commercial
Non-commercial licenses are not open source and are not suitable.
License
Creative Commons Attribution-ShareAlike 4.0 International
Copyright © 2023, Jeff Moe.