dataset table

main
Jeff Moe 2023-11-23 11:56:55 -07:00
parent 96b07adc61
commit a9cb42505d
1 changed files with 3 additions and 32 deletions

@@ -22,24 +22,6 @@ of these three lists as an acceptable free/open license:
* https://commons.wikimedia.org/wiki/Commons:Licensing
# Datasets to Evaluate
Datasets that are freely available to download but may not have a suitable license.
Determine which, if any, are OK.
* StackOverflow.
* Rust cargo.
* Debian source code, issues.
* GBIF.
* Gutenberg.
* arXiv.
# Suitable Datasets
Datasets from the following may be suitable to use for training.
* Wikipedia.
# Unsuitable Licenses
Licenses that are not free, libre, or open, even if they claim to
be "open source".
@@ -51,21 +33,10 @@ These are not "Wikipedia Commons compatible", for example:
* Any "custom" license that hasn't been reviewed by the general community.
# Unsuitable Datasets
Datasets that are not free, libre, or open, even if they claim to
be "open source".
# Datasets Table
Table of datasets. See also the spreadsheet `datasets.ods`.
## Unsuitable Model License
The following models are unsuitable because they use an unsuitable license.
### Internet Scrapes
The following are just scrapes of the Internet:
* Common Crawl.
* RefinedWeb.
### Non-commercial
Non-commercial licenses are not open source and are not suitable.
![Table of Datasets](img/datasets-table.png)
# License