dataset table
parent 96b07adc61
commit a9cb42505d
35
README.md
@@ -22,24 +22,6 @@ of these three lists as an acceptable free/open license:
* https://commons.wikimedia.org/wiki/Commons:Licensing


# Datasets to Evaluate
Datasets freely available to download, but may not have suitable license.
Determine which, if any, are ok.

* StackOverflow.
* Rust cargo.
* Debian source code, issues.
* GBIF.
* Gutenberg.
* arxiv.


# Suitable Datasets
Datasets from the following may be suitable to use for training.

* Wikipedia.


# Unsuitable Licenses
Licenses that are not free, libre, open, even if they may claim to
be "open source".
@@ -51,21 +33,10 @@ These are not "Wikipedia Commons compatible", for example:
* Any "custom" license that hasn't been reviewed by the general community.


# Unsuitable Datasets
Datasets that are not free, libre, open, even if they may claim to
be "open source".
# Datasets Table
Table of datasets. See also the spreadsheet `datasets.ods`.

## Unsuitable Model License
The following models are unsuitable due to using an unsuitable license.

### Internet Scrapes
The following are just scrapes of the Internet:

* Common Crawl.
* RefinedWeb.

### Non-commercial
Non-commercial licenses are not open source and are not suitable.
![Table of Datasets](img/datasets-table.png)


# License