From a9cb42505d9512b36c051b7a691c015b25c5984e Mon Sep 17 00:00:00 2001
From: Jeff Moe
Date: Thu, 23 Nov 2023 11:56:55 -0700
Subject: [PATCH] dataset table

---
 README.md | 35 +++--------------------------------
 1 file changed, 3 insertions(+), 32 deletions(-)

diff --git a/README.md b/README.md
index 9ba1ad0..7d25632 100644
--- a/README.md
+++ b/README.md
@@ -22,24 +22,6 @@ of these three lists as an acceptable free/open license:
 * https://commons.wikimedia.org/wiki/Commons:Licensing
 
 
-# Datasets to Evaluate
-Datasets freely available to download, but may not have suitable license.
-Determine which, if any, are ok.
-
-* StackOverflow.
-* Rust cargo.
-* Debian source code, issues.
-* GBIF.
-* Gutenberg.
-* arxiv.
-
-
-# Suitable Datasets
-Datasets from the following may be suitable to use for training.
-
-* Wikipedia.
-
-
 # Unsuitable Licenses
 Licenses that are not free, libre, open, even if they may claim to
 be "open source".
@@ -51,21 +33,10 @@ These are not "Wikipedia Commons compatible", for example:
 * Any "custom" license that hasn't been reviewed by the general community.
 
 
-# Unsuitable Datasets
-Datasets that are not free, libre, open, even if they may claim to
-be "open source".
+# Datasets Table
+Table of datasets. See also the spreadsheet `datasets.ods`.
 
-## Unsuitable Model License
-The following models are unsuitable due to using an unsuitable license.
-
-### Internet Scrapes
-The following are just scrapes of the Internet:
-
-* Common Crawl.
-* RefinedWeb.
-
-### Non-commercial
-Non-commercial licenses are not open source and are not suitable.
+![Table of Datasets](img/datasets-table.png)
 
 
 # License