formatting

machine translation note
format docs
2023-12-01 13:07:24 -07:00 · 2023-12-01 09:57:24 -07:00 · 2023-11-30 19:40:49 -07:00
2 changed files with 32 additions and 9 deletions
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -42,7 +42,7 @@ These are not "Wikipedia Commons compatible", for example:
 Datasets Table
 --------------

-Table of datasets. See also the spreadsheet `datasets.ods`.
+Table of datasets. See also the spreadsheet ``datasets.ods``.

 .. image:: img/datasets-table.png
   :alt: Table of Datasets
@@ -65,5 +65,5 @@ The Smack
   :caption: Contents:


-
+.. note:: Parrot documentation is written in English and uses AI machine translation for other languages.

--- a/docs/source/the_smack.rst
+++ b/docs/source/the_smack.rst
@@ -1,14 +1,22 @@
 The Smack Dataset
 =================

-The Smack Dataset does not exist. In the future, if it arises, it will be a libre build of The Stack dataset without using the original dataset directly due to non-libre (non-"open source") license encumbrances.
+The Smack Dataset does not exist.
+In the future,
+if it arises,
+it will be a libre build of The Stack dataset without using the original dataset directly due to non-libre (non-"open source") license encumbrances.

 .. note:: Parrot is in early development, not ready for end users.

 The Stack Metadata
 ------------------

-The Stack has a separate metadata repository containing information about the dataset without hosting the dataset itself. This practice is beneficial as it allows researchers to understand dataset contents without being bound by licenses. For instance, how can one agree to a license when they're unaware of the content's licenses? By using metadata files, this issue can be mitigated.
+The Stack has a separate metadata repository containing information about the dataset without hosting the dataset itself.
+This practice is beneficial as it allows researchers to understand dataset contents without being bound by licenses.
+For instance,
+how can one agree to a license when they're unaware of the content's licenses?
+By using metadata files,
+this issue can be mitigated.

 Link to the Git Repository:

@@ -19,12 +27,15 @@ Link to the Git Repository:
 Downloading Metadata
 --------------------

-The metadata is considerably less than the entire dataset, but still substantially large. The Git repository is approximately one terabyte in size.
+The metadata is considerably less than the entire dataset,
+but still substantially large.
+The Git metadata repository is approximately one terabyte in size.

 Reading Metadata
 ----------------

-The Stack's metadata is stored in parquet format, a welcomed choice. The parquet files span 562 gigabytes and consist of 2,832 individual files across 945 directories.
+The Stack's metadata is stored in parquet format.
+The parquet files span 562 gigabytes and consist of 2,832 individual files across 945 directories.

 Selecting Repos
 ---------------
@@ -46,8 +57,12 @@ Scripts

 The following scripts are available:

-* ``the-stack-headers`` - Retrieves header names from The Stack's parquet files.
-* ``the-stack-licenses`` - Extracts licenses and records from The Stack's license file.
+* ``the-stack-headers`` --
+  Retrieves header names from The Stack's parquet files.
+
+* ``the-stack-licenses`` --
+  Extracts licenses and records from The Stack's license file.
+

 Code Assist
 -----------
@@ -55,9 +70,13 @@ Code Assist
 The following scripts were developed using Parrot code assist:

 * ``the-stack-headers``
+
 * ``the-stack-licenses``

-These scripts were created with the `The Phind-CodeLlama-34B-v2_q8.guff` model from TheBloke.
+
+These scripts were created with the
+`The Phind-CodeLlama-34B-v2_q8.guff`
+model from TheBloke.


 .. toctree::
@@ -69,3 +88,7 @@ These scripts were created with the `The Phind-CodeLlama-34B-v2_q8.guff` model f

 .. automodule:: the_smack.the_stack_licenses
   :members:
+
+
+.. note:: Parrot documentation is written in English and uses AI machine translation for other languages.
+
Author	SHA1	Message	Date
Jeff Moe	af5d09a709	formatting	2023-12-01 13:07:24 -07:00
Jeff Moe	b2df8ac079	machine translation note	2023-12-01 09:57:24 -07:00
Jeff Moe	ea01745dbc	format docs	2023-11-30 19:40:49 -07:00
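
Reviewer sketch, not part of these commits: the "Reading Metadata" text quotes 562 gigabytes, 2,832 parquet files and 945 directories. Assuming a local clone of the metadata repository, figures of that kind could be recomputed with a standard-library walk like the one below. The function name ``summarize_parquet_tree`` and the clone path are hypothetical and are not Parrot scripts. Counting only directories that directly contain a ``.parquet`` file is also an assumption about how the 945 was obtained.

```python
import os
from typing import NamedTuple


class ParquetSummary(NamedTuple):
    files: int        # number of *.parquet files found
    directories: int  # directories that directly contain at least one of them
    total_bytes: int  # combined on-disk size of those files


def summarize_parquet_tree(root: str) -> ParquetSummary:
    """Walk ``root`` and tally parquet files, their directories, and their size."""
    files = 0
    directories = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        parquet_names = [n for n in filenames if n.endswith(".parquet")]
        if parquet_names:
            directories += 1
        for name in parquet_names:
            files += 1
            total_bytes += os.path.getsize(os.path.join(dirpath, name))
    return ParquetSummary(files, directories, total_bytes)
```

Calling ``summarize_parquet_tree("/path/to/metadata-clone")`` returns the three tallies as a named tuple. If the clone holds Git LFS pointer files rather than the real parquet data, the byte total will reflect the pointers, not the data.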