Compare commits

52 commits:

507d40c45c, af5d09a709, b2df8ac079, ea01745dbc, 7107db27c2, d04b0b4cb1,
3272692be2, a0d35874bc, a70dbaec9f, 68cbf51151, e053d14155, 873960c927,
f8a2d265a8, cbe6149161, 924c6a6e0e, b641a66111, d1c44f1f7d, bf1e9c43be,
9972f98afa, 612a323adf, 40601b03d1, 66b7539ba3, 0a124d048c, 1b3f0e26cd,
478a7980b8, 33bcd9f539, 1dfbb8d268, 5764d3bf1d, 0781ae6621, 9d6a199d47,
27f4661c55, 9d6222c5ac, 5511c02d46, ba4c7c0044, b6ea86ff93, 6ecfa2cba0,
a898bcaae3, 46216ab9ef, a9e5a38ef4, a5526cfe29, c86109f2ca, 49c63860bc,
4e15ae4fed, 57e305a75e, e3f25d4b7e, 029e40e316, f3be8d6566, 2ce0c9365e,
5f5425d69a, 852d9d9e61, 208d4c2c0d, 83310747f5
`.gitignore`:

@ -1,3 +1,12 @@
.~lock.*.ods#
*.swp
.pytest_cache/
.python-version
build
env
tmp
venv
*.swp
**/dist/
**/*.egg-info/
*/target/
__pycache__

New file with build notes (hunk @ -0,0 +1,22 @@):

# Build

Build, perhaps like this:

```
deactivate
rm -rf venv env dist
virtualenv env
source env/bin/activate
pip install -U setuptools wheel pip
pip install -r requirements.txt
pip install -e .
cd docs/
make clean
make html
cd ..
python -m build
```

Cleanish:

```
source env/bin/activate
cd docs/
make clean
cd ..
deactivate
rm -rf venv env dist src/*-info src/*/__pycache__
```

# Versions

```
vim CHANGELOG.txt docs/source/conf.py pyproject.toml
# git commit CHANGELOG.txt docs/source/conf.py pyproject.toml -m "v0.0.0"
# git tag v0.0.0
# git push ; git push --tags
```

`CHANGELOG.txt` (hunk @ -1,3 +1,8 @@):

v0.0.9 Fix versions.
v0.0.8 Re-arrange project structure.
v0.0.7 The Smack, use pyproject.toml, update Sphinx docs.
v0.0.6 Read the Docs Sphinx for The Smack.
v0.0.5 Sphinx docs for The Smack.
v0.0.4 The Stack scriptlet to generate license list.
v0.0.3 The Smack started.
v0.0.2 Dataset table.

`README.md` (58 changed lines):
@ -1,53 +1,67 @@
# Parrot Datasets
.. _parrot-datasets:

Parrot Datasets
===============

Datasets for Parrot Libre AI IDE.

https://parrot.codes
.. _parrot-libre-datasets:

Libre Datasets
---------------

# Libre Datasets
A list of libre datasets suitable for training a libre instruct model
shall be listed.

A list of libre datasets suitable for training a libre instruct model shall be listed.
Note other well known datasets, and their license suitability.

.. _parrot-dataset-licensing:

# Parrot Dataset Licensing
The model may use data that is under a license that appears on one
of these three lists as an acceptable free/open license:

Parrot Dataset Licensing
-------------------------

The model may use data that is under a license that appears on one of these three lists as an acceptable free/open license:

* https://www.gnu.org/licenses/license-list.html
* https://opensource.org/licenses/
* https://commons.wikimedia.org/wiki/Commons:Licensing

.. _unsuitable-licenses:

# Unsuitable Licenses
Licenses that are not free, libre, open, even if they may claim to
be "open source".

Unsuitable Licenses
--------------------

Licenses that are not free, libre, open, even if they may claim to be "open source".
These are not "Wikipedia Commons compatible", for example:

* Creative Commons Non-commercial (NC).
* Proprietary licenses.
* Any "custom" license that hasn't been reviewed by the general community.

.. _datasets-table:

Datasets Table
--------------

# Datasets Table
Table of datasets. See also the spreadsheet `datasets.ods`.

![Table of Datasets](img/datasets-table.png)
.. image:: img/datasets-table.png
   :alt: Table of Datasets

.. _datasets:

Datasets
--------

# Datasets
Datasets perhaps to be built and used.

## The Smack
Libre version of The Stack.
See: `datasets/the-smack`.
* The Smack
  Libre version of The Stack. See: `datasets/the-smack`.

.. _license:

License
-------

# License
Creative Commons Attribution-ShareAlike 4.0 International

*Copyright © 2023, Jeff Moe.*
Copyright © 2023, Jeff Moe.

Deleted `.gitignore` (hunk @ -1,4 +0,0 @@), presumably under `datasets/the-smack`:

**/target/
Cargo.lock
venv
env

Deleted The Smack `README.md` (hunk @ -1,57 +0,0 @@), presumably `datasets/the-smack/README.md`; its content moves into the Sphinx docs below:

# The Smack Dataset
The Smack Dataset doesn't exist.

Should it happen to exist someday, it will be a libre build of The Stack dataset,
but not using the dataset directly, so as not to be encumbered by The Stack's
non-libre (not "open source") license.


# The Stack Metadata
The Stack has a metadata repo with details about The Stack dataset, without
containing the dataset itself. One reason for this (as they discussed in an
issue/post) is so researchers can learn about the dataset contents without
being encumbered by the license. For example, how can you agree to a license
without knowing the licenses of the contents? Using the metadata files can
help with this issue.


# Downloading Metadata
While metadata is far less than the total dataset, it is still relatively large.
The git repo is a bit over one terabyte.

Here is a link to the git repository:

```
git clone https://huggingface.co/datasets/bigcode/the-stack-metadata
```


# Reading Metadata
The Stack metadata is in parquet format, which is swell.
The parquet files are currently 562 gigabytes, numbering 2,832 files,
in 945 directories.


# Selecting Repos
Write a script to select appropriate repos per libre criteria.


# Cloning Repos
Write a script to go clone the repos.


# Train
Train, using libre code from Bigcode (makers of The Stack).


# Scripts
The following scripts are available.

* `the-stack-headers` --- Reads header names from The Stack parquet files.


# Code Assist
The following scripts were written using Parrot code assist.
`The Phind-CodeLlama-34B-v2_q8.gguf` model from TheBloke was used.

* `the-stack-headers`

Deleted example script (hunk @ -1,63 +0,0 @@) that restored the `numpy/numpy` repository from The Stack using the metadata:

import datasets
from pathlib import Path
from tqdm.auto import tqdm
import pandas as pd

# assuming metadata is cloned into the local folder /data/hf_repos/the-stack-metadata
# the stack is cloned into the local folder /data/hf_repos/the-stack-v1.1
# destination folder is in /repo_workdir/numpy_restored
the_stack_meta_path = Path('/srv/ml/huggingface/datasets/bigcode/the-stack-metadata')
the_stack_path = Path('/data/hf_repos/the-stack-v1.1')
repo_dst_root = Path('/repo_workdir/numpy_restored')
repo_name = 'numpy/numpy'

# Get bucket with numpy repo info
# meta_bucket_path = None
# for fn in tqdm(list((the_stack_meta_path/'data').glob('*/ri.parquet'))):
#     df = pd.read_parquet(fn)
#     if any(df['name'] == repo_name):
#         meta_bucket_path = fn
#         break
meta_bucket_path = the_stack_meta_path / 'data/255_944'


# Get repository id from repo name
ri_id = pd.read_parquet(
    meta_bucket_path / 'ri.parquet'
).query(
    f'`name` == "{repo_name}"'
)['id'].to_list()[0]

# Get files information for the repository
files_info = pd.read_parquet(
    meta_bucket_path / 'fi.parquet'
).query(
    f'`ri_id` == {ri_id} and `size` != 0 and `is_deleted` == False'
)

# Convert DF with files information to a dictionary by language and then file hexsha
# there can be more than one file with the same hexsha in the repo so we gather
# all instances per unique hexsha
files_info_dict = {
    k: v[['hexsha', 'path']].groupby('hexsha').apply(lambda x: list(x['path'])).to_dict()
    for k, v in files_info.groupby('lang_ex')
}

# Load Python part of The Stack
ds = datasets.load_dataset(
    str(the_stack_path / 'data/python'),
    num_proc=10, ignore_verifications=True
)

# Save file content of the python files in the numpy repository in their appropriate locations
def save_file_content(example, files_info_dict, repo_dst_root):
    if example['hexsha'] in files_info_dict:
        for el in files_info_dict[example['hexsha']]:
            path = repo_dst_root / el
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(example['content'])

ds.map(
    save_file_content,
    fn_kwargs={'files_info_dict': files_info_dict['Python'], 'repo_dst_root': repo_dst_root},
    num_proc=10
)

Deleted `requirements.txt` (hunk @ -1,5 +0,0 @@), presumably under `datasets/the-smack`:

datasets
tqdm
pandas
pathlib
termcolor

New file `docs/Makefile` (hunk @ -0,0 +1,20 @@):

# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS    ?=
SPHINXBUILD   ?= sphinx-build
SOURCEDIR     = source
BUILDDIR      = build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

New file `docs/source/conf.py` (hunk @ -0,0 +1,12 @@):

project = "Parrot Datasets"
copyright = "2023, Jeff Moe"
author = "Jeff Moe"
release = "v0.0.9"
extensions = [
    "sphinx.ext.autodoc",
]
templates_path = ["_templates"]
exclude_patterns = ["_build"]
html_theme = "sphinx_rtd_theme"
html_static_path = ["_static"]
htmlhelp_basename = "ParrotDatasetsdoc"

New binary file (content not shown): a 348 KiB image, presumably the `img/datasets-table.png` referenced by the docs.
New file `docs/source/index.rst` (hunk @ -0,0 +1,69 @@):

Dataset
=======

Datasets for Parrot Libre AI IDE.

There is no Parrot Dataset, at present.

.. note:: Parrot is in early development, not ready for end users.

.. _parrot-libre-datasets:

Libre Datasets
---------------

A list of libre datasets suitable for training a libre instruct model shall be listed.
Note other well known datasets, and their license suitability.

.. _parrot-dataset-licensing:

Dataset Licensing
-----------------

The model may use data that is under a license that appears on one of these three lists as an acceptable free/open license:

* https://www.gnu.org/licenses/license-list.html
* https://opensource.org/licenses/
* https://commons.wikimedia.org/wiki/Commons:Licensing

.. _unsuitable-licenses:

Unsuitable Licenses
--------------------

Licenses that are not free, libre, open, even if they may claim to be "open source".
These are not "Wikipedia Commons compatible", for example:

* Creative Commons Non-commercial (NC).
* Proprietary licenses.
* Any "custom" license that hasn't been reviewed by the general community.

.. _datasets-table:

Datasets Table
--------------

Table of datasets. See also the spreadsheet ``datasets.ods``.

.. image:: img/datasets-table.png
   :alt: Table of Datasets

.. _libre_datasets:

Libre Datasets
--------------

Datasets perhaps to be built and used.

The Smack
^^^^^^^^^
| Libre version of The Stack.
| See: :doc:`The Smack <the_smack>`.

.. toctree::
   :maxdepth: 1
   :caption: Contents:

   the_smack

.. note:: Parrot documentation is written in English and uses AI machine translation for other languages.

New file `docs/source/the_smack.rst` (hunk @ -0,0 +1,94 @@):

The Smack Dataset
=================

The Smack Dataset does not exist.
In the future,
if it arises,
it will be a libre build of The Stack dataset without using the original dataset directly due to non-libre (non-"open source") license encumbrances.

.. note:: Parrot is in early development, not ready for end users.

The Stack Metadata
------------------

The Stack has a separate metadata repository containing information about the dataset without hosting the dataset itself.
This practice is beneficial as it allows researchers to understand dataset contents without being bound by licenses.
For instance,
how can one agree to a license when they're unaware of the content's licenses?
By using metadata files,
this issue can be mitigated.

Link to the Git Repository:

.. code-block:: bash

   git clone https://huggingface.co/datasets/bigcode/the-stack-metadata

Downloading Metadata
--------------------

The metadata is considerably less than the entire dataset,
but still substantially large.
The Git metadata repository is approximately one terabyte in size.
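
Hugging Face dataset repositories typically store the large files with Git LFS (an assumption about this repo's layout), so a plain clone pulls everything at once. One way to browse first and download selectively:

.. code-block:: bash

   # Clone only LFS pointer files, then pull one metadata bucket on demand.
   GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/datasets/bigcode/the-stack-metadata
   cd the-stack-metadata
   git lfs pull --include "data/255_944/*"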

Reading Metadata
----------------

The Stack's metadata is stored in parquet format.
The parquet files span 562 gigabytes and consist of 2,832 individual files across 945 directories.
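
As a minimal sketch, one of the repository-info files can be inspected with pandas (the clone location and the ``data/255_944/ri.parquet`` path are taken from the example script removed earlier in this changeset, so treat them as illustrative):

.. code-block:: python

   # Sketch: peek at one repository-info parquet file from a local metadata clone.
   from pathlib import Path

   import pandas as pd

   meta = Path("/data/hf_repos/the-stack-metadata")  # assumed clone location
   ri = pd.read_parquet(meta / "data" / "255_944" / "ri.parquet")
   print(ri.columns.tolist())  # repository-level fields, e.g. 'name' and 'id'
   print(ri.head())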

Selecting Repos
---------------

Write a script to filter appropriate repositories based on libre criteria; a minimal sketch follows.
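
One possible shape for that filter, assuming per-file license records live in ``lic.parquet`` with ``license`` and ``ri_id`` columns (the ``license`` column is read by ``the-stack-licenses`` below; ``ri_id`` appears in ``fi.parquet`` in the removed example script):

.. code-block:: python

   # Sketch: keep only repositories whose files carry allow-listed licenses.
   import pandas as pd

   # Illustrative allow-list; a real one would be derived from the three
   # license lists referenced in these docs.
   ALLOWED = {"MIT", "Apache-2.0", "BSD-3-Clause"}

   lic = pd.read_parquet("lic.parquet")         # assumed schema
   libre = lic[lic["license"].isin(ALLOWED)]
   repo_ids = sorted(libre["ri_id"].unique())   # repositories passing the criteria
   print(len(repo_ids), "candidate repositories")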

Cloning Repos
-------------

Write a script to clone the selected repositories; see the sketch below.
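
A minimal sketch of such a cloning script (the GitHub URL scheme is an assumption; ``numpy/numpy`` is the repo name used in the removed example script):

.. code-block:: python

   # Sketch: shallow-clone each selected repository with git.
   import os
   import subprocess
   from pathlib import Path

   def clone_repo(repo_name: str, dst_root: Path) -> None:
       """Shallow-clone repo_name (e.g. 'numpy/numpy') under dst_root."""
       url = f"https://github.com/{repo_name}.git"      # assumed hosting
       dst = dst_root / repo_name.replace("/", "__")
       env = dict(os.environ, GIT_LFS_SKIP_SMUDGE="1")  # skip LFS blobs up front
       subprocess.run(
           ["git", "clone", "--depth", "1", url, str(dst)],
           check=True,
           env=env,
       )

   clone_repo("numpy/numpy", Path("/repo_workdir"))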

Train
-----

Utilize libre code from Bigcode (creators of The Stack) for model training.

Scripts
-------

The following scripts are available:

* ``the-stack-headers`` --
  Retrieves header names from The Stack's parquet files.

* ``the-stack-licenses`` --
  Extracts licenses and records from The Stack's license file.
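
Going by the ``the-stack-licenses`` docstring further down in this diff (the console-script names come from ``pyproject.toml``), typical invocations would look like:

.. code-block:: bash

   # Print records 1 through 5 from lic.parquet, colorized.
   the-stack-licenses --records 1-5 -c

   # List the unique licenses found in the file.
   the-stack-licenses --list_licenses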

Code Assist
-----------

The following scripts were developed using Parrot code assist:

* ``the-stack-headers``

* ``the-stack-licenses``

These scripts were created with the
`The Phind-CodeLlama-34B-v2_q8.gguf`
model from TheBloke.

.. toctree::
   :maxdepth: 2
   :caption: Contents:

.. automodule:: the_smack
   :members:

.. automodule:: the_smack.the_stack_licenses
   :members:

.. note:: Parrot documentation is written in English and uses AI machine translation for other languages.

New file `pyproject.toml` (hunk @ -0,0 +1,12 @@):

[build-system]
requires = ["setuptools", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "the_smack"
version = "0.0.9"

[project.scripts]
the-stack-licenses = "the_smack.the_stack_licenses:main"
the-stack-headers = "the_smack.the_stack_headers:main"
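
With an editable install, setuptools should generate `the-stack-licenses` and `the-stack-headers` commands from these entry points (assuming both modules define a `main` function, as the scripts below suggest):

```
pip install -e .
the-stack-licenses --help
```
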
New `requirements.txt` (hunk @ -0,0 +1,10 @@):

datasets
tqdm
pandas
pathlib
termcolor
pytest
sphinx
sphinx_rtd_theme
build
toml

New `__init__.py` (hunk @ -0,0 +1,5 @@):

# __init__.py

import the_smack

__all__ = ["the_smack"]

`the_smack/the_stack_licenses.py`, per the `pyproject.toml` entry point (hunk @ -1,5 +1,17 @@):

#!/usr/bin/env python3
"""Script to read and print specific records from the lic.parquet file in a numbered directory under the data/ subdirectory."""
"""
This script is designed to read and print specific records from the lic.parquet file in a numbered directory under the data/ subdirectory.

Example usage: python3 script.py --records 1-5 -c

Command-line options:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  -r RANGE, --records=RANGE
                        record number or range to print (e.g., 1, 5-7)
  -c, --color           colorize the output
  -l, --list_licenses   list unique licenses in the file
"""

import argparse
import os

@ -9,6 +21,16 @@ from termcolor import colored


def get_records(dataframe, args):
    """
    Extract records from a DataFrame based on user-specified range.

    Parameters:
    dataframe (DataFrame): The pandas DataFrame to extract records from.
    args (Namespace): A namespace object containing parsed command line arguments.

    Returns:
    DataFrame: The extracted records as a new DataFrame.
    """
    if "-" in args.records:
        start, end = map(int, args.records.split("-"))
        return dataframe[start - 1 : end]
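
A quick illustration of the inclusive range slicing above (hypothetical call, assuming `get_records` is importable):

```python
import argparse

import pandas as pd
from the_stack_licenses import get_records  # as in the test module below

df = pd.DataFrame({"license": ["MIT", "GPL", "Apache"]})
args = argparse.Namespace(records="1-2")
print(get_records(df, args))  # rows 1 through 2, i.e. positional slice [0:2]
```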

@ -18,6 +40,13 @@ def get_records(dataframe, args):


def print_records(dataframe, color):
    """
    Print the records in a DataFrame with optional colorization.

    Parameters:
    dataframe (DataFrame): The pandas DataFrame to print.
    color (bool): If True, colorize the output.
    """
    for index, row in dataframe.iterrows():
        if color:
            for col in row.index:

@ -31,6 +60,12 @@ def print_records(dataframe, color):


def print_unique_licenses(dataframe):
    """
    Print the unique licenses in a DataFrame, sorted alphabetically.

    Parameters:
    dataframe (DataFrame): The pandas DataFrame to extract licenses from.
    """
    licenses = dataframe["license"].unique().tolist()
    licenses.sort(
        key=lambda x: [int(i) if i.isdigit() else i for i in re.split("([0-9]+)", x)]

@ -40,6 +75,9 @@ def print_unique_licenses(dataframe):


def main():
    """
    Main function to parse command line arguments and run the script.
    """
    parser = argparse.ArgumentParser(
        description="Specify the directory and record range to use"
    )

New test file (hunk @ -0,0 +1,34 @@) exercising `the_stack_licenses`:

import argparse
import unittest.mock

import pandas as pd
import pytest
from the_stack_licenses import (
    get_records,
    print_records,
    print_unique_licenses,
)


def test_get_records():
    df = pd.DataFrame({"a": [1, 2, 3]})

    # get_records reads args.records, so pass a parsed-args-style namespace
    assert get_records(df, argparse.Namespace(records="1-2")).equals(
        pd.DataFrame({"a": [1, 2]})
    )
    assert get_records(df, argparse.Namespace(records="1")).equals(
        pd.DataFrame({"a": [1]})
    )


def test_print_records():
    df = pd.DataFrame({"a": [1, 2]})

    # Mocking print function using built-in unittest.mock module
    with unittest.mock.patch("builtins.print") as mock_print:
        print_records(df, color=False)
        mock_print.assert_called_with(df)  # assumes the frame is printed whole


def test_print_unique_licenses():
    df = pd.DataFrame({"license": ["MIT", "GPL", "Apache"]})

    # Mocking print function using built-in unittest.mock module
    with unittest.mock.patch("builtins.print") as mock_print:
        print_unique_licenses(df)
        mock_print.assert_called_with(
            pd.Series(["Apache", "GPL", "MIT"]).sort_values()
        )  # assuming sorting is done in function