<p align="center"> <img width="450" src="https://github.com/webis-de/small-text/blob/dev/docs/_static/small-text-logo.png?raw=true" alt="small-text logo" /> </p>

Active Learning for Text Classification in Python.

<hr>

Installation | Quick Start | Contribution | Changelog | Docs

Small-Text provides state-of-the-art Active Learning for Text Classification. It ships several pre-implemented Query Strategies, Initialization Strategies, and Stopping Criteria, which can be mixed and matched to build active learning experiments or applications.

What is Active Learning?

Active Learning allows you to efficiently label training data for supervised learning in a scenario where you have little to no labeled data.

<p align="center"> <img src="https://raw.githubusercontent.com/webis-de/small-text/dev/docs/_static/learning-curve-example.gif?raw=true" alt="Learning curve example for the TREC-6 dataset." width="60%"> </p>
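
As a rough illustration of the idea, the following sketch runs a pool-based active learning loop with uncertainty sampling on a toy one-dimensional problem. It is plain Python and does not use the small-text API; `fit_threshold`, `query_most_uncertain`, `active_learning_loop`, and the `oracle` function are hypothetical names chosen only for this example.

```python
def fit_threshold(labeled):
    """Toy classifier: midpoint between the largest negative and smallest positive."""
    negatives = [x for x, y in labeled if y == 0]
    positives = [x for x, y in labeled if y == 1]
    return (max(negatives) + min(positives)) / 2


def query_most_uncertain(pool, labeled_items, threshold):
    """Uncertainty sampling: pick the unlabeled item closest to the decision boundary."""
    candidates = [x for x in pool if x not in labeled_items]
    return min(candidates, key=lambda x: abs(x - threshold))


def active_learning_loop(pool, oracle, seed_items, num_queries):
    """Alternate between fitting the model and asking the oracle for one new label."""
    labeled = [(x, oracle(x)) for x in seed_items]
    for _ in range(num_queries):
        threshold = fit_threshold(labeled)
        item = query_most_uncertain(pool, {x for x, _ in labeled}, threshold)
        labeled.append((item, oracle(item)))  # the "human annotator" step
    return fit_threshold(labeled), labeled


def oracle(x):
    """Stand-in for a human annotator; the true boundary (37) is unknown to the learner."""
    return int(x >= 37)


pool = list(range(100))
threshold, labeled = active_learning_loop(pool, oracle, seed_items=[0, 99], num_queries=6)
accuracy = sum(int(x >= threshold) == oracle(x) for x in pool) / len(pool)
print(f"labels used: {len(labeled)}, accuracy on pool: {accuracy:.2f}")
```

In this toy setup, two seed labels plus six queried labels are enough to classify all 100 pool items correctly, because each query is spent on the item the current model is least certain about.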

Features


News

Version 2.0.0 dev1 (v2.0.0.dev1) - November 24th, 2024

Version 1.4.1 (v1.4.1) - August 18th, 2024

Version 1.4.0 (v1.4.0) - June 9th, 2024

Paper published at EACL 2023 🎉

For a complete list of changes, see the change log.


Installation

Small-Text can be easily installed via pip:

pip install small-text

The command results in a slim installation with only the necessary dependencies. For a full installation via pip, you just need to include the transformers extra requirement:

pip install small-text[transformers]

The library requires Python 3.8 or newer. For using the GPU, CUDA 10.1 or newer is required. More information regarding the installation can be found in the documentation.
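
To verify these requirements from within Python, a minimal sketch using only the standard library might look as follows. The helper name `check_environment` is hypothetical and not part of small-text; the minimum version `(3, 8)` is taken from the requirement stated above.

```python
import sys
from importlib import metadata


def check_environment(min_python=(3, 8), package="small-text"):
    """Return (python_ok, installed_version); installed_version is None if not installed."""
    python_ok = sys.version_info >= min_python
    try:
        installed_version = metadata.version(package)
    except metadata.PackageNotFoundError:
        installed_version = None
    return python_ok, installed_version


python_ok, installed_version = check_environment()
print(f"Python version OK: {python_ok}")
print(f"small-text version: {installed_version or 'not installed'}")
```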

Quick Start

For a quick start, see the provided examples for binary classification, PyTorch multi-class classification, and transformer-based multi-class classification, or check out the notebooks.

Notebooks

<div align="center">
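<table>
<thead> <tr> <th>#</th> <th>Notebook</th> <th></th> </tr> </thead>
<tbody>
<tr> <td>1</td> <td>Intro: Active Learning for Text Classification with Small-Text</td> <td>Open In Colab</td> </tr>
<tr> <td>2</td> <td>Using Stopping Criteria for Active Learning</td> <td>Open In Colab</td> </tr>
<tr> <td>3</td> <td>Active Learning using SetFit</td> <td>Open In Colab</td> </tr>
<tr> <td>4</td> <td>Using SetFit's Zero Shot Capabilities for Cold Start Initialization</td> <td>Open In Colab</td> </tr>
</tbody>
</table>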
</div>

Showcase

A full list of showcases can be found in the docs.

🎀 Would you like to share your use case? Whether it is a paper, an experiment, a practical application, a thesis, a dataset, or something else, let us know and we will add you to the showcase section or even here.

Documentation

Read the latest documentation here. Noteworthy pages include:


Scope of Features

<table align="center"> <caption>Extension of Table 1 in the <a href="https://aclanthology.org/2023.eacl-demo.11v2.pdf" target="_blank">EACL 2023 paper</a>.</caption> <thead> <tr> <th>Name</th> <th colspan="2">Active Learning</th> </tr> <tr> <th></th> <th>Query Strategies</th> <th>Stopping Criteria</th> </tr> </thead> <tbody> <tr> <td>small-text v1.3.0</td> <td>14</td> <td>5</td> </tr> <tr> <td>small-text v2.0.0</td> <td>19</td> <td>5</td> </tr> </tbody> </table>

We use these numbers only to show the tremendous progress that small-text has made over time. There are many features and improvements that are not reflected in these numbers.

Alternatives

modAL, ALiPy, libact, ALToolbox


Contribution

Contributions are welcome. Details can be found in CONTRIBUTING.md.

Acknowledgments

This software was created by Christopher Schröder (@chschroeder) at Leipzig University's NLP group, which is part of the Webis research network. The encompassing project was funded by the Development Bank of Saxony (SAB) under project number 100335729.

Citation

Small-Text was introduced in detail in the EACL 2023 System Demonstration paper "Small-Text: Active Learning for Text Classification in Python", which can be cited as follows:

@inproceedings{schroeder2023small-text,
    title = "Small-Text: Active Learning for Text Classification in Python",
    author = {Schr{\"o}der, Christopher  and  M{\"u}ller, Lydia  and  Niekler, Andreas  and  Potthast, Martin},
    booktitle = "Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations",
    month = may,
    year = "2023",
    address = "Dubrovnik, Croatia",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.eacl-demo.11",
    pages = "84--95"
}

License

MIT License