
<h1 align="center"> <a href=""><img src="https://github.com/dvsrepo/imgs/raw/main/rg.svg" alt="Argilla" width="150"></a> <br> Argilla <br> </h1> <h3 align="center">Build high quality datasets for your AI models</h3> <p align="center"> <a href="https://pypi.org/project/argilla/"> <img alt="CI" src="https://img.shields.io/pypi/v/argilla.svg?style=flat-round&logo=pypi&logoColor=white"> </a> <img alt="Codecov" src="https://codecov.io/gh/argilla-io/argilla/branch/main/graph/badge.svg?token=VDVR29VOMG"/> <a href="https://pepy.tech/project/argilla"> <img alt="CI" src="https://static.pepy.tech/personalized-badge/argilla?period=month&units=international_system&left_color=grey&right_color=blue&left_text=pypi%20downloads/month"> </a> <a href="https://huggingface.co/new-space?template=argilla/argilla-template-space"> <img src="https://huggingface.co/datasets/huggingface/badges/raw/main/deploy-to-spaces-sm.svg"/> </a> </p> <p align="center"> <a href="https://twitter.com/argilla_io"> <img src="https://img.shields.io/badge/twitter-black?logo=x"/> </a> <a href="https://www.linkedin.com/company/argilla-io"> <img src="https://img.shields.io/badge/linkedin-blue?logo=linkedin"/> </a> <a href="http://hf.co/join/discord"> <img src="https://img.shields.io/badge/Discord-7289DA?&logo=discord&logoColor=white"/> </a> </p>

Argilla is a collaboration tool for AI engineers and domain experts who need to build high-quality datasets for their projects.

If you just want to get started, deploy Argilla on Hugging Face Spaces. Curious and want to know more? Read our documentation.

Or, play with the Argilla UI by signing in with your Hugging Face account:

<p> <a href="https://huggingface.co/spaces/argilla/argilla-template-space" title="Redirect to homepage"> <img src="https://github.com/user-attachments/assets/b9f34620-dd5e-4738-9750-0a85a4caae00" alt="homepage" /> </a> </p>

Why use Argilla?

Argilla can be used to collect human feedback for a wide variety of AI projects, such as traditional NLP (text classification, NER, etc.), LLMs (RAG, preference tuning, etc.), or multimodal models (text-to-image, etc.). Argilla's programmatic approach lets you build workflows for continuous evaluation and model improvement. The goal of Argilla is to ensure your data work pays off by quickly iterating on the right data and models.

Improve your AI output quality through data quality

Compute is expensive and output quality is important. We help you focus on data, which tackles the root cause of both problems at once. Argilla helps you reach and maintain high quality standards for your data, which in turn improves the quality of your AI output.

Take control of your data and models

Most AI tools are black boxes. Argilla is different. We believe that you should be the owner of both your data and your models. That's why we provide you with all the tools your team needs to manage your data and models in a way that suits you best.

Improve efficiency by quickly iterating on the right data and models

Gathering data is a time-consuming process. Argilla makes it faster by letting you interact with your data in a more engaging way: you can label it quickly and easily with filters, AI feedback suggestions and semantic search. This leaves you more time to focus on training your models and monitoring their performance.

🏘️ Community

We are an open-source community-driven project and we love to hear from you. Here are some ways to get involved:

What do people build with Argilla?

Open-source datasets and models

The community uses Argilla to create amazing open-source datasets and models.

Example use cases

AI teams from organizations such as the Red Cross, Loris.ai and Prolific use Argilla to improve the quality and efficiency of AI projects. They shared their experiences in our AI community meetup.

👨‍💻 Getting started

Installation

First things first! You can install the SDK with pip as follows:

pip install argilla

After that, you will need to deploy Argilla Server. The easiest way to do this is through our free Hugging Face Spaces deployment integration.

To use the client, you need to import the Argilla class and instantiate it with the API URL and API key.

import argilla as rg

client = rg.Argilla(api_url="https://[your-owner-name]-[your_space_name].hf.space", api_key="owner.apikey")
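The API URL of a Space deployment follows the `https://[your-owner-name]-[your_space_name].hf.space` pattern shown above. The sketch below builds that URL from the owner and Space name and reads the API key from the environment instead of hard-coding it. The helper names `space_url` and `client_settings` are hypothetical, and two things are assumptions on our part rather than documented behavior: that the subdomain is lowercased with `_` and `.` replaced by `-`, and that `ARGILLA_API_URL` / `ARGILLA_API_KEY` are the environment variable names you want to use. Check the Argilla and Hugging Face docs before relying on either.

```python
import os
import re


def space_url(owner: str, space_name: str) -> str:
    """Build the direct URL of a Hugging Face Space (hypothetical helper).

    Assumption: the subdomain is "<owner>-<space>", lowercased,
    with "_" and "." replaced by "-".
    """
    subdomain = f"{owner}-{space_name}".lower()
    subdomain = re.sub(r"[_.]", "-", subdomain)
    return f"https://{subdomain}.hf.space"


def client_settings(owner: str, space_name: str) -> dict:
    """Collect the api_url/api_key keyword arguments for rg.Argilla(...).

    Assumption: the URL and key may be overridden through the
    ARGILLA_API_URL and ARGILLA_API_KEY environment variables.
    """
    return {
        "api_url": os.environ.get("ARGILLA_API_URL", space_url(owner, space_name)),
        "api_key": os.environ.get("ARGILLA_API_KEY", ""),
    }


if __name__ == "__main__":
    print(space_url("my-org", "My_Argilla.Space"))
    # -> https://my-org-my-argilla-space.hf.space
```

With these helpers, the client can be created as `client = rg.Argilla(**client_settings("my-org", "my-space"))`, which uses only the `api_url` and `api_key` arguments shown above.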

Create your first dataset

We can now create a dataset with a simple text classification task. First, you need to define the dataset settings.

settings = rg.Settings(
    guidelines="Classify the reviews as positive or negative.",
    fields=[
        rg.TextField(
            name="review",
            title="Text from the review",
            use_markdown=False,
        ),
    ],
    questions=[
        rg.LabelQuestion(
            name="my_label",
            title="In which category does this review fit?",
            labels=["positive", "negative"],
        )
    ],
)
dataset = rg.Dataset(
    name="my_first_dataset",
    settings=settings,
    client=client,
)
dataset.create()

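Next, we can add records to the dataset. This example uses the IMDb reviews from the Hugging Face Hub, so first install the datasets library:

pip install datasets

Then load 100 reviews and log them, mapping the IMDb text column to the review field:

from datasets import load_dataset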

data = load_dataset("imdb", split="train[:100]").to_list()
dataset.records.log(records=data, mapping={"text": "review"})
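`mapping={"text": "review"}` tells `records.log` to read each row's `text` key into the `review` field defined in the settings. The plain-Python sketch below only illustrates that renaming idea; it is not Argilla's implementation and does not call the SDK. It also shows one optional preprocessing step: converting IMDb's integer `label` (0 for negative, 1 for positive) into the `"negative"` / `"positive"` strings used by `my_label`, then counting labels so you can check that your sample is not one-sided before annotating. The helpers `apply_mapping` and `label_names` and the sample rows are hypothetical.

```python
from collections import Counter

# IMDb encodes sentiment as an integer: 0 = negative, 1 = positive.
IMDB_LABELS = {0: "negative", 1: "positive"}


def apply_mapping(rows, mapping):
    """Rename source keys to field names (illustration of the idea only).

    Keys not listed in `mapping` are kept unchanged.
    """
    return [{mapping.get(key, key): value for key, value in row.items()} for row in rows]


def label_names(rows, key="label"):
    """Replace integer labels with the label strings used by the LabelQuestion."""
    return [{**row, key: IMDB_LABELS[row[key]]} for row in rows]


rows = [
    {"text": "A wonderful film.", "label": 1},
    {"text": "Two hours I will never get back.", "label": 0},
]

prepared = label_names(apply_mapping(rows, {"text": "review"}))
print(prepared[0])  # {'review': 'A wonderful film.', 'label': 'positive'}

# A quick balance check: a slice such as train[:100] may contain only one class
# if the source split happens to be ordered by label.
print(Counter(row["label"] for row in prepared))
```

How to pass converted labels to Argilla (for example, as suggestions for `my_label`) is covered in the Argilla docs; this sketch stops at preparing the data.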

🎉 You have successfully created your first dataset with Argilla. You can now access it in the Argilla UI and start annotating the records. Need more info? Check out our docs.

🥇 Contributors

To help community members get started with contributing, we have created our community docs. Additionally, you can always schedule a meeting with our Developer Advocacy team so they can get you up to speed.

<a href="https://github.com/argilla-io/argilla/graphs/contributors"> <img src="https://contrib.rocks/image?repo=argilla-io/argilla" /> </a>