Quickstart

This guide provides a quick overview of the Argilla SDK and how to create your first dataset.

Setting up your Argilla project

Install the SDK with pip

To work with Argilla datasets, you need to use the Argilla SDK. You can install the SDK with pip as follows:

Note

The package is not yet available on PyPI, so you need to install it directly from the GitHub repository.

pip install git+https://github.com/argilla-io/argilla-python.git

Run the Argilla server

If you have already deployed Argilla Server, you can skip this step. Otherwise, you can quickly deploy it in two different ways:

  • Remotely, using a Hugging Face Space.
  • Locally, using Docker with the following command:
docker run -d --name quickstart -p 6900:6900 argilla/argilla-quickstart:latest
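
After starting the container, the server may need a short while before it accepts connections. The following is a minimal sketch of a reachability check that uses only the Python standard library. The helper name `server_is_up` is hypothetical and not part of the Argilla SDK; it only tells you whether something answers HTTP at the given URL, not whether that something is Argilla.

```python
import urllib.error
import urllib.request


def server_is_up(url: str, timeout: float = 3.0) -> bool:
    """Return True if anything answers HTTP at `url` (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status code.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return False
```

For the Docker deployment above you would call `server_is_up("http://localhost:6900")` and retry until it returns `True`.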

Connect to the Argilla server

Get your <api_url>:

  • If you are using Hugging Face Spaces, the URL should be constructed as follows: https://[your-owner-name]-[your_space_name].hf.space
  • If you are using Docker, the URL is the URL shown in your browser (by default http://localhost:6900)
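
As a small illustration of the Hugging Face Spaces URL pattern, the sketch below assembles the URL from an owner name and a Space name. The helper `build_space_url` and the names `my-owner` and `my-space` are placeholders invented for this example; the function follows the pattern literally and applies no normalization to the names.

```python
def build_space_url(owner: str, space_name: str) -> str:
    """Build the API URL for a Hugging Face Space (hypothetical helper).

    Follows the pattern given above and applies no normalization.
    """
    return f"https://{owner}-{space_name}.hf.space"


# Placeholder names; replace them with your own owner and Space names.
api_url = build_space_url("my-owner", "my-space")
print(api_url)
# → https://my-owner-my-space.hf.space
```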

Get your <api_key> in My Settings in the Argilla UI (by default owner.apikey).

Note

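Make sure to replace <api_url> and <api_key> with your actual values. If you are using a private Hugging Face Space, you also need to provide your HF_TOKEN, which you can create in the access tokens section of your Hugging Face account settings, and pass it through extra_headers as in the commented line below.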

import argilla_sdk as rg

client = rg.Argilla(
    api_url="<api_url>",
    api_key="<api_key>",
    # extra_headers={"Authorization": f"Bearer {HF_TOKEN}"}
)
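
To avoid hardcoding credentials in a script, the connection arguments can be collected from environment variables instead. This is a minimal sketch under stated assumptions: the variable names `ARGILLA_API_URL` and `ARGILLA_API_KEY` and the helper `connection_kwargs` are invented for illustration and are not something the SDK is documented here to read on its own. The fallbacks are the Docker defaults mentioned above, and `HF_TOKEN` is only used when it is set.

```python
import os


def connection_kwargs(env=os.environ) -> dict:
    """Collect arguments for rg.Argilla from environment variables.

    ARGILLA_API_URL and ARGILLA_API_KEY are hypothetical variable names.
    """
    kwargs = {
        # Fall back to the Docker defaults mentioned above.
        "api_url": env.get("ARGILLA_API_URL", "http://localhost:6900"),
        "api_key": env.get("ARGILLA_API_KEY", "owner.apikey"),
    }
    hf_token = env.get("HF_TOKEN")
    if hf_token:
        # Only needed for a private Hugging Face Space.
        kwargs["extra_headers"] = {"Authorization": f"Bearer {hf_token}"}
    return kwargs


# Usage, assuming the SDK is installed:
# import argilla_sdk as rg
# client = rg.Argilla(**connection_kwargs())
```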

Create your first dataset

To create a dataset for a simple text classification task, first define the dataset settings.

settings = rg.Settings(
    guidelines="Classify the reviews as positive or negative.",
    fields=[
        rg.TextField(
            name="review",
            title="Text from the review",
            use_markdown=False,
        ),
    ],
    questions=[
        rg.LabelQuestion(
            name="my_label",
            title="In which category does this review fit?",
            labels=["positive", "negative"],
        )
    ],
)

Now you can create the dataset with the settings you defined. Calling create() makes the dataset available in the UI, after which you can add records to it.

Note

The workspace parameter is optional. If you don't specify it, the dataset is created in the default workspace, admin.

dataset = rg.Dataset(
    name="my_first_dataset",
    settings=settings,
    client=client,
)
dataset.create()

Add records to your dataset

Retrieve the data to be added to the dataset. We will use the IMDB dataset from the Hugging Face Datasets library.

pip install -qqq datasets

from datasets import load_dataset

data = load_dataset("imdb", split="train[:100]").to_list()

Now you can add the data to your dataset. Use a mapping to indicate which keys/columns in the source data correspond to the Argilla dataset fields.

dataset.records.log(records=data, mapping={"text": "review"})
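
To make the direction of the mapping concrete, here is a conceptual sketch in plain Python: the keys of the mapping are names in the source data, and the values are field names in the Argilla dataset. The function `apply_mapping` is hypothetical and does not reproduce how the SDK processes records. The sample record is made up, using the `text` and `label` keys that IMDB rows carry.

```python
def apply_mapping(record: dict, mapping: dict) -> dict:
    """Rename source keys to dataset field names (illustrative only)."""
    # Keys missing from the mapping keep their original name in this sketch.
    return {mapping.get(key, key): value for key, value in record.items()}


# Made-up record shaped like an IMDB row.
sample = {"text": "A wonderful film.", "label": 1}
print(apply_mapping(sample, {"text": "review"}))
# → {'review': 'A wonderful film.', 'label': 1}
```

In this sketch unmapped keys such as `label` are left untouched; how the SDK itself treats them is not covered here.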

🎉 You have successfully created your first dataset with Argilla. You can now access it in the Argilla UI and start annotating the records.