Argilla

Open-source framework for data-centric NLP

Data Labeling, curation, and Inference Store

Designed for MLOps & Feedback Loops

🆕 🔥 Try the Argilla UI in this live demo powered by Hugging Face Spaces (login: argilla, password: 12345678)

🆕 🔥 Since 1.2.0, Argilla supports vector search for finding the records most similar to a given one. This feature combines vector (semantic) search with more traditional keyword- and filter-based search. Learn more in this deep-dive guide

Documentation | Key Features | Quickstart | Principles | Migration from Rubrix | FAQ

Key Features

Advanced NLP labeling

Programmatic labeling using weak supervision, with built-in label models (Snorkel, FlyingSquid)
Bulk-labeling and search-driven annotation
Iterate on training data with any pre-trained model or library
Efficiently review and refine annotations in the UI and with Python
Use Argilla's built-in metrics and methods for finding label and data errors (e.g., cleanlab)
Simple integration with active learning workflows

Monitoring

Close the gap between production data and data collection activities
Auto-monitoring for major NLP libraries and pipelines (spaCy, Hugging Face, FlairNLP)
ASGI middleware for HTTP endpoints
Argilla Metrics to understand data and model issues, like entity consistency for NER models
Integrated with Kibana for custom dashboards

Team workspaces

Bring different users and roles into the NLP data and model lifecycles
Organize data collection, review and monitoring into different workspaces
Manage workspace access for different users

Quickstart

Argilla is composed of a Python Server with Elasticsearch as the database layer, and a Python Client to create and manage datasets.

To get started, you just need to run the Docker image with the following command:

  docker run -d --name quickstart -p 6900:6900 argilla/argilla-quickstart:latest

This runs the latest quickstart Docker image with two users, admin and argilla. The password for both users is 12345678. You can also configure the environment variables below to suit your needs.

Environment Variables

ADMIN_USERNAME: The admin username for logging in to Argilla. The default admin username is admin. Set this variable to log in to the app with a username of your own.
ADMIN_API_KEY: Argilla provides a Python library to interact with the app (read, write, and update data, log model predictions, etc.). If you don't set this variable, the library and your app will use the default API key, admin.apikey. To secure your app for reading and writing data, we recommend setting this variable. The API key can be any string of your choice; you can use an online generator if you like.
ADMIN_PASSWORD: A custom password for logging in to the app with the admin username (see ADMIN_USERNAME). The default password is 12345678.
ANNOTATOR_USERNAME: The annotator username for logging in to Argilla. The default annotator username is argilla. Set this variable to log in to the app with a username of your own.
ANNOTATOR_PASSWORD: A custom password for logging in to the app with the annotator username (argilla by default). The default password is 12345678.
ARGILLA_WORKSPACE: The name of a workspace that will be created and used by default for the admin and annotator users. The default value is the one defined by the ADMIN_USERNAME environment variable.
LOAD_DATASETS: Controls which sample datasets are loaded. The default value is full. The supported values are:
1. single: Load a single dataset for the TextClassification task.
2. full: Load all the sample datasets for NLP tasks (TokenClassification, TextClassification, Text2Text).
3. none: Do not load any datasets.
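
As an illustration of how these variables might be passed to the container, here is a minimal Python sketch that assembles a `docker run` command with `-e` overrides. The helper name `quickstart_command` and all the values (username, password, API key) are hypothetical placeholders, not part of Argilla; only the variable names, the image name, and the port come from the text above. The sketch only builds and prints the command, it does not start a container.

```python
import shlex

# Supported LOAD_DATASETS values, as listed above.
ALLOWED_LOAD_DATASETS = {"single", "full", "none"}


def quickstart_command(env, name="quickstart", port=6900,
                       image="argilla/argilla-quickstart:latest"):
    """Return the `docker run` argument list with one `-e KEY=VALUE` per variable.

    Hypothetical helper for illustration; it is not provided by Argilla.
    """
    mode = env.get("LOAD_DATASETS", "full")  # "full" is the documented default
    if mode not in ALLOWED_LOAD_DATASETS:
        raise ValueError(f"LOAD_DATASETS must be one of {sorted(ALLOWED_LOAD_DATASETS)}")

    argv = ["docker", "run", "-d", "--name", name, "-p", f"{port}:{port}"]
    for key, value in env.items():
        argv += ["-e", f"{key}={value}"]
    argv.append(image)
    return argv


# Placeholder values: replace them with your own before use.
command = quickstart_command({
    "ADMIN_USERNAME": "alice",
    "ADMIN_PASSWORD": "change-me-please",
    "ADMIN_API_KEY": "my-secret-key",
    "LOAD_DATASETS": "single",
})

# Print a shell-safe version of the command that can be pasted into a terminal.
print(shlex.join(command))
```

To actually launch the container from Python, the returned list could be handed to `subprocess.run(command, check=True)`; passing a list rather than a single string avoids shell quoting problems with passwords that contain special characters.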
