Labelit is a data labelling platform based on open source tool Label Studio. We want to help small teams (having 3 to 5 annotators) who use Label Studio to manage multiple projects.
Labelit creates a Label Studio server for each project in multi-session mode so that it works for multiple users. So, if you have N projects, there will be N Label Studio servers running. Labelit then routes user requests to these servers based on project using reverse-proxy technique. Labelit takes care of managing projects, user authentication and assigning annotators to projects.
- Create and manage multiple data labelling projects.
- Label data using the familiar Label Studio tool.
- Access control using Admin, Manager and Annotators roles.
- Support for datasets from remote storage such as AWS S3.
- Export labelled data automatically at regular intervals.
- Download latest release from releases page using
Source code
option and decompress the downloaded file. - Create a conda environment (recommended) :
conda create -n label-studio python=3.7
- Switch to conda env (recommended) :
conda activate label-studio
- Install requirements for Labelit :
pip install -r requirements.txt
- Enter Django app directory :
cd labelit
- Set secret key for Django as Environment variable (You can generate secret key using https://gist.github.com/ndarville/3452907 or https://djecrety.ir):
export DJANGO_SECRET_KEY='<generated_key>'
- Perform database migration (By default, it uses local SQLite database. If you want to connect to another database, check Django documentation) :
python manage.py migrate
- Create admin user :
python manage.py createsuperuser
- Run the server using gunicorn :
gunicorn --worker-class=gevent --worker-connections=500 --workers=1 -b '0.0.0.0:8000' labelit.wsgi
- Clone the repo :
git clone [email protected]:verloop/labelit.git; cd labelit
- Create a conda environment (recommended) :
conda create -n label-studio python=3.7
- Switch to conda env (recommended) :
conda activate label-studio
- Install requirements for Labelit :
pip install -r requirements.txt
- Enter Django app directory :
cd labelit
- Set secret key for Django as Environment variable (You can generate secret key using https://gist.github.com/ndarville/3452907 or https://djecrety.ir):
export DJANGO_SECRET_KEY="<generated_key>"
- Perform database migration (By default, it uses local SQLite database. If you want to connect to another database, check Django documentation) :
python manage.py migrate
- Create admin user :
python manage.py createsuperuser
- Run the development server :
python manage.py runserver --noreload
After you perform setup, you can use Labelit as follows:
- Go to
http://127.0.0.1:8000/
and login using your credentials. - You can see projects created by all managers in projects list page.
- You can create new users by going to admin panel at
http://127.0.0.1:8000/admin
. - While creating users, make sure you assign the correct access role using
Staff type
field.
- Go to
http://127.0.0.1:8000/
and login using your credentials. - Listing projects
- You will be shown all the projects you created after successful login. You can also go to
http://127.0.0.1:8000/projects/list
for the same.
- You will be shown all the projects you created after successful login. You can also go to
- Creating new project
- You can create new projects using create project page at
http://127.0.0.1:8000/projects/create
and following instructions give there. - You can provide local directory or remote storage directory in the dataset path. We currently support AWS S3 and Google storage (plugin needs to be installed and the server needs to have access to the bucket)
- You need to give labelling configuration as Label Studio XML config. You can test the config first on Label Studio Playground.
- Select export option if needed and we'll automatically export the labelled data in format selected.
- You can create new projects using create project page at
- Assigning annotators
- From the project listing page, click
Manage annotators
to add or remove annotators for the project. - Once you are in the annotator managing page, you can select annotators to be assigned and click
Save
once done to apply the changes.
- From the project listing page, click
- Deleting projects
- From the project listing page, click
Delete
button for the project you want to delete. - Confirm deleting the project from the popup when its shown.
- From the project listing page, click
- Go to
http://127.0.0.1:8000/
and login using your credentials. - You can view all projects assigned in the homepage.
- Click the
Tasks UI
button to view all the tasks that needs to be labelled for the project - Click the
Labelling UI
button to go to labelling page.
Labelit now supports datasets that are stored remotely. We have right now added plugins for the following storage :
- AWS S3
- Google storage
Since this is optional, Labelit by default doesn't install the packages needed for accessing data from these storages. Install package for the storage you need:
For AWS S3 support : pip install boto3
For Google storage support : pip install google-cloud-storage
Make sure the instance/machine is configured with access to the remote storage and check Django settings (./labelit/labelit/settings.py
) for few additional settings.
And that's it! Labelit automatically detects the storage type when you give it while creating project, just make sure you use the right prefix (eg. s3://
for AWS s3 storage).
Currently, Labelit only supports downloading dataset from remote storage. This is performed only one time after project is created. So, in case the data in remote storage is modified later, it won't reflect here.
We will be adding support for remote upload soon to automatically upload labelled data to remote storage.
If you are using any other remote storage type, you can easily write a plugin. Check our Storage
module for this. Once plugin is created, you need to modify Jobs
module for using it.
We strongly advise not to use Labelit on production as it's still under development. In case you want to try, you need to run Django using a proper server such as Gunicorn. Here's an example of how to run it:
gunicorn --worker-class=gevent --worker-connections=500 --workers=1 -b '0.0.0.0:8080' labelit.wsgi
The above command starts a Gunicorn server on port 8080. Make sure you don't run multiple worker processes as it will start duplicate Label Studio servers, hence we are using gevent
in the above example. We will be fixing this in a future release.