Parallel Gamma Agreement

Python scripts for computing gamma agreement scores across multiple annotators and sentences in parallel. They are intended for large-scale annotation projects in which several annotators have labeled spans of text: computing gamma agreement for every sentence one after another can take a long time, so when ample compute is available the work is spread across parallel workers. The scripts build on the pygamma-agreement package and add only batch processing with useful logging on top of it. The plan is to integrate this functionality into pygamma-agreement directly. This is still a work in progress.

Features

Parallel processing of multiple sentences
Support for both JSON and CSV input formats

Installation

Clone the repository:

git clone https://github.com/TomMoeras/parallel-pygamma
cd parallel-pygamma

Install required packages:

pip install -r requirements.txt

Usage

Quick Start

Run the demo script:

python src/demo.py

The demo allows you to try both input formats with example data.

Input Formats

1. JSON Format (Multiple Annotators)

Each annotator's file should be named mapped_annotations_<annotator_id>.json and contain a JSON array with one object per sentence:

[
    {
        "id": 0,
        "text": "Example sentence text",
        "word": "target_word",
        "label": [
            {
                "text": "annotated span",
                "start": 0,
                "end": 14,
                "labels": ["label_category"]
            }
        ]
    }
]
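The per-annotator JSON files can be gathered into a single per-sentence view with the standard library. The sketch below is an assumption about how such a loader could look, not the repository's own code: `load_json_annotations` is a hypothetical name, and it assumes that the annotator ID is everything after `mapped_annotations_` in the file name and that sentence `id`s are shared across annotators' files.

```python
import json
from collections import defaultdict
from pathlib import Path

PREFIX = "mapped_annotations_"


def load_json_annotations(input_dir):
    """Hypothetical loader: group spans from every annotator file by sentence id.

    Returns {sentence_id: [(annotator_id, start, end, label), ...]}.
    """
    by_sentence = defaultdict(list)
    for path in sorted(Path(input_dir).glob(f"{PREFIX}*.json")):
        # Assumption: the annotator ID is the part of the file name after the prefix
        annotator_id = path.stem[len(PREFIX):]
        records = json.loads(path.read_text(encoding="utf-8"))
        for record in records:
            for span in record.get("label", []):
                # A span may carry several labels; emit one unit per label
                for label in span["labels"]:
                    by_sentence[record["id"]].append(
                        (annotator_id, span["start"], span["end"], label)
                    )
    return dict(by_sentence)
```

Emitting one unit per label keeps the result flat, which is convenient when each unit later becomes one annotated segment with a single category.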

2. CSV Format (Per Sentence)

Each CSV file holds the annotations for one sentence, one row per annotated span, and should contain these columns:

Annotator,Sentence,Annotated Text,Start,End,Label
A1,"Example text","annotated span",0,14,category
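A per-sentence CSV in this format can be parsed with the standard `csv` module. The sketch below is illustrative only: `read_sentence_csv` is a hypothetical helper, and it assumes `Start` and `End` are integer character offsets, as in the sample row.

```python
import csv
import io


def read_sentence_csv(fileobj):
    """Hypothetical reader for one per-sentence CSV in the format above.

    Returns a list of (annotator, start, end, label) tuples.
    """
    units = []
    for row in csv.DictReader(fileobj):
        # Offsets are read as text by csv, so convert them to integers
        units.append(
            (row["Annotator"], int(row["Start"]), int(row["End"]), row["Label"])
        )
    return units


sample = io.StringIO(
    'Annotator,Sentence,Annotated Text,Start,End,Label\n'
    'A1,"Example text","annotated span",0,14,category\n'
)
print(read_sentence_csv(sample))  # [('A1', 0, 14, 'category')]
```

For a file on disk, pass `open(path, newline="", encoding="utf-8")` so that quoted fields containing line breaks are handled correctly.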

Using in Your Project

from datetime import datetime

from core.gamma import GammaAgreementProcessor

# Note: setup_logging and parallel_processor are defined elsewhere in this
# repository; their imports and construction are not shown here.

log_dir = "logs"

# Set up logging with a shared timestamp
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
logger = setup_logging(log_dir, timestamp)

# Initialize components
processor = GammaAgreementProcessor(
    output_dir="output",
    log_dir=log_dir,
    timestamp=timestamp,
)

# Process annotations
parallel_processor.run(
    input_dir="your_input_dir",
    batch_size=4,
    max_workers=None,     # None: use a default based on the CPU count
    max_annotators=None,  # Optional limit on the number of annotators per sentence
)
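The parameters of `run` suggest how the work is divided: sentences are grouped into batches of `batch_size`, batches are spread over up to `max_workers` processes, and `max_annotators` caps how many annotators are considered per sentence. The sketch below illustrates that scheme with the standard library's `concurrent.futures.ProcessPoolExecutor`. It is an assumption about the approach, not the repository's implementation: all function names are hypothetical, the rule for choosing which annotators to keep (first N by sorted ID) is invented for illustration, and `score_sentence` is a placeholder that counts annotators instead of computing gamma with pygamma-agreement.

```python
from concurrent.futures import ProcessPoolExecutor


def batched(items, batch_size):
    """Split a list into consecutive chunks of at most batch_size items."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def limit_annotators(units, max_annotators=None):
    """Keep spans from at most max_annotators annotators (assumed: first N by sorted ID)."""
    if max_annotators is None:
        return units
    keep = set(sorted({annotator for annotator, *_ in units})[:max_annotators])
    return [u for u in units if u[0] in keep]


def score_sentence(units):
    """Placeholder for the real per-sentence gamma computation.

    It only counts distinct annotators so that the sketch runs without dependencies.
    """
    return len({annotator for annotator, *_ in units})


def process_batch(batch, max_annotators=None):
    """Score every (sentence_id, units) pair in one batch inside a worker process."""
    return [(sid, score_sentence(limit_annotators(units, max_annotators)))
            for sid, units in batch]


def run_parallel(by_sentence, batch_size=4, max_workers=None, max_annotators=None):
    """Score all sentences, submitting one batch per task.

    by_sentence maps sentence IDs to lists of (annotator, start, end, label) tuples.
    max_workers=None lets ProcessPoolExecutor choose a default based on the CPU count.
    """
    batches = batched(sorted(by_sentence.items()), batch_size)
    results = {}
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(process_batch, b, max_annotators) for b in batches]
        for future in futures:
            results.update(future.result())
    return results
```

Call `run_parallel` from under an `if __name__ == "__main__":` guard, because platforms that start workers with the spawn method re-import the main module in every worker. Submitting whole batches rather than single sentences keeps the per-task cost of pickling and scheduling small relative to the work each task does.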

Output

The tool generates:

CSV files containing processed annotations
Final results file with gamma scores

Results are saved in:

output/csv/: Intermediate CSV files
output/results/: Final gamma scores
logs/: Processing logs
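Intermediate files in output/csv/ can follow the per-sentence CSV format described under Input Formats. The sketch below shows one way to write such a file; `write_sentence_csv` and the `sentence_<id>.csv` naming scheme are hypothetical and not taken from the repository.

```python
import csv
from pathlib import Path

COLUMNS = ["Annotator", "Sentence", "Annotated Text", "Start", "End", "Label"]


def write_sentence_csv(csv_dir, sentence_id, sentence, rows):
    """Hypothetical helper: write one sentence's spans to <csv_dir>/sentence_<id>.csv.

    rows: iterable of (annotator, annotated_text, start, end, label) tuples.
    """
    csv_dir = Path(csv_dir)
    csv_dir.mkdir(parents=True, exist_ok=True)  # create output/csv/ if missing
    path = csv_dir / f"sentence_{sentence_id}.csv"
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(COLUMNS)
        for annotator, text, start, end, label in rows:
            # The full sentence text is repeated on every row, as in the input format
            writer.writerow([annotator, sentence, text, start, end, label])
    return path
```

For example, `write_sentence_csv("output/csv", 0, "Example text", [("A1", "annotated span", 0, 14, "category")])` writes a header row followed by one data row.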
