Python scripts for computing gamma agreement scores across multiple annotators and sentences in parallel. This is particularly useful for large-scale annotation projects in which multiple annotators have labeled spans of text, where computing gamma agreement sequentially would take a long time and multi-core compute is available. The scripts build on the pygamma-agreement package; the only additions are batch processing and logging. The plan is to integrate this into pygamma-agreement directly. This is still a work in progress.
- Parallel processing of multiple sentences
- Support for both JSON and CSV input formats
- Clone the repository:
git clone https://github.com/TomMoeras/parallel-pygamma
cd parallel-pygamma
- Install required packages:
pip install -r requirements.txt
Run the demo script:
python src/demo.py
The demo allows you to try both input formats with example data.
For JSON input, each annotator's file should be named mapped_annotations_<annotator_id>.json and contain a list of records like:
[
  {
    "id": 0,
    "text": "Example sentence text",
    "word": "target_word",
    "label": [
      {
        "text": "annotated span",
        "start": 0,
        "end": 14,
        "labels": ["label_category"]
      }
    ]
  }
]
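As an illustration of how this JSON layout relates to the CSV columns, here is a minimal sketch that flattens one annotator's records into one row per labeled span. The helper names `annotator_id_from_filename` and `flatten_records` are hypothetical and not part of this repository; the sketch only assumes the field names shown in the example above.

```python
import json
from pathlib import Path

FILE_PREFIX = "mapped_annotations_"


def annotator_id_from_filename(path):
    """Extract <annotator_id> from mapped_annotations_<annotator_id>.json."""
    stem = Path(path).stem
    if not stem.startswith(FILE_PREFIX):
        raise ValueError(f"unexpected file name: {path}")
    return stem[len(FILE_PREFIX):]


def flatten_records(records, annotator_id):
    """Turn the nested JSON records into one flat row per (span, label)."""
    rows = []
    for record in records:
        for span in record.get("label", []):
            for label in span["labels"]:
                rows.append({
                    "Annotator": annotator_id,
                    "Sentence": record["text"],
                    "Annotated Text": span["text"],
                    "Start": span["start"],
                    "End": span["end"],
                    "Label": label,
                })
    return rows


def load_annotator_file(path):
    """Read one annotator file from disk and flatten it."""
    with open(path, encoding="utf-8") as handle:
        records = json.load(handle)
    return flatten_records(records, annotator_id_from_filename(path))
```

A span carrying several labels produces one row per label here; whether the tool itself handles multi-label spans that way is an assumption of this sketch.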
For CSV input, each file should contain the following columns:
Annotator,Sentence,Annotated Text,Start,End,Label
A1,"Example text","annotated span",0,14,category
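To make the column layout concrete, the following sketch reads CSV data in this shape with the standard library and groups the spans by sentence and annotator, which is roughly the structure an agreement computation needs. The function `group_spans_by_sentence` is a hypothetical helper written for this example, not the tool's own loader.

```python
import csv
import io
from collections import defaultdict

REQUIRED_COLUMNS = ["Annotator", "Sentence", "Annotated Text", "Start", "End", "Label"]


def group_spans_by_sentence(csv_file):
    """Return {sentence: {annotator: [(start, end, label), ...]}}."""
    reader = csv.DictReader(csv_file)
    missing = set(REQUIRED_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")

    grouped = defaultdict(lambda: defaultdict(list))
    for row in reader:
        span = (int(row["Start"]), int(row["End"]), row["Label"])
        grouped[row["Sentence"]][row["Annotator"]].append(span)

    # Convert the nested defaultdicts into plain dicts for easier inspection.
    return {sentence: dict(by_annotator) for sentence, by_annotator in grouped.items()}


# Usage with the example row from above, read from an in-memory string.
example = (
    "Annotator,Sentence,Annotated Text,Start,End,Label\n"
    'A1,"Example text","annotated span",0,14,category\n'
)
grouped_example = group_spans_by_sentence(io.StringIO(example))
```

When reading from disk, open the file with `newline=""` as the `csv` module documentation recommends, so that quoted sentences containing line breaks are parsed correctly.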
from datetime import datetime

from core.gamma import GammaAgreementProcessor
# setup_logging and parallel_processor come from this project; their imports
# and construction are not shown in this snippet.

# Set up logging with a shared timestamp
log_dir = "logs"
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
logger = setup_logging(log_dir, timestamp)

# Initialize components
processor = GammaAgreementProcessor(
    output_dir="output",
    log_dir=log_dir,
    timestamp=timestamp,
)

# Process annotations
parallel_processor.run(
    input_dir="your_input_dir",
    batch_size=4,
    max_workers=None,  # Will use default based on CPU count
    max_annotators=None,  # Optional limit on number of annotators per sentence
)
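The `batch_size` and `max_workers` parameters suggest a batch-then-map pattern. The sketch below shows one way such a pattern can be written with `concurrent.futures`; it is an assumption about the general approach, not this project's implementation. The names `make_batches`, `placeholder_score`, and `run_batches` are hypothetical, and `placeholder_score` merely counts sentences where a real worker would compute gamma.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from itertools import islice


def make_batches(items, batch_size):
    """Yield consecutive lists of at most batch_size items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def placeholder_score(batch):
    """Stand-in for the per-batch gamma computation (assumption):
    returns the number of sentences in the batch."""
    return len(batch)


def run_batches(items, batch_size=4, max_workers=None,
                executor_cls=ProcessPoolExecutor):
    """Score every batch in a worker pool and return results in input order.

    max_workers=None lets the executor choose its own default, which for
    ProcessPoolExecutor is based on the CPU count.
    """
    batches = list(make_batches(items, batch_size))
    with executor_cls(max_workers=max_workers) as pool:
        # Executor.map preserves the order of the submitted batches.
        return list(pool.map(placeholder_score, batches))
```

With the default process pool, call `run_batches` from under an `if __name__ == "__main__":` guard on platforms that spawn worker processes. Passing `executor_cls=ThreadPoolExecutor` is convenient for quick checks, although CPU-bound work generally benefits more from processes.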
The tool generates:
- CSV files containing processed annotations
- Final results file with gamma scores
Results are saved in:
- output/csv/: Intermediate CSV files
- output/results/: Final gamma scores
- logs/: Processing logs