Project to analyze Italian Diachronic Language Varieties.
Have a look at the project page - Vulgaris for more details.
Technical report here - accepted at VarDial2020 Workshop, co-located with COLING 2020.
@inproceedings{zugarini2020vulgaris,
title={Vulgaris: Analysis of a Corpus for Middle-Age Varieties of Italian Language},
author={Zugarini, Andrea and Tiezzi, Matteo and Maggini, Marco},
booktitle={Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects},
pages={150--159},
year={2020}
}
Disclaimer: we retrieved and analyzed the data from Biblioteca Italiana solely for personal and academic non-commercial purposes. To make our analyses reproducible and to ease diachronic language research, we provide a script that retrieves the corpus and organizes it in a convenient structure.
To install all the required dependencies:
pip install -r download_requirements.txt
Then, run the script:
python vulgaris_project.py
By running the script, you agree to respect the following copyright terms of Biblioteca Italiana: Creative Commons License
To run the experiment script, first retrieve the data as described above, then pass the path to the resulting CSV file:
python char_diachronic_lm_exp.py path/to/vulgaris.csv
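Before launching the experiment, it can help to check that the download produced a readable CSV. The following is a minimal sketch, not part of the project: `summarize_csv` is a hypothetical helper, and it assumes only that the file is UTF-8 encoded and has a header row. It makes no assumption about the actual column names of `vulgaris.csv`.

```python
import csv
from pathlib import Path

# Corpus texts can be long; raise the per-field size limit above the
# default so that large cells do not trigger a csv.Error.
csv.field_size_limit(2**31 - 1)


def summarize_csv(path):
    """Return (column_names, row_count) for a CSV file with a header row.

    Hypothetical helper: it only inspects the file and assumes nothing
    about which columns the corpus actually contains.
    """
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])          # first row is taken as the header
        row_count = sum(1 for _ in reader)  # remaining rows are data rows
    return header, row_count


# Example usage (the path is the placeholder used in the command above):
# columns, rows = summarize_csv("path/to/vulgaris.csv")
# print("columns:", columns)
# print("rows:", rows)
```

If the reported column list or row count looks wrong, re-running the download script is probably a better first step than debugging the experiment.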