andreazugarini/vulgaris

Vulgaris

Project to analyze Italian Diachronic Language Varieties.

See the Vulgaris project page for more details.

The technical report was accepted at the VarDial 2020 workshop, co-located with COLING 2020.

Cite

@inproceedings{zugarini2020vulgaris,
  title={Vulgaris: Analysis of a Corpus for Middle-Age Varieties of Italian Language},
  author={Zugarini, Andrea and Tiezzi, Matteo and Maggini, Marco},
  booktitle={Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects},
  pages={150--159},
  year={2020}
}

Download Script

Disclaimer: we retrieved and analyzed the data from Biblioteca Italiana solely for personal and academic non-commercial purposes. To make our analyses reproducible and to ease research on diachronic language varieties, we provide the following script, which retrieves the corpus and organizes it in a convenient structure.

To install all the required dependencies:

pip install -r download_requirements.txt

Then, run the script:

python vulgaris_project.py

By running the script, you agree to respect the copyright terms of Biblioteca Italiana, whose content is released under a Creative Commons license.

Perplexity-based Analysis

First, retrieve the data (see Download Script). Then run the experiment, passing the path to the corpus CSV file:

python char_diachronic_lm_exp.py path/to/vulgaris.csv
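For readers unfamiliar with the metric, the sketch below illustrates what a character-level perplexity comparison looks like. It is not the repository's implementation: it uses an add-one smoothed character bigram model as a simple stand-in, and the function names `train_char_bigram` and `perplexity` are hypothetical. The model actually used by `char_diachronic_lm_exp.py` may differ.

```python
import math
from collections import Counter


def train_char_bigram(text):
    """Collect character bigram statistics from a training text.

    Hypothetical helper for illustration only (not part of the repository).
    """
    contexts = Counter(text[:-1])           # how often each char precedes another
    bigrams = Counter(zip(text, text[1:]))  # counts of (previous, current) pairs
    vocab = set(text)
    return contexts, bigrams, vocab


def perplexity(model, text):
    """Perplexity of `text` under an add-one smoothed character bigram model."""
    contexts, bigrams, vocab = model
    # One extra slot in the denominator reserves mass for unseen characters.
    v = len(vocab) + 1
    pairs = list(zip(text, text[1:]))
    if not pairs:
        raise ValueError("text must contain at least two characters")
    log_prob = 0.0
    for prev, cur in pairs:
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + v)
        log_prob += math.log(p)
    # Perplexity is the exponential of the average negative log-likelihood.
    return math.exp(-log_prob / len(pairs))
```

Training such a model on texts from one variety and evaluating it on texts from another gives a rough measure of how different the two are at the character level: the lower the perplexity, the more predictable the evaluated text is for the model.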
