Contributing to Ancient Linguistic Annotation Projects
Tuesday November 22, 2022, starting at 16:00 GMT = 17:00 CET (for 90 minutes)
Convenors: Francesca Dell’Oro (Université de Neuchâtel), Francesco Mambrini (Università Cattolica del Sacro Cuore, Milan)
YouTube link: https://youtu.be/tLBOuKyQuWo
Slides: tba
This session introduces two approaches to creating linguistic annotations of ancient texts. The first part gives an overview of Universal Dependencies (UD), the most important cross-linguistic standard for morphosyntactic annotation. It opens with a short introduction, followed by a longer practical session on treebanks for ancient languages, tools for automatic annotation, and tools for querying UD corpora.
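Querying tools such as PML Tree Query and UDEasy let you search UD treebanks for syntactic patterns. As a rough illustration of what such a query does (not of how those tools are implemented), the Python sketch below reads a tiny CoNLL-U sample and lists every nominal subject (`nsubj`, including subtypes such as `nsubj:pass`) whose head is a verb. The sample annotation is hand-made and simplified, not taken from any treebank, and the function names `sentences` and `subject_verb_pairs` are hypothetical.

```python
from typing import Dict, Iterator, List, Tuple

# Hand-made, simplified CoNLL-U sample (NOT from a real treebank).
# Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
SAMPLE = (
    "# text = Caesar venit.\n"
    "1\tCaesar\tCaesar\tPROPN\t_\tCase=Nom|Number=Sing\t2\tnsubj\t_\t_\n"
    "2\tvenit\tvenio\tVERB\t_\tMood=Ind|Number=Sing|Person=3\t0\troot\t_\t_\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "\n"
    "# text = Puella rosam amat.\n"
    "1\tPuella\tpuella\tNOUN\t_\tCase=Nom|Number=Sing\t3\tnsubj\t_\t_\n"
    "2\trosam\trosa\tNOUN\t_\tCase=Acc|Number=Sing\t3\tobj\t_\t_\n"
    "3\tamat\tamo\tVERB\t_\tMood=Ind|Number=Sing|Person=3\t0\troot\t_\t_\n"
    "4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_\n"
)


def sentences(conllu: str) -> Iterator[List[List[str]]]:
    """Yield each sentence as a list of 10-column token rows."""
    sent: List[List[str]] = []
    for line in conllu.splitlines():
        if not line.strip():            # a blank line ends a sentence
            if sent:
                yield sent
                sent = []
        elif line.startswith("#"):      # skip comments such as "# text = ..."
            continue
        else:
            cols = line.split("\t")
            # skip multiword-token ranges ("1-2") and empty nodes ("1.1")
            if "-" in cols[0] or "." in cols[0]:
                continue
            sent.append(cols)
    if sent:                            # last sentence without a trailing blank line
        yield sent


def subject_verb_pairs(conllu: str) -> List[Tuple[str, str]]:
    """Return (subject form, verb form) for every nsubj whose head is a VERB."""
    pairs: List[Tuple[str, str]] = []
    for sent in sentences(conllu):
        by_id: Dict[str, List[str]] = {cols[0]: cols for cols in sent}
        for cols in sent:
            if cols[7].split(":")[0] != "nsubj":   # DEPREL, ignoring subtypes
                continue
            head = by_id.get(cols[6])              # HEAD column
            if head is not None and head[3] == "VERB":  # UPOS of the head
                pairs.append((cols[1], head[1]))
    return pairs


print(subject_verb_pairs(SAMPLE))  # [('Caesar', 'venit'), ('Puella', 'amat')]
```

Dedicated query tools add what this toy lacks: a query language, indexing of large treebanks, and patterns over whole subtrees rather than single head–dependent pairs.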
The second part outlines the WoPoss project, which aims to describe modality in Latin, and presents the automatic and manual annotation of a Latin work, the Satyricon. We first focus on the automatic annotation of lemmas, parts of speech (henceforth PoS), and morphological analysis. We then outline the manual annotation of a modal passage according to a simplified version of the WoPoss Guidelines (Dell’Oro 2022). The search interface for querying the corpus (https://woposs.unine.ch/form.html) will also be briefly presented.
- Dell’Oro, Francesca; Bermúdez Sabel, Helena; Marongiu, Paola (2020). “Implemented to be shared: the WoPoss annotation of semantic modality in a Latin diachronic corpus”. Sharing the Experience: Workflows for the Digital Humanities. Proceedings of the DARIAH-CH Workshop 2019. December 5-6, 2019. Neuchâtel, Switzerland. Available: https://campus.dariah.eu/resource/events/sharing-the-experience-workflows-for-the-digital-humanities#session-8
- de Marneffe, Marie-Catherine; Manning, Christopher D.; Nivre, Joakim; Zeman, Daniel (2021). “Universal Dependencies”. Computational Linguistics 47 (2): 255–308. DOI: https://doi.org/10.1162/coli_a_00402 (Sections 1–2)
- UDPipe
- PML Tree Query (PML-TQ)
- UDEasy
- Dell’Oro, Francesca (2022). WoPoss guidelines for annotation. Revised version. Swiss National Science Foundation. DOI: https://doi.org/10.5281/zenodo.6417878
Exercise 1: WoPoss
You can contribute to the WoPoss corpus in two ways, both covered by the exercises suggested in the session.
- You can correct the results of the automatic annotation (lemmas, PoS and morphological analysis). If you want to try this during or after the session, you will need a GitHub account (https://github.com/).
- You can also try annotating a modal passage yourself. In this case, you will need an INCEpTION account: ask the WoPoss team to create one for you (write to [email protected]) before or after the session. Your contribution will be acknowledged in the file description.
Exercise 2: Universal Dependencies (UD)
You can replicate the workflow for automatically generating UD annotation with UDPipe 2, then review the annotation either by editing the raw CoNLL-U text or with conllueditor.
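When you edit the raw CoNLL-U text by hand, it is easy to break the format, for example by turning tabs into spaces or pointing a HEAD at a token that does not exist. The Python sketch below is a hypothetical, minimal checker for a few structural rules of CoNLL-U: ten tab-separated columns per token line, token IDs numbered 1..n, every HEAD pointing to an existing token or to 0, and exactly one root per sentence. It is not the official UD validator, which checks far more (cycles, label inventories, features, and so on); the names `check_sentence`, `check_conllu`, `GOOD` and `BAD` are illustrative only.

```python
from typing import List


def check_sentence(lines: List[str]) -> List[str]:
    """Return human-readable problems found in one sentence's token lines."""
    problems: List[str] = []
    rows: List[List[str]] = []
    for n, line in enumerate(lines, start=1):
        cols = line.split("\t")
        if len(cols) != 10:  # e.g. an editor replaced tabs with spaces
            problems.append(f"token line {n}: expected 10 columns, found {len(cols)}")
            continue
        if "-" in cols[0] or "." in cols[0]:
            continue  # multiword-token ranges and empty nodes are skipped here
        rows.append(cols)
    ids = [cols[0] for cols in rows]
    if ids != [str(i) for i in range(1, len(rows) + 1)]:
        problems.append(f"token IDs {ids} are not numbered 1..{len(rows)}")
    valid_heads = set(ids) | {"0"}  # HEAD must be an existing ID or 0 (root)
    for cols in rows:
        if cols[6] not in valid_heads:
            problems.append(f"token {cols[0]} ({cols[1]}): HEAD {cols[6]} does not exist")
    roots = sum(1 for cols in rows if cols[6] == "0")
    if roots != 1:
        problems.append(f"expected exactly one root, found {roots}")
    return problems


def check_conllu(text: str) -> List[str]:
    """Split a CoNLL-U string into sentences and check each of them."""
    problems: List[str] = []
    block: List[str] = []
    sent_no = 0
    for line in text.splitlines() + [""]:  # the extra "" flushes the last sentence
        if not line.strip():
            if block:
                sent_no += 1
                problems += [f"sentence {sent_no}: {p}" for p in check_sentence(block)]
                block = []
        elif not line.startswith("#"):  # comment lines (# text = ...) are ignored
            block.append(line)
    return problems


GOOD = (
    "1\tCaesar\tCaesar\tPROPN\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tvenit\tvenio\tVERB\t_\t_\t0\troot\t_\t_\n"
)
# The same sentence after a hand-editing slip: HEAD of token 1 is now 5.
BAD = GOOD.replace("\t2\tnsubj", "\t5\tnsubj")

print(check_conllu(GOOD))  # []
print(check_conllu(BAD))   # ['sentence 1: token 1 (Caesar): HEAD 5 does not exist']
```

Running a check like this after each round of manual edits catches format slips early, before the file is loaded into a viewer or editor that may fail less informatively.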
- Select a text in any ancient (or modern) language you prefer; copy/paste or save an excerpt in a TXT file; optionally, pre-process it as you like (e.g. delete titles and paragraph or page numbers);
- Go to the UDPipe 2 web service;
- Select the language model by opening the drop-down menu and scrolling until you find a suitable model. If no model exists for the language of your text, you can still try any other model: the results will be poor, but at least you will have a CoNLL-U file to play with;
- Save the CoNLL-U file and edit it:
  - with any text editor (Sublime Text and Atom have plugins for the CoNLL-U format);
  - by copy/pasting your sentence into this web page to visualize it;
  - or by following conllueditor's installation instructions and using conllueditor to edit your treebank.
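The pre-processing and UDPipe 2 steps above can also be scripted. The Python sketch below is an illustration under stated assumptions, not an official client. It assumes the UDPipe 2 REST endpoint `https://lindat.mff.cuni.cz/services/udpipe/api/process`, the `model`, `tokenizer`, `tagger`, `parser` and `data` parameters, and a JSON response whose `result` field holds the CoNLL-U output, as we understand the service's API documentation; check the current documentation before relying on it. The clean-up rules in `preprocess` (dropping bracketed numbers such as `[12]` and number-only lines) are hypothetical heuristics to adapt to your own edition, and `"latin"` is only a placeholder for a model name chosen from the drop-down list.

```python
import json
import re
import urllib.parse
import urllib.request

# Assumed UDPipe 2 REST endpoint; verify it against the service documentation.
UDPIPE_URL = "https://lindat.mff.cuni.cz/services/udpipe/api/process"


def preprocess(text: str) -> str:
    """Hypothetical clean-up: drop "[12]"-style markers and number-only lines."""
    kept = []
    for line in text.splitlines():
        line = re.sub(r"\[\d+\]", "", line)       # remove bracketed numbers
        line = re.sub(r"\s+", " ", line).strip()   # normalise whitespace
        if line and not line.isdigit():            # skip empty and page-number lines
            kept.append(line)
    return "\n".join(kept)


def build_request(text: str, model: str) -> urllib.request.Request:
    """Prepare a POST request asking UDPipe 2 to tokenize, tag and parse."""
    params = {
        "model": model,
        # Assumption: passing these keys (even empty) enables each stage
        # with its default options.
        "tokenizer": "",
        "tagger": "",
        "parser": "",
        "data": text,
    }
    body = urllib.parse.urlencode(params).encode("utf-8")
    return urllib.request.Request(UDPIPE_URL, data=body)  # data makes it a POST


def annotate(text: str, model: str) -> str:
    """Send the request and return the CoNLL-U string from the JSON response."""
    with urllib.request.urlopen(build_request(text, model)) as response:
        return json.loads(response.read().decode("utf-8"))["result"]


raw = "[1] Gallia est omnis divisa in partes tres,\n  quarum unam incolunt Belgae.\n\n12\n"
print(preprocess(raw))
# conllu = annotate(preprocess(raw), "latin")  # placeholder model name; needs network access
```

Keeping `build_request` separate from `annotate` lets you inspect what would be sent without contacting the server; the CoNLL-U string returned by `annotate` can then be saved to a file and edited as described above.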
Have fun with UD!
- …