4 Computational Linguistics
Thursday Oct 29, 16:00 UK = 17:00 CET
Convenors: Alek Keersmaekers (KU Leuven), Marton Ribary (Surrey), Thea Sommerschield (Oxford)
YouTube link: https://youtu.be/zjkyZUpvhAQ
Slides: Combined slides
This session will give a broad overview of computational linguistics as applied to ancient languages, and illustrate some of its methods and techniques through a series of short case studies presented by the authors of the projects shown. Dr Ribary will discuss the importance of pre-processing texts, and show a specific case study of semantic clustering and vector representation in Latin text. Dr Sommerschield will introduce the concept of machine learning and discuss her own Pythia project on machine-assisted restoration of damaged Greek inscriptions. Dr Keersmaekers will walk through a pipeline for automatically analyzing Greek papyrus texts and the specific problems they pose.
[For discussion in this forum thread.]
- Mike Kestemont & Justin A. Stover. 2016. “The Authorship of the Historia Augusta: Two new computational studies.” Bulletin of the Institute of Classical Studies 59.2. Pp. 140–157. Available: https://onlinelibrary.wiley.com/doi/epdf/10.1111/j.2041-5370.2016.12043.x
- Marton Ribary & Barbara McGillivray. 2020. “A Corpus Approach to Roman Law Based on Justinian’s Digest.” Informatics 7, 44. Available: https://doi.org/10.3390/informatics7040044
- Yannis Assael, Thea Sommerschield & Jonathan Prag. 2019. “Restoring ancient text using deep learning: a case study on Greek epigraphy.” Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, pp. 6368–6375. Available: https://arxiv.org/abs/1910.06262
- John Bodel. 2012. “Latin Epigraphy and the IT revolution.” Epigraphy and the Historical Sciences, edited by John Davies & John Wilkes. Proceedings of the British Academy 177. Pp. 275–296. Available: https://www.academia.edu/2389348/Latin_Epigraphy_and_the_IT_Revolution_2012
- Giuseppe G. A. Celano, Gregory Crane & Saeed Majidi. 2016. “Part of Speech Tagging for Ancient Greek.” Open Linguistics 1. Available: https://doi.org/10.1515/opli-2016-0020
- Kristina Gulordava. 2018. Word order variation and dependency length minimisation: a cross-linguistic computational approach. Thèse de doctorat: Univ. Genève. Available: https://doi.org/10.13097/archive-ouverte/unige:106855. (Esp. chapter 3, “The DLM principle and word order variability at the language level,” pp. 64–106.)
- Alek Keersmaekers. 2020. “Automatic Semantic Role Labeling in Ancient Greek Using Distributional Semantic Modeling.” Proceedings of 1st Workshop on Language Technologies for Historical and Ancient Languages, pp. 59–67. Available: https://www.aclweb.org/anthology/2020.lt4hala-1.9
- Alek Keersmaekers. 2020. “Creating a richly annotated corpus of papyrological Greek: The possibilities of natural language processing approaches to a highly inflected historical language.” Digital Scholarship in the Humanities 35-1, pp. 67–82. DOI: https://doi.org/10.1093/llc/fqz004 (Not open access.)
- Francesco Mambrini & Marco Passarotti. 2012. “Will a Parser Overtake Achilles? First experiments on parsing the Ancient Greek Dependency Treebank.” _Proceedings of the Eleventh International Workshop on Treebanks and Linguistic Theories (TLT11), 30 November – 1 December 2012, Lisbon, Portugal._ Available: https://publicatt.unicatt.it/handle/10807/37956
- Barbara McGillivray. 2013. Methods in Latin Computational Linguistics. Brill. (Not open access.)
- Marton Ribary. 2020. “A Relational Database of Roman Law Based on Justinian’s Digest.” Journal of Open Humanities Data, 6(1), p.5. Available: http://doi.org/10.5334/johd.17
- Pythia notebooks and tutorials
- Linking Latin project (and especially Latin word embeddings)
- Marton's Justinian Digest project, including demo code and Colab notebook
- Pedalion, Alek's semantic role labeller on GitHub
- Using one (or more) of the following resources, familiarise yourself with the principles of treebanking in either Greek or Latin.
- Looking at the exercise described in this Treebanking session, create an account on Perseids and attempt to treebank the first few sentences of one of the following: Phaedrus Fable 1.1, Aesop Fable 38, or a Greek or Latin text of your choice that you know very well. Compare your trees with your teammate's before bringing the results to class next week.
- If you have any technical problems or questions about Treebanking or Arethusa, you may ask in this forum thread, where we or your colleagues may be able to help.