The Open Medieval French (OpenMedFr) initiative aims to publish open, plain text versions of works spanning four centuries of medieval French writing. In its early stages, the initiative's main goal is to expand digital textual studies in medieval French by creating open corpora and building a community of both users and producers of textual data.
With such plain text versions at their disposal, researchers can put the texts to work on their own research questions (digital editions in TEI XML, text mining, NLP, alignment, entity extraction, etc.).
We work with digitized copies of public domain books, process them with optical character recognition (OCR), and then correct and annotate them by hand. In this first phase, the texts will be of high quality, but they will not be 100% error free.
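As an illustration of the kind of clean-up that can precede hand correction, here is a minimal Python sketch. It is not the project's actual pipeline: the function name `clean_ocr_text`, the sample string, and the choice of steps (Unicode normalization, rejoining words hyphenated across line ends, collapsing stray spaces) are assumptions made for this example only.

```python
import re
import unicodedata


def clean_ocr_text(raw: str) -> str:
    """Apply a few conservative clean-up steps to raw OCR output.

    Hypothetical helper for illustration; not part of any OpenMedFr tooling.
    """
    # Normalize to NFC so an accented letter such as "é" is one code point.
    text = unicodedata.normalize("NFC", raw)
    # Rejoin words split by a hyphen at the end of a line ("cheva-\nlerie").
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse runs of spaces or tabs and trim each line, keeping line breaks.
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.split("\n")]
    return "\n".join(lines)


# Invented sample: a double space, a word broken across lines, a trailing
# space, and a decomposed accent (e + combining acute).
sample = "L'abrejance  de l'ordre de cheva-\nlerie \nBerte aus grans pie\u0301s"
print(clean_ocr_text(sample))
```

Automatic rejoining of hyphenated words can itself introduce errors, for instance where a hyphen at the end of a line is part of the edition's own text, so output of this kind would still need review by hand.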
One place we might begin is with the Société des anciens textes français (SATF), a series which contains some 90+ texts published before the 1920s.
Some candidates from pre-1925 SATF volumes for inclusion:
- Trois versions rimées de l'Évangile de Nicodème (1885, 1889) - in process
- La fille du comte de Pontieu, 13th- and 15th-century versions (1922) - in process
- Le roman de Thèbes in 2 vols (1890) - in process
- Benoît de Sainte-Maure, Le roman de Troie in 6 vols (1909) - in process
- L'escoufle (1894)
- Jean de Meun, L'art de chevalerie (1897)
- Jean Priorat, L'abrejance de l'ordre de chevalerie (1897)
- Florence de Rome in 2 vols (1907, 1909)
- Merlin in 2 vols (1886)
- Gervais de Bus, Le roman de Fauvel in 2 vols (1914, 1919)
- Jean Froissart, Meliador in 3 vols (1895)
- Miracles de Nostre Dame par personnages in 8 vols (1876-83)
Several authorial oeuvres are available for inclusion as well:
- Adenet le Roi: Buevon de Conmarchis; Enfances Ogier; Berte aus grans piés; Cleomadés vol 1; Cleomadés vol 2
- Guillaume de Machaut: vol 1; vol 2; vol 3
- Philippe de Rémi, Oeuvres poétiques
There are many other places to find public domain digitized editions: Google Books, Gallica, Internet Archive and HathiTrust. Early Romance journals such as Romania and Romanische Forschungen are important sources as well.
There are also existing corpora of works in medieval French: BMF, LFA (Ottawa), Nouveau Corpus d'Amsterdam, SRCMF, Classiques Garnier and ARTFL. The Classical Language Toolkit has assembled some materials from Wikisource. We are compiling a list of what these corpora already offer and will publish it here.
OpenMedFr is inspired by the OpenPhilology movement, including Open Greek and Latin, the Open Islamicate Texts Initiative and EEBO-TCP.
If you are interested in contributing to this digital text creation project, or have found errors you would like to point out, please contact openmedfr (at) gmail (dot) com.