
apertium2unimorph

Scripts for extracting verbal and nominal inflectional paradigms from Apertium transducers for Turkic languages and converting them to the UniMorph schema. This code was used to generate the UniMorph data for Sakha and Tuvan, which was included in the SIGMORPHON 2021 Shared Task 0. Note: the shared task data was generated using the transducer versions from March 2021.
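The two steps named above, enumerating the paradigm cells of a lemma and mapping Apertium tags to UniMorph features, can be illustrated with a minimal Python sketch. It is not the repository's code. The tag inventory, the paradigm slots, and the tag-to-feature mapping are hypothetical and far smaller than what the real scripts handle. The sketch also assumes that singular number is unmarked in the Apertium tag string. It only builds generator queries in Apertium's `^lemma<tags>$` stream format and does not produce surface forms. Producing those would require running the compiled Apertium generator over the queries (for example with lttoolbox's `lt-proc -g`).

```python
import itertools
import re

# Hypothetical, heavily reduced mapping from Apertium-style tags to UniMorph
# features; the actual scripts use their own, much larger inventory.
APERTIUM_TO_UNIMORPH = {
    "n": "N",
    "v": "V",
    "pl": "PL",
    "nom": "NOM",
    "acc": "ACC",
    "gen": "GEN",
    "dat": "DAT",
    "px1sg": "PSS1S",
}

# Hypothetical nominal paradigm: number x case. The empty string stands for
# an unmarked (singular) number slot, an assumption made for this sketch.
NOMINAL_SLOTS = [
    ["", "pl"],
    ["nom", "acc", "gen", "dat"],
]

TAG_RE = re.compile(r"<([^>]+)>")


def generation_queries(lemma, pos, slots):
    """Yield one Apertium stream-format query ('^lemma<pos><tag>...$')
    for every combination of slot values."""
    for combo in itertools.product(*slots):
        # Skip empty (unmarked) slot values when building the tag string.
        tags = "".join(f"<{t}>" for t in (pos, *combo) if t)
        yield f"^{lemma}{tags}$"


def to_unimorph(analysis):
    """Convert 'lemma<n><pl><acc>' (optionally wrapped in ^...$) into a
    (lemma, 'N;PL;ACC')-style pair. Tags missing from the mapping are dropped."""
    analysis = analysis.strip("^$")
    lemma = analysis.split("<", 1)[0]
    feats = [APERTIUM_TO_UNIMORPH[t] for t in TAG_RE.findall(analysis)
             if t in APERTIUM_TO_UNIMORPH]
    # Nouns without a plural tag are labeled singular, placed right after
    # the part-of-speech feature (assumption of this sketch).
    if "N" in feats and "PL" not in feats:
        feats.insert(1, "SG")
    return lemma, ";".join(feats)


# Print each generator query next to its UniMorph lemma and feature string.
for query in generation_queries("lemma", "n", NOMINAL_SLOTS):
    print(query, *to_unimorph(query), sep="\t")
```

For example, `to_unimorph("^lemma<n><pl><acc>$")` returns `("lemma", "N;PL;ACC")`. Keeping the mapping in a plain dictionary means that supporting another Turkic language would mostly amount to supplying its tags and slots. This is a design choice of the sketch, not a description of how the repository is organized.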

The scripts currently work only for Tuvan and Sakha but should be relatively straightforward to extend to other Turkic languages represented in Apertium.

Please contact [email protected] for any questions.

Requirements

The corresponding Apertium analyzers (apertium-tyv for Tuvan, apertium-sah for Sakha) must be installed. Installation instructions are available in the respective repositories.

Other requirements:

  • Python >= 3.6

Usage

To run the extraction and conversion pipeline end-to-end, use:

./run.sh {tyv|sah} path/to/apertium/

where path/to/apertium/ is the parent directory of the transducer directory path/to/apertium/apertium-{tyv|sah}. For example, if the Sakha transducer is in ~/apertium/apertium-sah, run ./run.sh sah ~/apertium/.
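The directory convention above can be checked before launching the pipeline. The following Python helper is hypothetical and not part of the repository. It builds `path/to/apertium/apertium-{tyv|sah}` from the language code and the parent directory, and it fails early if the language is unsupported or the directory is missing. The function name and the choice of exceptions are illustrative assumptions.

```python
import os

# Language codes accepted by run.sh according to this README.
SUPPORTED_LANGS = ("tyv", "sah")


def transducer_dir(lang, apertium_root):
    """Return apertium_root/apertium-{lang} for a supported language code.

    Hypothetical helper: raises ValueError for unsupported codes and
    FileNotFoundError if the transducer directory does not exist.
    """
    if lang not in SUPPORTED_LANGS:
        raise ValueError(f"unsupported language code: {lang!r}")
    path = os.path.join(apertium_root, f"apertium-{lang}")
    if not os.path.isdir(path):
        raise FileNotFoundError(f"transducer directory not found: {path}")
    return path
```

For example, `transducer_dir("sah", os.path.expanduser("~/apertium"))` returns the expanded path of `~/apertium/apertium-sah` if that directory exists, and raises an error otherwise.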