Ìrànlọ́wọ́

Ìrànlọ́wọ́ is a set of utilities for analyzing and processing Yorùbá text for NLP tasks. It focuses on helping software developers build large, clean text datasets for downstream diacritic restoration and machine translation tasks.

Features

ADR (automatic diacritic restoration) tools

Strip all diacritics from word-types
Verify that text is NFC or NFD
Normalize a corpus (from MS Word or elsewhere) → NFC
Split long sentences on delimiter characters such as ; , : etc.
Automatically restore correct diacritics using a pre-trained model
Find all variants of each word-type in a given corpus
Partially strip diacritics from word-types
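
To make the operations listed above concrete, here is a minimal sketch built only on the Python standard library. It is not Ìrànlọ́wọ́'s own API: the function names (strip_diacritics, is_normalized, split_long_sentence, find_variants) are hypothetical and chosen for illustration, and the package's real functions may be named and behave differently.

```python
import re
import unicodedata
from collections import defaultdict


def strip_diacritics(text):
    """Remove every combining mark (tone marks and under-dots) from text."""
    # NFD separates base letters from their combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


def is_normalized(text, form="NFC"):
    """Return True if text is already in the given Unicode normal form."""
    # Comparison-based check, so it also works on Python 3.7.
    return unicodedata.normalize(form, text) == text


def split_long_sentence(sentence, delimiters=";,:"):
    """Split a sentence on any of the delimiter characters, dropping empties."""
    parts = re.split("[" + re.escape(delimiters) + "]", sentence)
    return [part.strip() for part in parts if part.strip()]


def find_variants(tokens):
    """Group the diacritized forms seen in tokens by their stripped form."""
    variants = defaultdict(set)
    for token in tokens:
        nfc = unicodedata.normalize("NFC", token)
        variants[strip_diacritics(nfc)].add(nfc)
    return dict(variants)


if __name__ == "__main__":
    print(strip_diacritics("ọjọ́ gbogbo ni ti olè"))
    print(find_variants(["ọjọ́", "òjò", "ojo"]))
```

Normalizing to NFC before comparing tokens matters here because the same visible word can be encoded as different code point sequences, which would otherwise be counted as distinct variants.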

Ready to use webpage scrapers

Bíbélì Mímọ́ (Biblica, Bible Society of Nigeria)
Yorùbá Blog
BBC Yorùbá

Corpus analysis tools

Dataset character distribution
Dataset ambiguity statistics → Lexdif, etc. for a given corpus
Dataset scoring (proximity to correctly diacritized text, LM perplexity, KL divergence)
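
As a rough illustration of the statistics named above, the sketch below computes a character distribution, an average number of diacritized variants per stripped form (one simple reading of a LexDif-style ambiguity measure, which is an assumption rather than the package's definition), and a KL divergence between two character distributions. All function names are hypothetical, and the code uses only the standard library rather than Ìrànlọ́wọ́'s own API.

```python
import math
import unicodedata
from collections import Counter, defaultdict


def char_distribution(text):
    """Relative frequency of each non-space code point in NFC-normalized text."""
    # Note: marks with no precomposed form stay as separate code points in NFC.
    nfc = unicodedata.normalize("NFC", text)
    counts = Counter(ch for ch in nfc if not ch.isspace())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {ch: n / total for ch, n in counts.items()}


def _strip(word):
    """Drop all combining marks from a word."""
    nfd = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in nfd if not unicodedata.combining(ch))


def variants_per_stripped_form(tokens):
    """Average number of distinct diacritized forms per undiacritized form."""
    forms = defaultdict(set)
    for token in tokens:
        nfc = unicodedata.normalize("NFC", token)
        forms[_strip(nfc)].add(nfc)
    if not forms:
        return 0.0
    return sum(len(v) for v in forms.values()) / len(forms)


def kl_divergence(p, q, epsilon=1e-9):
    """KL(P || Q) in nats; symbols missing from q are floored at epsilon."""
    return sum(
        px * math.log(px / max(q.get(ch, 0.0), epsilon))
        for ch, px in p.items()
        if px > 0
    )


if __name__ == "__main__":
    tokens = "ọjọ́ gbogbo ni ti olè òjò".split()
    print(variants_per_stripped_form(tokens))
    print(kl_divergence(char_distribution("ọjọ́ olè"), char_distribution("ojo ole")))
```

A value of 1.0 from variants_per_stripped_form means no stripped form in the sample maps to more than one diacritized form; higher values indicate more ambiguity for a restoration model to resolve.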

Installation

Obtainable from the Python Package Index (PyPI) → pip install iranlowo

Example

Show computing environment and installation process

Diacritize a phrase

$ python
Python 3.7.3 (default, Mar 27 2019, 16:54:48)
[Clang 4.0.1 (tags/RELEASE_401/final)] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import iranlowo.adr as ránlọ
>>> ránlọ.diacritize_text("lootoo ni pe ojo gbogbo ni ti ole")
PRED AVG SCORE: -0.0037, PRED PPL: 1.0037
'lóòtóọ́ ni pé ọjọ́ gbogbo ni ti olè'

Diacritize phrases. Note that we use IPython only because it renders nicer, easier-to-read text colours in the terminal.

Disclaimer

This is beta software. If you pass the diacritizer out-of-domain text, English, Pidgin, or any other non-Yorùbá text, expect unpredictable, black-box results.

Since this is a work in progress that we are steadily improving, if you encounter any problems with correctness or performance, please submit a pull request with corrections or file an issue.

License

This project is licensed under the MIT License.
