Yorùbá text

This repository contains fully diacritized Yorùbá text normalized to Unicode Normalization Form C (NFC). In NFC, a base letter and its combining diacritics are composed into a single code point wherever Unicode defines a precomposed form. The text was converted with the following code:

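import unicodedata

def convert_to_NFC(filename, outfilename):
    # Read the input as UTF-8 and normalize the whole text to NFC
    with open(filename, encoding='utf-8') as f:
        text = unicodedata.normalize('NFC', f.read())
    with open(outfilename, 'w', encoding='utf-8') as f:
        f.write(text)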
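
To illustrate what NFC composition does to Yorùbá text, here is a minimal sketch using only the standard library. The helper names `to_nfc` and `code_points` are hypothetical and are not part of this repository. Note that some Yorùbá letters, such as an e with both a dot below and a grave accent, have no single precomposed code point, so NFC leaves one combining mark in place.

```python
import unicodedata


def to_nfc(text):
    """Return text in Unicode Normalization Form C (hypothetical helper)."""
    return unicodedata.normalize("NFC", text)


def code_points(text):
    """List the code points of text as U+XXXX strings (hypothetical helper)."""
    return [f"U+{ord(ch):04X}" for ch in text]


# "Yorùbá" typed with combining marks: u + U+0300 (grave), a + U+0301 (acute)
decomposed = "Yoru\u0300ba\u0301"
composed = to_nfc(decomposed)

print(len(decomposed), code_points(decomposed))
print(len(composed), code_points(composed))

# e + grave + dot below: NFC reorders the marks and composes e + dot below
# into U+1EB9, but the grave stays as a separate combining mark.
print(code_points(to_nfc("e\u0300\u0323")))
```

Both strings render identically on screen, which is why normalizing up front matters for string comparison, tokenization, and vocabulary building.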

Sources:

Sources yet to be scraped and cleaned

Social Media sources:

Text has been gathered with permission from online sources and lightly preprocessed for use in NLP, TTS, and ASR applications. Some of the sentences may contain errors. If you have corrections, please submit a pull request!
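
When preparing a correction, one quick check is whether every line is still in NFC form. The following is a minimal sketch and is not part of this repository's tooling; the function name `find_non_nfc_lines` is an assumption made for illustration.

```python
import unicodedata


def find_non_nfc_lines(lines):
    """Return (line_number, line) pairs for lines that are not in NFC.

    Hypothetical helper: a line counts as non-NFC when normalizing it
    changes the string.
    """
    return [
        (number, line)
        for number, line in enumerate(lines, start=1)
        if unicodedata.normalize("NFC", line) != line
    ]


sample = [
    "Yor\u00f9b\u00e1",        # already composed
    "Yoru\u0300ba\u0301",      # uses combining marks, not NFC
]
for number, line in find_non_nfc_lines(sample):
    print(f"line {number} is not NFC-normalized: {line!r}")
```

To check a file, pass it the lines read with `open(path, encoding='utf-8')`.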

Resources

Bibtex

If you want to cite this repo in your work, please use:

@misc{Orife_yoruba-text_2018,
author = {Orife, Iroro and Fasubaa, Timilehin and Wahab, Olamilekan},
month = {1},
title = {{yoruba-text}},
url = {https://github.com/Niger-Volta-LTI/yoruba-text},
year = {2018}
}

Folders and files

Alabi_YorubaTwi_Embedding
Asubiaro_LangID
Bibeli_Mimo
Book_of_Mormon
Iroyin
JW300
LagosNWU
Lesika
OCR_Text
Owe
Quran_Mimo
SLR86
TheYorubaBlog
Universal_Declaration_Human_Rights
YorubaForAcademicPurpose
CITATION.cff
LICENSE
README.md
Synchro_System_Movement.txt
dataset_scorer.py
test_yoruba_diacritic_removal.py
