ELIT (Emory Information and Language Technology) features an English tokenizer that splits text into a sequence of tokens and segments them into sentences using lexicon-based heuristics. This project is led by the Emory NLP Research Laboratory and is released under the Apache 2.0 license.
- Latest release: 1.0 (10/15/2021)
Python 3.7 or higher is recommended:
pip install elit_tokenizer
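
To give a rough feel for what "lexicon-based heuristics" can mean, here is a minimal, self-contained sketch of the general idea. It does not use or reproduce the ELIT tokenizer: the names `ABBREVIATIONS`, `toy_tokenize`, and `toy_segment` are hypothetical, and the tiny abbreviation list is an assumption chosen only for illustration.

```python
from typing import List

# Hypothetical mini-lexicon: abbreviations whose trailing period
# should stay attached and should not end a sentence.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "ms.", "prof.", "e.g.", "i.e.", "etc."}

# Tokens treated as sentence-final in this toy example.
SENTENCE_FINAL = {".", "!", "?"}


def toy_tokenize(text: str) -> List[str]:
    """Split on whitespace, then peel punctuation off each chunk
    unless the whole chunk is a known abbreviation."""
    tokens: List[str] = []
    for chunk in text.split():
        if chunk.lower() in ABBREVIATIONS:
            tokens.append(chunk)  # keep "Dr." as a single token
            continue
        leading: List[str] = []
        while chunk and not chunk[0].isalnum():
            leading.append(chunk[0])
            chunk = chunk[1:]
        trailing: List[str] = []
        while chunk and not chunk[-1].isalnum():
            trailing.insert(0, chunk[-1])
            chunk = chunk[:-1]
        tokens.extend(leading)
        if chunk:
            tokens.append(chunk)
        tokens.extend(trailing)
    return tokens


def toy_segment(tokens: List[str]) -> List[List[str]]:
    """Group tokens into sentences, closing a sentence at ., ! or ?."""
    sentences: List[List[str]] = []
    current: List[str] = []
    for token in tokens:
        current.append(token)
        if token in SENTENCE_FINAL:
            sentences.append(current)
            current = []
    if current:  # keep any trailing tokens without final punctuation
        sentences.append(current)
    return sentences


text = "Dr. Choi founded the lab in 2014. It is in Atlanta, GA."
tokens = toy_tokenize(text)
sentences = toy_segment(tokens)
for sentence in sentences:
    print(sentence)
```

Because "Dr." is found in the lexicon, its period is kept inside the token and never reaches the sentence splitter, whereas the period after "2014" is peeled off and closes the first sentence. A real tokenizer relies on far larger lexicons and many more rules than this sketch.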