Skip to content

Морфологический анализатор на основе нейронных сетей и pymorphy2

License

Notifications You must be signed in to change notification settings

bond005/rnnmorph

 
 

Repository files navigation

rnnmorph

Current version on PyPI Python versions Build Status Code Climate

Morphological analyzer (POS tagger) for Russian and English languages based on neural networks and dictionary-lookup systems (pymorphy2, nltk).

Russian language, MorphoRuEval-2017 test dataset, accuracy

Domain Full tag PoS tag F.t. + lemma Sentence f.t. Sentence f.t.l.
Lenta (news) 96.31% 98.01% 92.96% 77.93% 52.79%
VK (social) 95.20% 98.04% 92.06% 74.30% 60.56%
JZ (lit.) 95.87% 98.71% 90.45% 73.10% 43.15%
All 95.81% 98.26% N/A 74.92% N/A

English language, UD EWT test, accuracy

Dataset Full tag PoS tag F.t. + lemma Sentence f.t. Sentence f.t.l.
UD EWT test 91.57% 94.10% 87.02% 63.17% 50.99%

Speed and memory consumption

Скорость: от 200 до 600 слов в секунду на CPU, на GPU в несколько раз быстрее.

Потребление оперативной памяти: зависит от режима работы, для предсказания одиночных предложений - 500-600 Мб, для режима с батчами - пропорционально размеру батча.

Install

sudo pip3 install rnnmorph

Usage

from rnnmorph.predictor import RNNMorphPredictor
predictor = RNNMorphPredictor(language="ru")
forms = predictor.predict(["мама", "мыла", "раму"])
print(forms[0].pos)
>>> NOUN
print(forms[0].tag)
>>> Case=Nom|Gender=Fem|Number=Sing
print(forms[0].normal_form)
>>> мама
print(forms[0].vector)
>>> [0 0 0 0 0 1 0 0 0 1 1 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 1 0 0 1]

Acknowledgements

About

Морфологический анализатор на основе нейронных сетей и pymorphy2

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Python 99.7%
  • Shell 0.3%