GitHub - theotheo/nlp-tools: Comparison of NLP tools

NLP Tools Comparison

Table

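| Name | Meta: GitHub | Corpora | Text processing: Splitting | Text processing: Parsing | Text processing: Coreference resolution | Text processing: Word inflection | Text processing: Pattern matching | Text processing: X-grams | Text processing: Spelling correction | Text processing: WordNet | Text processing: Stopwords | Text processing: Statistics | Annotation: Tagger | Annotation: NER | ML: Sentiment analysis | ML: Classification | ML: Clustering | ML: Topic modelling | ML: Vectorization (including embeddings) | Visualization | Multilanguage: Translation | Multilanguage: Language identification |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| TextBlob | | | NLTK tokenizers | based on `pattern` | | singularize, pluralize, lemmatize | | | based on `pattern` | integration | | word and phrase frequencies | 1) POS based on `pattern`; 2) POS based on NLTK's TreeBank tagger; 3) NP based on Shlomi Babluki's implementation; 4) NP using a tagger trained on the CoNLL 2000 corpus | | PatternAnalyzer (based on `pattern`); NaiveBayesAnalyzer (an NLTK classifier trained on a movie reviews corpus) | Naive Bayes, Decision Tree | | | | | powered by the Google Translate API | powered by the Google Translate API |
| textacy | | | | | | | | | | | | | | | | | | | | | | |
| pattern | | contains APIs (Google, Gmail, Bing, Twitter, Facebook, Wikipedia, Wiktionary, DBPedia, Flickr, ...), a robust HTML DOM parser and a web crawler | | | | yes | by POS tags | | | | | | POS (NN, VB, JJ, DT); chunks (NP) | | | Naive Bayes, Perceptron, k-NN, SVM | k-means, hierarchical | LSA | tf, df, idf, tf-idf, cosine similarity, infogain | graph.js on canvas | | |
| pymorphy2 | | | | | | for Russian: singularize, pluralize, lemmatize | | | | | | | for Russian: morphology | | | | | | | | | |
| PyNLPl | | | | | | | | | | | | | | | | | | | | | | |
| glove | | | | | | | | | | | | | | | | | | | glove | | | |
| MITIE | | | tokenizer | | | | | | | | | | | "bunch of different types of binary relation detector" | yes | yes | | | pretrained word_feature_extractor | | | |
| gensim | | | | | | | | | | | | | | | | | | | tf, tf-idf, word2vec | | | |
| NLTK | | | | | | | | n-grams | | | | | | | | | | | | | | |
| stopwords | | | | | | | | | | | | | | | | | | | | | | |
| colibri-core | | | | | | | | n-grams, skipgrams, flexgrams | | | | | | | | | | | | | | |
| spaCy | | | non-destructive tokenization; syntax-driven sentence segmentation | "fast and accurate syntactic dependency parser" | | | rule-based matching | | | | | | English and German tagging models with rule-based morphology | > 10 built-in types; stand-off format and token tags training | | | | | | | | |
| fastText | | | | | | | | | | | | | | | | yes | | | skipgram, cbow | | | |
| SyntaxNet | | | tokenizer | "transition-based dependency parser" | | | | | | | | | POS | | | | | | | | | |
| langid | | | | | | | | | | | | | | | | | | | | | | pre-trained for 97 languages |
| CoreNLP | | | tokenizer | yes | "multi-pass sieve coreference resolution" | lemmatize | pattern-based entity extraction | | | | | | POS | NER with "CRF sequence models"; "Open information extraction" | | | | | | | | |
| bllip-parser | | | | "8 known unified parsing models", including models for web, news, PubMed texts | | | | | | | | | | | | | | | | | | |
| MBSP | | | regex-based segmentation; regex-based tokenization | | | MBLEM-based lemmatization | | | | | | | POS (NN, JJ, VB); chunks (NP, VP); relations (SBJ, OBJ) | | | | | | | | | |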
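
The Vectorization column lists weighting schemes such as tf, df, idf and tf-idf, along with cosine similarity (see the pattern and gensim rows). As a rough illustration of what those terms mean, here is a minimal pure-Python sketch. It is not the implementation used by any library in the table: the function names `tf_idf_vectors` and `cosine_similarity`, the unsmoothed `log(N / df)` idf formula, and the toy documents are all assumptions chosen for readability.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per tokenized document (hypothetical helper)."""
    n_docs = len(docs)
    # df: number of documents that contain each term at least once
    df = Counter(term for doc in docs for term in set(doc))
    # idf: log(N / df), unsmoothed; a term present in every document gets weight 0
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        # tf: relative frequency of the term within this document
        vectors.append({t: (c / total) * idf[t] for t, c in counts.items()})
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector has no direction
    return dot / (norm_a * norm_b)


# Toy corpus, already tokenized by whitespace (assumed data, not from the table)
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "topic models cluster documents".split(),
]
vectors = tf_idf_vectors(docs)
print(cosine_similarity(vectors[0], vectors[1]))  # documents share "the", "sat", "on"
print(cosine_similarity(vectors[0], vectors[2]))  # documents share no terms
```

Real libraries typically add smoothing, normalization options and sparse matrix storage on top of this idea, so their numbers will generally differ from what this sketch prints.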
