Applying topic models, with a special focus on German texts.
Datasets:
- German Political Speeches
- Offenes Parlament (TODO)
- Project Gutenberg (TODO)
- German news articles (TODO)
- German Wikipedia articles (TODO)
Algorithms:
- LSI - Latent Semantic Indexing (SVD) (TODO)
- LDA - Latent Dirichlet Allocation
- NMF - Non-negative Matrix Factorization (TODO)
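LSI, LDA, and NMF all start from a bag-of-words document-term matrix. The sketch below shows one way such a matrix could be built for German text using only the Python standard library. The sample sentences, the `STOPWORDS` set, and the helper names `tokenize` and `to_row` are hypothetical and chosen for illustration; a real pipeline would more likely rely on one of the tools listed below, with a fuller stop word list and lemmatization.

```python
import re
from collections import Counter

# Hypothetical sample documents, for illustration only.
docs = [
    "Der Bundestag debattiert über das neue Gesetz.",
    "Das Gesetz wurde im Bundestag beschlossen.",
    "Die Mannschaft gewinnt das Spiel in Berlin.",
]

# Tiny hand-picked stop word list (an assumption, not a complete resource).
STOPWORDS = {"der", "die", "das", "im", "in", "über", "wurde"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens, including umlauts and ß."""
    tokens = re.findall(r"[a-zäöüß]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


tokenized = [tokenize(d) for d in docs]

# The vocabulary fixes the column order of the matrix.
vocab = sorted({t for doc in tokenized for t in doc})


def to_row(tokens):
    """Count how often each vocabulary term occurs in one document."""
    counts = Counter(tokens)
    return [counts[term] for term in vocab]


# One row per document, one column per vocabulary term.
matrix = [to_row(doc) for doc in tokenized]

for row in matrix:
    print(row)
```

Each row is the term-count vector of one document. LSI and NMF factorize a matrix of this kind (often after TF-IDF weighting), while LDA treats the counts as observations in a probabilistic model.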
Tools:
- Gensim
- Mallet
- lda (TODO)
- NLTK (TODO)
- sklearn (TODO)
- BigARTM (TODO)
- Vowpal Wabbit (Online LDA) (TODO)
- tmtoolkit (TODO)
- tcma (TODO)
About: Building, Evaluating, Visualizing Topic Models
- Gensim Tutorials
- Topics and Transformations
- Tutorial on Mallet in Python (2014-03-20)
- Mallet
- pyLDAvis Library
- Machine Learning Plus Tutorials (Topic Modeling, NLP)
- Topic modeling visualization – How to present the results of LDA models? (2018-12-04)
- LDA in Python – How to grid search best topic models? (2018-04-04)
- Topic Modeling with Gensim (2018-03-26)
- Lemmatization Approaches with Examples in Python (2018-10-02)
- Gensim Tutorial
- Data Science Plus Tutorials
- Topic Modeling in Python with NLTK and Gensim (2018-04-26)
- Evaluation of Topic Modeling: Topic Coherence (2018-05-03)
- Towards Data Science
- WZB Data Science Blog (NLP)
- https://radimrehurek.com/gensim/wiki.html
- https://www.kdnuggets.com/2017/11/building-wikipedia-text-corpus-nlp.html
- Link List - Wissenschaftszentrum Berlin für Sozialforschung
- Link List - Institut für deutsche Sprache und Linguistik (HU Berlin)
- POLLUX - Informationsdienst Politikwissenschaft
- German Microdata Lab (gesis)
- Leipzig Corpora Collection
- DWDS Corpora
LDA
- David M. Blei, Andrew Y. Ng, Michael I. Jordan. Latent Dirichlet Allocation. In: Journal of Machine Learning Research, 2003
Sentiment
- R. Remus, U. Quasthoff, G. Heyer. SentiWS - a Publicly Available German-language Resource for Sentiment Analysis. In: Proceedings of the 7th International Conference on Language Resources and Evaluation (LREC'10), pp. 1168-1171, 2010