This is a tiny LDA (latent Dirichlet allocation) implementation by Boyuan Chen and Yi Wei. The goal is to help first-time topic-modeling learners pick up the ideas faster. The corpus is a small NumPy array in which each row is a document of five word tokens, and each integer indexes one of five vocabulary words:
```python
X = np.array([
    [0, 0, 1, 2, 2],
    [0, 0, 1, 1, 1],
    [0, 1, 2, 2, 2],
    [2, 2, 1, 1, 4],
    [4, 4, 4, 4, 4],
    [3, 3, 4, 4, 4],
    [3, 4, 4, 4, 4],
    [3, 3, 3, 4, 1],
    [4, 4, 3, 3, 2],
])
```
By inspection, the first four documents should land in one topic and the remaining five in the other. The goal is to output a document-topic probability distribution like the one below:
```
[[0.14285714 0.85714286]
 [0.14285714 0.85714286]
 [0.28571429 0.71428571]
 [0.42857143 0.57142857]
 [0.85714286 0.14285714]
 [0.71428571 0.28571429]
 [0.85714286 0.14285714]
 [0.85714286 0.14285714]
 [0.71428571 0.28571429]]
```
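For reference, these values are consistent with the smoothed posterior-mean estimate of the document-topic distribution, theta[d, k] = (n[d, k] + alpha) / (N_d + K * alpha), under an assumed symmetric prior alpha = 1 with K = 2 topics and N_d = 5 tokens per document. This is a reading of the output, not something stated in the repo:

```python
import numpy as np

# Hedged illustration: if all 5 tokens of a document are assigned to topic 1,
# the smoothed estimate with an assumed alpha = 1 and K = 2 gives
# (0 + 1) / (5 + 2) = 1/7 for topic 0 and (5 + 1) / (5 + 2) = 6/7 for topic 1.
n_dk = np.array([0, 5])              # per-topic token counts for one document
alpha, K, N_d = 1.0, 2, 5
theta_d = (n_dk + alpha) / (N_d + K * alpha)
print(theta_d)                       # [0.14285714 0.85714286]
```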
We mainly took reference from Agustinus Kristiadi's blog: https://agustinus.kristia.de/techblog/2017/09/07/lda-gibbs/. All the parameters correspond to Griffiths and Steyvers, "Finding scientific topics" (2004).
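For orientation, below is a minimal collapsed Gibbs sampler in the spirit of that blog post, using the full conditional from Griffiths and Steyvers (2004). It is a sketch, not this repo's exact code: the variable names, the iteration count, and the symmetric priors alpha = beta = 1 are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

X = np.array([
    [0, 0, 1, 2, 2], [0, 0, 1, 1, 1], [0, 1, 2, 2, 2],
    [2, 2, 1, 1, 4], [4, 4, 4, 4, 4], [3, 3, 4, 4, 4],
    [3, 4, 4, 4, 4], [3, 3, 3, 4, 1], [4, 4, 3, 3, 2],
])
D, N = X.shape           # documents, tokens per document
K, V = 2, 5              # topics, vocabulary size
alpha, beta = 1.0, 1.0   # symmetric Dirichlet priors (assumed values)

# Random initial topic assignment for every token.
Z = rng.integers(K, size=(D, N))

# Count matrices: document-topic and topic-word.
ndk = np.zeros((D, K))
nkw = np.zeros((K, V))
for d in range(D):
    for i in range(N):
        ndk[d, Z[d, i]] += 1
        nkw[Z[d, i], X[d, i]] += 1
nk = nkw.sum(axis=1)     # total tokens assigned to each topic

for _ in range(1000):
    for d in range(D):
        for i in range(N):
            w, z = X[d, i], Z[d, i]
            # Remove the current token from all counts.
            ndk[d, z] -= 1
            nkw[z, w] -= 1
            nk[z] -= 1
            # Full conditional p(z = k | rest), Griffiths & Steyvers (2004).
            p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
            z = rng.choice(K, p=p / p.sum())
            # Add the token back under its newly sampled topic.
            Z[d, i] = z
            ndk[d, z] += 1
            nkw[z, w] += 1
            nk[z] += 1

# Smoothed document-topic distribution from the final sample.
theta = (ndk + alpha) / (ndk.sum(axis=1, keepdims=True) + K * alpha)
print(theta)
```

Because topics are exchangeable, the two columns may come out swapped relative to the output shown above; what matters is that the first four documents concentrate on one topic and the last five on the other.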