markov-next-word

Next word prediction using Markov chains. No PyTorch, no neural networks, no LSTMs, no transformers and so on; NumPy is used only for random choices.

  • Just 60 lines of raw Python and a little bit of math, enjoy ;)

Generated text using the model (trained on Shakespeare)

Input:  love
Output: love to my self to my friend and in thy heart

Input:  sweet
Output: sweet love that i have seen the world will bear thine 

Input:  i
Output: i am not the world s best

Usage (train on all of Shakespeare -> generate Shakespeare-like text)

Import and create a model.

from src.markov_next_word import MarkovNextWord
NextWordPrediction = MarkovNextWord()

Train the model with your text data.

# Replace data/shakespeare.txt with your data.
NextWordPrediction.train('data/shakespeare.txt')

Generate text.

# Generate text after love.
input_text = 'love'

# Number of words to generate.
sequence_length = 10
NextWordPrediction.generate_text(input_text, sequence_length)

Generated output

love to my self to my friend and in thy heart

What is the Markov property?

A process in which the next state depends only on the current state is said to have the Markov property. A sequence of events that follows the Markov property is called a Markov chain.
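
Formally, a sequence of random variables X_0, X_1, X_2, ... has the Markov property if

P(X_{n+1} = x | X_n = x_n, ..., X_0 = x_0) = P(X_{n+1} = x | X_n = x_n)

i.e. everything that happened before the current state adds no information about the next state.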

Short example

Let's define 3 states:

1: 'sun shines' 2: 'cloudy' 3: 'raining'

Define the following rules:
  • Rule 1: Sunny → next day: 80% rain, 20% cloudy, 0% sun.
  • Rule 2: Cloudy → next day: 50% rain, 50% sun, 0% cloudy.
  • Rule 3: Rainy → next day: 90% cloudy, 10% sun, 0% rain.
The stochastic matrix would look like this:
        sun, cloudy, rain
sun     [[0, 0.2, 0.8],
cloudy  [0.5, 0, 0.5],
rain    [0.1, 0.9, 0]]
Weather simulation

Today the sun is shining; let's predict the weather for the next 7 days.

import numpy as np
from numpy.linalg import matrix_power

p = np.array([[0, 0.2, 0.8],
              [0.5, 0, 0.5],
              [0.1, 0.9, 0]])

# For example, today is Sunday and the sun is shining.
v_start = np.array([1, 0, 0])

# Simulate 7 days.
# Day 1 = Monday, ..., Day 7 = Sunday.
for day in range(1, 8):
    v = np.dot(v_start, matrix_power(p, day))
    prob_sun, prob_cloudy, prob_rain = round(100 * v[0], 2), round(100 * v[1], 2), round(100 * v[2], 2)
    print(f"Day: {day}, sun shines: {prob_sun}%, cloudy: {prob_cloudy}%, rain: {prob_rain}%")
Output:

Day: 1, sun shines: 0.0%, cloudy: 20.0%, rain: 80.0%
Day: 2, sun shines: 18.0%, cloudy: 72.0%, rain: 10.0%
Day: 3, sun shines: 37.0%, cloudy: 12.6%, rain: 50.4%
Day: 4, sun shines: 11.34%, cloudy: 52.76%, rain: 35.9%
Day: 5, sun shines: 29.97%, cloudy: 34.58%, rain: 35.45%
Day: 6, sun shines: 20.83%, cloudy: 37.9%, rain: 41.27%
Day: 7, sun shines: 23.08%, cloudy: 41.31%, rain: 35.62%

Next word prediction using the Markov property

Consider the following example data:

I like Math.
I like Physics.
I hate War.
I love Schnitzel.
I love Science.

Mapping words to next words.

i -> [like, like, hate, love, love]
like -> [math, physics]
hate -> [war]
love -> [schnitzel, science]
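
As a minimal sketch of how such a mapping could be built (build_next_word_map and the whitespace tokenization here are illustrative assumptions, not necessarily the repository's actual code):

from collections import defaultdict

def build_next_word_map(sentences):
    # Map each word to the list of words observed directly after it.
    # Duplicates are kept on purpose: they encode the frequencies.
    next_words = defaultdict(list)
    for sentence in sentences:
        tokens = sentence.lower().strip('.').split()
        for word, next_word in zip(tokens, tokens[1:]):
            next_words[word].append(next_word)
    return dict(next_words)

sentences = ['I like Math.', 'I like Physics.', 'I hate War.',
             'I love Schnitzel.', 'I love Science.']
print(build_next_word_map(sentences))
# {'i': ['like', 'like', 'hate', 'love', 'love'], 'like': ['math', 'physics'],
#  'hate': ['war'], 'love': ['schnitzel', 'science']}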

Graph representation

[figure: transition graph from each word to its possible next words]

Mapping each (word, next_word) pair to its probability.

(i, like) -> 0.4
(i, hate) -> 0.2
(i, love) -> 0.4
(like, math) -> 0.5
(like, physics) -> 0.5
(hate, war) -> 1.0
(love, schnitzel) -> 0.5
(love, science) -> 0.5
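
The probabilities are just relative frequencies over each successor list. A sketch, reusing the hypothetical mapping from above:

from collections import Counter

def next_word_probabilities(next_words):
    # Probability of (word, next_word) = count of next_word / total successors of word.
    probs = {}
    for word, successors in next_words.items():
        counts = Counter(successors)
        for next_word, count in counts.items():
            probs[(word, next_word)] = count / len(successors)
    return probs

print(next_word_probabilities({'i': ['like', 'like', 'hate', 'love', 'love']}))
# {('i', 'like'): 0.4, ('i', 'hate'): 0.2, ('i', 'love'): 0.4}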

Graph representation with probabilities

[figure: the same transition graph, with edges labeled by probabilities]

Example using the model

>>> from src.markov_next_word import MarkovNextWord
>>> mnw = MarkovNextWord()
>>> mnw.train('data/test.txt')
>>> mnw.word_to_nextwords
{'i': ['like', 'like', 'hate', 'love', 'love'], 'like': ['math', 'physics'], 'hate': ['war'], 'love': ['schnitzel', 'science']}
>>> mnw.word_to_next_word_prob
{('i', 'like'): 0.4, ('i', 'hate'): 0.2, ('i', 'love'): 0.4, ('like', 'math'): 0.5, ('like', 'physics'): 0.5, ('hate', 'war'): 1.0, ('love', 'schnitzel'): 0.5, ('love', 'science'): 0.5}
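
Generation is then just a walk along the chain: start from the input word and repeatedly sample one of the observed successors of the current word. A sketch of that step, assuming the word_to_nextwords dictionary shown above (generate is an illustrative helper, not necessarily the repository's generate_text):

import numpy as np

def generate(word_to_nextwords, start_word, length):
    # Sample the next word uniformly from the successor list; since
    # duplicates were kept, frequent successors are picked more often.
    words = [start_word]
    for _ in range(length):
        successors = word_to_nextwords.get(words[-1])
        if not successors:  # dead end: this word was never followed by anything
            break
        words.append(np.random.choice(successors))
    return ' '.join(words)

print(generate(mnw.word_to_nextwords, 'i', 3))
# e.g. 'i love schnitzel' (output is random)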
