Calculating the entropy of the Romanian language
Starting from the research done by Shannon, we note his paper "Prediction and Entropy of Printed English", in which he presents two methods for estimating the entropy of the English language:
- "N-grams: the entropy of the next letter can be calculated statistically when the N-1 previous letters are known."
- "Shannon calculates the entropy of each word in English and takes a weighted average. He uses an approximation function to estimate the entropy for over 8,000 words. The value obtained for the entropy per word is 11.82 bits, and since the average word has 4.5 letters, the entropy is 2.62 bits per letter."
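The source quotes the per-word and per-letter figures without the intermediate division, so the arithmetic behind the second method is simply:

```latex
H_{\text{letter}} = \frac{H_{\text{word}}}{\text{avg. word length}}
                  = \frac{11.82\ \text{bits/word}}{4.5\ \text{letters/word}}
                  \approx 2.62\ \text{bits/letter}
```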
Thus, to calculate the entropy of the Romanian language, we chose the first method presented by Shannon and follow these steps:
- Choosing a representative text for the Romanian language and preprocessing it (we took as an example "The Most Beloved of Earthlings" by Marin Preda, because it is a work close to the 21st century that draws on varied lexical fields); a preprocessing sketch follows this list.
- Finding the probability of the next letter when the preceding group of letters gr is known, and calculating the entropy from these conditional probabilities.
- Calculating the entropy using the formula $H = \sum_{gr}\sum_{c} \frac{f(gr)}{no}\, P(c \mid gr)\, \log_2 \frac{1}{P(c \mid gr)}$, where $f(gr)$ is the frequency of occurrence of the group of letters gr, $no$ is the total number of groups, and $P(c \mid gr)$ is the probability of occurrence of character c immediately after the group gr; a sketch of this computation is given below.
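A minimal preprocessing sketch in Python, assuming the novel is available as a UTF-8 text file (the file name `text.txt` and the option to fold diacritics are illustrative assumptions, not choices stated above):

```python
import re
import unicodedata

def preprocess(raw: str, fold_diacritics: bool = False) -> str:
    """Lowercase the text and keep only letters separated by single spaces."""
    text = raw.lower()
    if fold_diacritics:
        # Decompose accented letters and drop the combining marks;
        # keeping the Romanian diacritics is an equally valid choice.
        text = unicodedata.normalize("NFD", text)
        text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Replace every run of non-letter characters with a single space.
    return re.sub(r"[^a-zăâîșț]+", " ", text).strip()

with open("text.txt", encoding="utf-8") as f:  # hypothetical path to the novel
    corpus = preprocess(f.read())
```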
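A sketch of the entropy computation itself, assuming `corpus` is the preprocessed string from above; the parameter `n` is the length of the group gr in the formula:

```python
import math
from collections import Counter

def conditional_entropy(text: str, n: int) -> float:
    """Estimate the entropy of the next letter given the previous n letters."""
    group_counts = Counter()  # occurrences of each n-letter group gr
    pair_counts = Counter()   # occurrences of each (gr, next character) pair
    for i in range(len(text) - n):
        gr = text[i:i + n]
        group_counts[gr] += 1
        pair_counts[(gr, text[i + n])] += 1

    no = sum(group_counts.values())  # total number of groups ("no" in the formula)
    h = 0.0
    for (gr, c), count in pair_counts.items():
        p_gr = group_counts[gr] / no             # f(gr) / no
        p_c_given_gr = count / group_counts[gr]  # P(c | gr)
        h += p_gr * p_c_given_gr * math.log2(1.0 / p_c_given_gr)
    return h

for n in range(1, 4):
    print(f"n = {n}: {conditional_entropy(corpus, n):.3f} bits/letter")
```

As n grows, each estimate conditions on more context, so the values typically decrease toward the entropy rate of the language; very large n over-fits a single novel, since most long groups occur only once.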