Natural Language Processing

This repository is an attempt to implement all algorithms in the Speech and Language Processing, Second Edition book.

Dependencies

The

installations.txt

files contains all the libraries needed for this.

Other than this, you will need to download WordNet and word_tokenize modules using

import nltk
nltk.download()

in a python shell in your terminal.

How to Run ?

The simplest method right now to access each algorithm is to go to each of the folders, if there is a

run.py

file in the folder, you will be able to run it with simply

python run.py <arguments>

you can access what arguments each of those run files takes using the

python run.py -h

option for argparse help.

Done:

Simplified Lesk Algorithm

This algorithm finds the sense of the word by matching context with wordnet definitions and examples to identify a best sense.

Quora Question Pairs

Attempted to solve the Quora Question Pairs competition as part of the class project, so implemented three different basic methods to identify.

Adapted Lesk Algorithm: The Lesk algorithm was a good way to identify which words were being used in what sense and the output of the Lesk algorithm gave a semantic understanding of the sentence. When comparing two sentences, this understanding was used and a similarity score was generated based on the similarity of words in the two sentences.
Cosine Similarity using TFIDF: TFIDF is one of the basic methods to convert sentences to vectors. This is mostly used for numerical representation of words and so seemed a great method to vectorize and identify the similarity between the two sentences.
LSTM using Doc2Vec: Best implementation of the three. Actually this did not require a lot of implementation, just assembling of some methods and training the LSTM for hours over Doc2Vec input and class output.

Context Free Grammar Parser

CFG Parser loads the grammar and tags the sentences using the CFG. The program runs in O(n^2) and is a dynamic program as explained in the Speech and Language Processing book.

Probability Context Free Grammar Parser

This is an extension of the CFG parser where it selects only the most probable tagging of the text.

TODO

Automata - NDRecognise
Automata - DRecognise
Conversational Bot (LSTM)
Brill Tagger

Name		Name	Last commit message	Last commit date
Latest commit History 64 Commits
automata		automata
converse		converse
lesk		lesk
ngrams		ngrams
parsing		parsing
quora @ b923493		quora @ b923493
speech		speech
utils		utils
viterbi		viterbi
.DS_Store		.DS_Store
.gitignore		.gitignore
.gitmodules		.gitmodules
README.md		README.md
_config.yml		_config.yml
installations.txt		installations.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Natural Language Processing

Dependencies

How to Run ?

Done:

Simplified Lesk Algorithm

Quora Question Pairs

Context Free Grammar Parser

Probability Context Free Grammar Parser

TODO

About

Releases

Packages

Languages

antiDigest/learning_to_speak

Folders and files

Latest commit

History

Repository files navigation

Natural Language Processing

Dependencies

How to Run ?

Done:

Simplified Lesk Algorithm

Quora Question Pairs

Context Free Grammar Parser

Probability Context Free Grammar Parser

TODO

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages