GitHub - bhshri/Information-Retrieval: Various tasks and methods used in Information Retrieval

Preprocessing Text

Cleaning and preprocessing text is a prerequisite for IR and NLP tasks. The text was cleaned by removing tags and punctuation, after which stopword removal, stemming, and lemmatization were applied.
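
A minimal sketch of these steps using only the standard library. The stopword list, the suffix rules, and the lemma table are hypothetical stand-ins. A real pipeline would typically use NLTK's stopword list, `PorterStemmer`, and `WordNetLemmatizer`. The toy stemmer is not Porter's algorithm, and it shows how crude suffix stripping can produce non-words such as "runn", which is the usual argument for lemmatization.

```python
import re
import string

# Hypothetical, deliberately tiny stopword list (a real pipeline would use a
# full list such as NLTK's English stopwords).
STOPWORDS = {"a", "an", "and", "are", "in", "is", "of", "on", "the", "to", "was", "were"}

# Toy suffix rules tried in order; this is NOT the Porter stemmer.
SUFFIXES = ("ing", "ed", "es", "s")

# Hypothetical lemma table standing in for a dictionary-based lemmatizer.
LEMMAS = {"mice": "mouse", "running": "run", "jumped": "jump"}


def clean(text):
    """Strip markup tags and punctuation, lowercase, and split on whitespace."""
    text = re.sub(r"<[^>]+>", " ", text)
    text = text.translate(str.maketrans("", "", string.punctuation))
    return text.lower().split()


def remove_stopwords(tokens):
    return [t for t in tokens if t not in STOPWORDS]


def toy_stem(token):
    """Chop the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def toy_lemmatize(token):
    """Map a word to its dictionary form if the table knows it."""
    return LEMMAS.get(token, token)


def preprocess(text, mode="stem"):
    """Clean, drop stopwords, then stem (mode="stem") or lemmatize (otherwise)."""
    normalize = toy_stem if mode == "stem" else toy_lemmatize
    return [normalize(t) for t in remove_stopwords(clean(text))]


sample = "<p>The mice were running, and jumped!</p>"
print(preprocess(sample))                # ['mice', 'runn', 'jump']
print(preprocess(sample, mode="lemma"))  # ['mouse', 'run', 'jump']
```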

TF-IDF

Representation of text is an important step in all IR and NLP tasks. A TF-IDF representation was implemented from scratch on a set of documents and compared with the scikit-learn (Sklearn) implementation.
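
One way to make a from-scratch version directly comparable to scikit-learn's `TfidfVectorizer` is to mirror its defaults: raw term counts, the smoothed IDF idf(t) = ln((1 + n) / (1 + df(t))) + 1 (`smooth_idf=True`), and L2-normalized rows (`norm="l2"`). This sketch assumes those defaults were the comparison target. It tokenizes on whitespace, whereas scikit-learn's default `token_pattern` drops one-character tokens, so an exact comparison on other corpora would also need a matching tokenizer.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocab, idf, rows) with scikit-learn-style smoothed IDF and L2 rows."""
    tokenized = [doc.lower().split() for doc in docs]  # whitespace tokens (assumption)
    vocab = sorted({t for doc in tokenized for t in doc})
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(t for doc in tokenized for t in set(doc))
    # Smoothed IDF, matching TfidfVectorizer(smooth_idf=True).
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for doc in tokenized:
        counts = Counter(doc)  # raw term frequency
        vec = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # L2 normalization
        rows.append([v / norm for v in vec])
    return vocab, idf, rows


docs = ["the cat sat", "the dog sat", "the cat ran"]
vocab, idf, rows = tfidf(docs)
print(vocab)       # ['cat', 'dog', 'ran', 'sat', 'the']
print(idf["the"])  # 1.0, since "the" appears in every document
```

With scikit-learn installed, `TfidfVectorizer().fit_transform(docs).toarray()` should match `rows` on this particular corpus, because every token has at least two characters and the vocabulary is sorted the same way (see `get_feature_names_out()`).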

Word2Vec Representation

Documents were retrieved using Skip-gram and CBOW word representations, and retrieval was evaluated using precision, recall, and F1 score.
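
A common way to turn word vectors into a retrieval system is to average the vectors of a text's words and rank documents by cosine similarity to the averaged query vector. Whether the notebook aggregates this way is an assumption. The 2-dimensional vectors below are hypothetical placeholders for embeddings trained with Skip-gram or CBOW (for example with gensim's `Word2Vec`, where `sg=1` selects Skip-gram and `sg=0` CBOW). Precision, recall, and F1 are computed over the top-k retrieved set.

```python
import math

# Hypothetical 2-d vectors standing in for trained Skip-gram/CBOW embeddings.
VECS = {
    "cat": [1.0, 0.1], "dog": [0.9, 0.2], "pet": [0.8, 0.3],
    "stock": [0.1, 1.0], "market": [0.2, 0.9],
}


def mean_vector(tokens, vecs):
    """Average the vectors of known tokens; None if no token is in vocabulary."""
    known = [vecs[t] for t in tokens if t in vecs]
    if not known:
        return None
    return [sum(col) / len(known) for col in zip(*known)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query, docs, vecs, k):
    """Indices of the k documents most cosine-similar to the query."""
    q = mean_vector(query.split(), vecs)
    scored = []
    for i, doc in enumerate(docs):
        d = mean_vector(doc.split(), vecs)
        scored.append((cosine(q, d) if q and d else 0.0, i))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:k]]


def precision_recall_f1(retrieved, relevant):
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    p = hits / len(retrieved) if retrieved else 0.0
    r = hits / len(relevant) if relevant else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


docs = ["cat dog", "pet cat", "stock market", "market"]
top = retrieve("dog pet", docs, VECS, k=2)
print(precision_recall_f1(top, relevant=[0, 1]))  # (1.0, 1.0, 1.0)
```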

LSI

LSI was implemented on a set of documents using SVD, and document retrieval was tested with the cosine similarity measure.
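
A minimal LSI sketch with NumPy: take the truncated SVD A ~ U_k S_k V_k^T of a terms-by-documents matrix A, fold the query into the latent space as q_hat = S_k^-1 U_k^T q, and rank documents (the rows of V_k) by cosine similarity to q_hat. The toy matrix, the rank k = 2, and this particular fold-in convention are assumptions for illustration.

```python
import numpy as np


def lsi_fit(term_doc, k):
    """Truncated SVD of a (terms x docs) matrix: A ~ U_k diag(s_k) V_k^T."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :].T  # V_k has one row per document


def fold_in(query_vec, U_k, s_k):
    """Project a term-space query into the k-dim latent space: s_k^-1 U_k^T q."""
    return (U_k.T @ query_vec) / s_k


def cosine_scores(q_hat, V_k):
    """Cosine similarity between the folded-in query and every document row."""
    denom = np.linalg.norm(V_k, axis=1) * np.linalg.norm(q_hat)
    denom[denom == 0] = 1.0  # avoid division by zero for empty vectors
    return (V_k @ q_hat) / denom


# Hypothetical toy corpus: rows = terms, columns = documents (raw counts).
terms = ["cat", "dog", "pet", "stock", "market", "price"]
A = np.array([
    [1, 0, 0, 0],  # cat
    [1, 1, 0, 0],  # dog
    [1, 1, 0, 0],  # pet
    [0, 0, 1, 0],  # stock
    [0, 0, 1, 0],  # market
    [0, 0, 1, 1],  # price
], dtype=float)

U_k, s_k, V_k = lsi_fit(A, k=2)
query = np.array([1, 0, 1, 0, 0, 0], dtype=float)  # "cat pet"
scores = cosine_scores(fold_in(query, U_k, s_k), V_k)
ranking = np.argsort(-scores)
print(ranking)
```

Here the two latent dimensions separate the "pet" documents from the "finance" documents, so the query "cat pet" scores documents 0 and 1 close to 1 and documents 2 and 3 close to 0.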

YASS Stemmer

The YASS stemmer was implemented with agglomerative clustering of words under several string distance measures, following https://dl.acm.org/doi/10.1145/1281485.1281489.
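
A sketch of the idea with one distance. D3 below is our reading of the linked paper's measure: with m the position of the first mismatch and n the last index of the longer word (the shorter word is treated as padded), D3 = ((n - m + 1) / m) * sum over i = m..n of 1 / 2^(i - m), so words sharing a long prefix are close. The complete-linkage rule, the threshold of 1.5, and using each cluster's longest common prefix as its stem are assumptions, not necessarily the notebook's settings.

```python
import math
import os


def first_mismatch(x, y):
    """Index of the first position where x and y differ (None if identical)."""
    for i in range(max(len(x), len(y))):
        if i >= len(x) or i >= len(y) or x[i] != y[i]:
            return i
    return None


def d3(x, y):
    """YASS-style D3 distance: penalizes an early first mismatch."""
    m = first_mismatch(x, y)
    if m is None:
        return 0.0
    if m == 0:
        return math.inf  # no common prefix at all
    n = max(len(x), len(y)) - 1  # last index of the longer word
    tail = sum(1 / 2 ** (i - m) for i in range(m, n + 1))
    return (n - m + 1) / m * tail


def complete_linkage(words, dist, threshold):
    """Merge the closest pair of clusters (distance = farthest members)
    until no pair is within the threshold."""
    clusters = [[w] for w in words]
    while True:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = max(dist(a, b) for a in clusters[i] for b in clusters[j])
                if d <= threshold and (best is None or d < best[0]):
                    best = (d, i, j)
        if best is None:
            return clusters
        _, i, j = best
        clusters[i].extend(clusters.pop(j))  # j > i, so index i is unaffected


def yass_stems(words, threshold=1.5):
    """Map each word to the longest common prefix of its cluster (assumption)."""
    stems = {}
    for cluster in complete_linkage(words, d3, threshold):
        stem = os.path.commonprefix(cluster)
        for w in cluster:
            stems[w] = stem
    return stems


stems = yass_stems(["astronomer", "astronomically", "astronomy", "cat", "cats"])
print(stems)
```

For example, d3("astronomer", "astronomically") = (6 / 8) * 1.96875 = 1.4765625, which falls under the assumed threshold, so the three "astronom-" words end up in one cluster with stem "astronom" while "cat" and "cats" share the stem "cat".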

Query Expansion and Relevance Feedback

Query-based document retrieval was evaluated with query expansion (adding synonyms of the query words) and relevance feedback (the Rocchio algorithm).
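
A sketch of both steps. Expansion appends synonyms from a lookup table; the synonym dictionary passed in here is hypothetical (WordNet synsets are a common source). Rocchio moves the query vector toward the centroid of relevant documents and away from the centroid of non-relevant ones: q_m = alpha * q0 + beta * mean(relevant) - gamma * mean(non-relevant), with negative weights clipped to zero. The weights alpha = 1.0, beta = 0.75, gamma = 0.15 are common textbook defaults, assumed rather than taken from the notebook.

```python
def expand_query(terms, synonyms):
    """Append synonyms of each query term (hypothetical lookup table)."""
    expanded = list(terms)
    for term in terms:
        for syn in synonyms.get(term, []):
            if syn not in expanded:
                expanded.append(syn)
    return expanded


def centroid(vectors, dim):
    """Component-wise mean of the vectors (zero vector if there are none)."""
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def rocchio(q0, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio update; negative term weights are clipped to zero."""
    dim = len(q0)
    rel = centroid(relevant, dim)
    non = centroid(nonrelevant, dim)
    return [max(0.0, alpha * q + beta * r - gamma * s) for q, r, s in zip(q0, rel, non)]


print(expand_query(["car", "repair"], {"car": ["automobile", "auto"]}))
# ['car', 'repair', 'automobile', 'auto']
print(rocchio([1, 0, 0], relevant=[[1, 1, 0]], nonrelevant=[[0, 0, 1]]))
# [1.75, 0.75, 0.0]
```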

Question Answering

Questions were answered with an unsupervised approach based on word2vec representations, and answers were evaluated using Exact Match (EM) and F1 score.
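
The two metrics, sketched in the style of the SQuAD evaluation script: both answers are lowercased and stripped of punctuation, the articles "a", "an", and "the", and extra whitespace before comparing. EM checks string equality after normalization; F1 is computed over the bag of shared tokens. That the notebook normalizes the same way is an assumption, and the word2vec answer-selection step itself is not shown.

```python
import re
import string
from collections import Counter

PUNCT = set(string.punctuation)


def normalize_answer(text):
    """Lowercase, drop punctuation and the articles a/an/the, collapse spaces."""
    text = "".join(ch for ch in text.lower() if ch not in PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold):
    return int(normalize_answer(prediction) == normalize_answer(gold))


def f1_score(prediction, gold):
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)  # multiset overlap
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("The Eiffel Tower!", "eiffel tower"))       # 1
print(f1_score("Eiffel Tower in Paris", "the Eiffel Tower"))  # about 0.667
```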

Text Summarization

Extractive text summarization was performed using TextRank and LexRank and evaluated using ROUGE-1 and ROUGE-2 scores.
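
A compact TextRank sketch plus ROUGE-N. Sentences are graph nodes, edges are weighted by the word-overlap similarity from the TextRank paper (shared words divided by log|S_i| + log|S_j|), and weighted PageRank with damping d = 0.85 scores the nodes. LexRank follows the same graph recipe but uses an IDF-modified cosine similarity between sentences. ROUGE-N here counts clipped n-gram overlap with a single reference and returns precision, recall, and F1. Whitespace tokenization, no stemming, and the fixed iteration count are simplifying assumptions.

```python
import math
from collections import Counter


def overlap_similarity(a, b):
    """TextRank similarity: shared words / (log|a| + log|b|)."""
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    if denom <= 0:  # both sentences have a single word
        return 0.0
    return len(set(a) & set(b)) / denom


def textrank(sentences, top_n, d=0.85, iters=50):
    """Return the top_n highest-scoring sentences in their original order."""
    toks = [s.lower().split() for s in sentences]
    n = len(toks)
    w = [[overlap_similarity(toks[i], toks[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    out = [sum(row) for row in w]  # total outgoing edge weight per sentence
    scores = [1.0] * n
    for _ in range(iters):  # weighted PageRank by power iteration
        scores = [(1 - d) + d * sum(w[j][i] / out[j] * scores[j]
                                    for j in range(n) if out[j] > 0)
                  for i in range(n)]
    best = sorted(range(n), key=lambda i: (-scores[i], i))[:top_n]
    return [sentences[i] for i in sorted(best)]


def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n):
    """(precision, recall, F1) from clipped n-gram overlap with one reference."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


sentences = ["Cats like fish", "Cats like milk", "Dogs like cats", "Rain fell today"]
print(textrank(sentences, top_n=1))
print(rouge_n("the cat sat", "the cat sat on the mat", 1))  # (1.0, 0.5, 0.666...)
```

For example, `rouge_n("the cat sat", "the cat sat on the mat", 1)` gives precision 1.0 and recall 0.5, and with n = 2 the recall is 2/5 = 0.4. The isolated sentence "Rain fell today" shares no words with the others, so TextRank never selects it first.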

Text Classification

Multiclass text classification was performed with an SVM on TF-IDF and word2vec representations.
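
A sketch of the TF-IDF path with scikit-learn: `TfidfVectorizer` feeds a `LinearSVC`, which handles multiple classes one-vs-rest. The three-class toy corpus is hypothetical. For the word2vec path, the vectorizer would be replaced by averaged word vectors per document; that variant is not shown.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy corpus with three classes.
train_texts = [
    "football match goal", "tennis match player",
    "python code compiler", "software code bug",
    "pizza cheese oven", "pasta cheese sauce",
]
train_labels = ["sports", "sports", "tech", "tech", "food", "food"]

# TF-IDF features followed by a linear SVM (one-vs-rest for multiclass).
clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(train_texts, train_labels)

print(clf.predict(["python code bug", "cheese pizza"]))
```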

Text Classification Using an Ensemble-Based Approach

Multiclass text classification was performed with stacking and voting classifiers over an ensemble of Multinomial Naive Bayes, Logistic Regression, and Random Forest models.
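
A scikit-learn sketch of both ensembles over TF-IDF features. `VotingClassifier(voting="soft")` averages the base models' predicted probabilities, while `StackingClassifier` trains a final Logistic Regression on the base models' cross-validated predictions (`cv=3` here only because the hypothetical toy corpus has three documents per class). Hyperparameters are illustrative, not the notebook's.

```python
from sklearn.ensemble import RandomForestClassifier, StackingClassifier, VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus: three documents per class.
texts = [
    "football match goal", "tennis match player", "match referee whistle",
    "python code compiler", "software code bug", "code review merge",
    "pizza cheese oven", "pasta cheese sauce", "cheese grater kitchen",
]
labels = ["sports"] * 3 + ["tech"] * 3 + ["food"] * 3


def base_models():
    """The three base learners shared by both ensembles."""
    return [
        ("nb", MultinomialNB()),
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ]


# Soft voting: average the predicted class probabilities of the base models.
voting = make_pipeline(TfidfVectorizer(), VotingClassifier(base_models(), voting="soft"))

# Stacking: a meta-learner fit on out-of-fold base-model predictions.
stacking = make_pipeline(
    TfidfVectorizer(),
    StackingClassifier(base_models(), final_estimator=LogisticRegression(max_iter=1000), cv=3),
)

voting.fit(texts, labels)
stacking.fit(texts, labels)
print(voting.predict(["python code bug"]), stacking.predict(["python code bug"]))
```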

Folders and files

Latent_semantic_indexing_SVD.ipynb
Question_Answering_word2vec.ipynb
README.md
TFIDF_Vectorizer_implementation.ipynb
Text_preprocessing_stemming_lemmatization.ipynb
YASS_stemmer_clustering.ipynb
document_classification_TFIDF_word2vec_SVM.ipynb
document_classification_stacking_voting.ipynb
document_retrieval_skipgram_cbow.ipynb
lexrank_texrank_extractive_summarization.ipynb
query_expansion_relevance_feedback_rocchio.ipynb
