Welcome to the Info Retrieval Wonderland! This project is an exhilarating journey through the realms of web data, index compression, and data analysis. Whether you're a coding wizard or just getting started, there's something magical for everyone!
Explore the web with Python magic! Crawl the top 20 pages for queries like "Forests of India" and build an inverted index. Witness the dance of Boolean queries and unveil the power of Tiger AND Safari!
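
To make the idea concrete, here is a minimal sketch of an inverted index and a two-term AND query. It is not this project's actual code: the three sample documents stand in for crawled pages, and the names `tokenize`, `build_inverted_index`, and `boolean_and` are made up for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def boolean_and(index, term_a, term_b):
    """Return the document IDs that contain both terms."""
    postings_a = index.get(term_a.lower(), [])
    postings_b = set(index.get(term_b.lower(), []))
    return [doc_id for doc_id in postings_a if doc_id in postings_b]


# Hypothetical stand-ins for crawled pages.
docs = {
    1: "Tiger safari in the forests of India",
    2: "Safari booking and travel tips",
    3: "The Bengal tiger is an endangered species",
}
index = build_inverted_index(docs)
print(boolean_and(index, "Tiger", "Safari"))  # [1]
```

Only document 1 mentions both terms, so it is the only hit for Tiger AND Safari.
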
Master the art of intersecting postings with Boolean queries. Get ready for a wildlife adventure with queries like Wildlife AND Poaching. Process the most restrictive intersections (the shortest postings lists) first, because efficiency is key!
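
A sketch of that ordering trick, assuming postings are sorted lists of document IDs: sort the lists by length and intersect the shortest ones first, so intermediate results stay small. The function names and the sample postings are illustrative only.

```python
def intersect(p1, p2):
    """Merge-intersect two sorted postings lists."""
    i = j = 0
    result = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return result


def intersect_many(postings_lists):
    """AND together several postings lists, shortest (most restrictive) first."""
    if not postings_lists:
        return []
    ordered = sorted(postings_lists, key=len)
    result = list(ordered[0])
    for postings in ordered[1:]:
        if not result:
            break  # nothing can match any more, so stop early
        result = intersect(result, postings)
    return result


# Hypothetical postings for Wildlife AND Poaching AND India.
wildlife = [1, 2, 4, 5, 8, 9]
poaching = [2, 8]
india = [2, 3, 8, 10]
print(intersect_many([wildlife, poaching, india]))  # [2, 8]
```
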
Re-index with skip pointers and witness the speed boost! Run each query 100 times and compare the time taken with and without skip pointers. Skip into the future of efficient searching!
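
One way this comparison could look, as a sketch under stated assumptions: postings are sorted lists without duplicates, skip pointers are placed every `sqrt(n)` entries (a common heuristic, not necessarily what this project uses), and the sample lists are invented. Timings depend on the machine and on the data, so no numbers are promised here.

```python
import math
import timeit


def intersect_linear(p1, p2):
    """Plain merge intersection without skip pointers."""
    i = j = 0
    result = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return result


def _advance(postings, pos, skip, target):
    """Move past an entry smaller than target, following skips when safe."""
    if pos % skip == 0 and pos + skip < len(postings) and postings[pos + skip] <= target:
        while pos + skip < len(postings) and postings[pos + skip] <= target:
            pos += skip
        return pos
    return pos + 1


def intersect_with_skips(p1, p2):
    """Merge intersection with implicit skip pointers every sqrt(n) entries."""
    skip1 = max(1, math.isqrt(len(p1)))
    skip2 = max(1, math.isqrt(len(p2)))
    i = j = 0
    result = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i = _advance(p1, i, skip1, p2[j])
        else:
            j = _advance(p2, j, skip2, p1[i])
    return result


# Hypothetical postings: one long list and one short list.
long_list = list(range(0, 20000, 2))
short_list = list(range(0, 20000, 1000))

for name, fn in [("without skips", intersect_linear), ("with skips", intersect_with_skips)]:
    seconds = timeit.timeit(lambda: fn(long_list, short_list), number=100)
    print(f"{name}: {seconds:.4f} s for 100 runs")
```

Skips pay off most when one list is much shorter than the other, as in this example; for lists of similar length the extra checks can cancel out the gain.
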
Embark on a spelling adventure! Create a 3-gram index and correct queries like Tiger AND Saphari. Explore how our correction techniques impact the quality of retrieved documents.
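
A minimal sketch of 3-gram spelling correction, assuming candidates are ranked by the Jaccard overlap of their character 3-grams (the project may rank differently, for example by edit distance). The `$` boundary marker, the tiny vocabulary, and the function names are all choices made for this example.

```python
from collections import defaultdict


def char_ngrams(term, n=3):
    """Return the set of character n-grams of a term, padded with '$'."""
    padded = f"${term.lower()}$"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def build_ngram_index(vocabulary, n=3):
    """Map each n-gram to the vocabulary terms that contain it."""
    index = defaultdict(set)
    for term in vocabulary:
        for gram in char_ngrams(term, n):
            index[gram].add(term)
    return index


def correct(word, ngram_index, vocabulary, n=3):
    """Return the vocabulary term whose n-grams best overlap with the word's."""
    word = word.lower()
    if word in vocabulary:
        return word
    grams = char_ngrams(word, n)
    candidates = set()
    for gram in grams:
        candidates |= ngram_index.get(gram, set())
    if not candidates:
        return word  # nothing similar found, so leave the word unchanged

    def jaccard(term):
        term_grams = char_ngrams(term, n)
        return len(grams & term_grams) / len(grams | term_grams)

    # Sorting first makes ties resolve alphabetically.
    return max(sorted(candidates), key=jaccard)


# Hypothetical lowercase vocabulary taken from an index.
vocabulary = {"tiger", "safari", "sapphire", "forest"}
ngram_index = build_ngram_index(vocabulary)
print(correct("Saphari", ngram_index, vocabulary))  # safari
```
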
Level up! Extend the system to perform TF-IDF scoring. Witness the magic of matching document IDs returned in score order. Because scoring adds a touch of enchantment!
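
Here is one possible shape for the scoring step. It is a sketch that assumes log-scaled term frequency `1 + log10(tf)` and `idf = log10(N / df)`, which is one common weighting among several; the documents and function names are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def build_tf_index(docs):
    """Map each term to {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return index


def tfidf_rank(query, index, n_docs):
    """Return (doc_id, score) pairs, highest score first."""
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue  # unknown terms contribute nothing
        idf = math.log10(n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += (1 + math.log10(tf)) * idf
    # Ties are broken by document ID so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical mini-collection.
docs = {
    1: "tiger tiger safari",
    2: "safari lodge",
    3: "forest lodge",
}
tf_index = build_tf_index(docs)
for doc_id, score in tfidf_rank("tiger safari", tf_index, len(docs)):
    print(doc_id, round(score, 3))
```

Document 1 ranks first because it contains both query terms and repeats the rarer one.
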
Dive into the world of medical abstracts. Create an inverted index and perform TF-IDF scoring. Unravel the secrets of 20 queries and see the space taken by the dictionary!
Apply dictionary string compression with and without blocking. Witness the evolution of dictionary sizes and query resolution times. Because compression is an art!
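
A sketch of dictionary-as-a-string storage with blocking, under these assumptions: terms are sorted, each term is stored with a one-byte length prefix, and only one pointer is kept per block of `k` terms. The block size, the sample terms, and the function names are illustrative rather than taken from this project.

```python
def build_blocked_dictionary(terms, k=4):
    """Pack sorted terms into one byte string, keeping one offset per block of k."""
    data = bytearray()
    block_offsets = []
    for i, term in enumerate(sorted(terms)):
        if i % k == 0:
            block_offsets.append(len(data))
        encoded = term.encode("utf-8")
        data.append(len(encoded))  # one-byte length prefix; terms must be < 256 bytes
        data.extend(encoded)
    return bytes(data), block_offsets


def read_term(data, pos):
    """Read the length-prefixed term at pos; return (term, next position)."""
    length = data[pos]
    start = pos + 1
    return data[start:start + length].decode("utf-8"), start + length


def lookup(term, data, block_offsets, k=4):
    """Return the term's rank in sorted order, or -1 if it is absent."""
    lo, hi = 0, len(block_offsets) - 1
    block = -1
    while lo <= hi:  # binary search for the last block whose first term <= term
        mid = (lo + hi) // 2
        head, _ = read_term(data, block_offsets[mid])
        if head <= term:
            block = mid
            lo = mid + 1
        else:
            hi = mid - 1
    if block == -1:
        return -1
    pos = block_offsets[block]
    end = block_offsets[block + 1] if block + 1 < len(block_offsets) else len(data)
    rank = block * k
    while pos < end:  # linear scan inside the block
        candidate, pos = read_term(data, pos)
        if candidate == term:
            return rank
        rank += 1
    return -1


# Hypothetical terms from a medical-abstract dictionary.
terms = ["abstract", "cell", "clinical", "dose", "gene",
         "patient", "therapy", "tumor", "virus"]
data, block_offsets = build_blocked_dictionary(terms, k=4)
print(len(terms), "terms,", len(block_offsets), "block pointers,", len(data), "bytes")
print(lookup("gene", data, block_offsets, k=4))
```

The trade-off is visible in `lookup`: fewer pointers are stored, but each query pays for a short linear scan inside one block, which is why both dictionary size and query time are worth measuring.
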
Analyze the dataset using Python magic! Evaluate inter-annotator agreement, build inverted indices, and compare Elasticsearch performance. It's a data analysis odyssey!
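
The text above does not say which agreement measure is used. As one possibility, here is a sketch of Cohen's kappa for two annotators; the choice of measure and the sample relevance judgments are assumptions made for this example.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two annotators, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item set")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability that both pick the same label independently.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both annotators always used the same single label
    return (observed - expected) / (1 - expected)


# Hypothetical relevance judgments (1 = relevant, 0 = not relevant).
annotator_a = [1, 1, 0, 1, 0, 0, 1, 0]
annotator_b = [1, 0, 0, 1, 0, 1, 1, 0]
print(cohens_kappa(annotator_a, annotator_b))  # 0.5
```

Here the annotators agree on 6 of 8 items (0.75) while chance alone would give 0.5, so kappa is (0.75 - 0.5) / (1 - 0.5) = 0.5.
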
Explore pseudo relevance feedback and query expansion. Find the alpha that maximizes MAP (mean average precision) and witness the impact on IR engine performance. Because relevance is the name of the game!
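
The exact feedback formula is not spelled out above. A minimal sketch, assuming a Rocchio-style interpolation `alpha * query + (1 - alpha) * centroid(top-k documents)` with no negative-feedback term; the vectors, the weights, and the name `rocchio_expand` are hypothetical.

```python
from collections import defaultdict


def rocchio_expand(query_vec, top_doc_vecs, alpha=0.7):
    """Blend the query vector with the centroid of the top-ranked documents."""
    expanded = defaultdict(float)
    for term, weight in query_vec.items():
        expanded[term] += alpha * weight
    if top_doc_vecs:
        beta = 1 - alpha
        for vec in top_doc_vecs:
            for term, weight in vec.items():
                expanded[term] += beta * weight / len(top_doc_vecs)
    return dict(expanded)


# Hypothetical term-weight vectors: the query and its top-2 retrieved documents.
query = {"wildlife": 1.0}
top_docs = [
    {"wildlife": 0.5, "poaching": 0.5},
    {"wildlife": 0.5, "tiger": 1.0},
]
expanded = rocchio_expand(query, top_docs, alpha=0.5)
print(sorted(expanded.items()))
# [('poaching', 0.125), ('tiger', 0.25), ('wildlife', 0.75)]
```

To tune alpha, one would rerun retrieval with the expanded queries for a range of alpha values and keep the value with the highest MAP.
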
Witness the battle of document ranking models - TF-IDF, BM25, Language Model, and LSI. Precision, recall, MAP - the metrics showdown begins!
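
Whichever model produces the rankings, the metrics can be computed the same way. A small sketch, assuming each run is a ranked list of document IDs plus a set of relevant IDs; the sample runs are invented.

```python
def precision_recall(ranking, relevant):
    """Set-based precision and recall of a retrieved list."""
    hits = sum(1 for doc_id in ranking if doc_id in relevant)
    precision = hits / len(ranking) if ranking else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average_precision(ranking, relevant):
    """Mean of precision@rank over the ranks where a relevant document appears."""
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """runs: non-empty list of (ranking, relevant_set) pairs, one per query."""
    return sum(average_precision(ranking, rel) for ranking, rel in runs) / len(runs)


# Hypothetical results for two queries from a single ranking model.
runs = [
    ([3, 1, 7, 5], {1, 5, 9}),
    ([2, 4], {2}),
]
print(precision_recall(*runs[0]))
print(mean_average_precision(runs))
```

For the first query, relevant documents appear at ranks 2 and 4, giving an average precision of (1/2 + 2/4) / 3 = 1/3; the second query scores 1, so MAP is 2/3.
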
Apply K-Means and hierarchical clustering on chosen documents. Discover the secrets of RSS (residual sum of squares) plots and compare purity/NMI values. Because clustering adds an extra layer of magic!
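
To show how the two quality measures are computed, here is a sketch that takes cluster assignments and gold class labels as parallel, non-empty lists. NMI is normalized by the mean of the two entropies, which is one common convention; the labels are invented, and the clustering itself is left out.

```python
import math
from collections import Counter


def purity(cluster_ids, class_labels):
    """Fraction of documents that belong to their cluster's majority class."""
    pair_counts = Counter(zip(cluster_ids, class_labels))
    best = Counter()
    for (cluster, _), count in pair_counts.items():
        best[cluster] = max(best[cluster], count)
    return sum(best.values()) / len(cluster_ids)


def nmi(cluster_ids, class_labels):
    """Normalized mutual information between clusters and classes."""
    n = len(cluster_ids)
    cluster_counts = Counter(cluster_ids)
    class_counts = Counter(class_labels)
    pair_counts = Counter(zip(cluster_ids, class_labels))
    mutual_info = sum(
        (n_kc / n) * math.log(n * n_kc / (cluster_counts[k] * class_counts[c]))
        for (k, c), n_kc in pair_counts.items()
    )

    def entropy(counts):
        return -sum((v / n) * math.log(v / n) for v in counts.values())

    denom = (entropy(cluster_counts) + entropy(class_counts)) / 2
    # If both partitions are a single group, treat them as in perfect agreement.
    return mutual_info / denom if denom > 0 else 1.0


# Hypothetical output of a clustering run and the gold classes.
clusters = [0, 0, 0, 1, 1, 1]
classes = ["a", "a", "b", "b", "b", "b"]
print(purity(clusters, classes))
print(nmi(clusters, classes))
```

Purity rewards many small clusters (one document per cluster scores 1.0), which is why NMI is usually reported alongside it.
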
Explore the universe of video game sales with Elasticsearch. Pose questions, extract insights, and unveil the gaming legends. Because every game has a story!
- Clone the repository.
- Navigate to the respective sections you want to explore.
- Follow the instructions in each section's README to unleash the magic.
Explore the wonders of information retrieval and data analysis! Each section is a new adventure, so grab your keyboard and embark on a journey through the code. Happy coding!