The goal of the assignment is to build a book recommendation engine. The user writes a query that describes the kind of book they are looking for (by title, author, etc.), and the search engine returns similar books pulled from the Goodreads "Best Books Ever" list.
- To this end, we have to build our own dataset, and the search engine has to run on text documents
- Get the list of books
- Crawl books
- Parse downloaded pages
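The parsing step can be sketched with the standard library alone. This is a minimal, hypothetical version, not the code in `data_collection.py` or `build_tsv.py`. It assumes that a saved page keeps the title in an element with `id="bookTitle"` and the plot in one with `id="description"` (both ids are assumptions). It collects the text of those elements and writes one tab-separated row with the `bookTitle`, `Plot` and `Url` fields.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical mapping from HTML element id to output column; real selectors may differ.
TARGET_IDS = {"bookTitle": "bookTitle", "description": "Plot"}
# Elements with no closing tag; they must not change the nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta", "wbr"}


class BookPageParser(HTMLParser):
    """Collects the text inside the elements listed in TARGET_IDS."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = None  # output column currently being filled
        self._depth = 0       # nesting depth inside that element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            if self._current is not None:
                self.fields[self._current] += " "  # treat <br> etc. as a word break
            return
        if self._current is not None:
            self._depth += 1
            return
        element_id = dict(attrs).get("id")
        if element_id in TARGET_IDS:
            self._current = TARGET_IDS[element_id]
            self._depth = 1
            self.fields.setdefault(self._current, "")

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self._current is None:
            return
        self._depth -= 1
        if self._depth == 0:
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self.fields[self._current] += data


def parse_book_page(html, url):
    parser = BookPageParser()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace so no tab or newline leaks into the TSV.
    record = {col: " ".join(parser.fields.get(col, "").split()) for col in TARGET_IDS.values()}
    record["Url"] = url
    return record


def to_tsv_row(record, columns=("bookTitle", "Plot", "Url")):
    buffer = io.StringIO()
    csv.writer(buffer, delimiter="\t").writerow([record[c] for c in columns])
    return buffer.getvalue()
```

Additional fields, such as the author, would be extra entries in `TARGET_IDS`. Missing elements come out as empty strings, so every row keeps the same number of columns.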
Create two different search engines that, given a query, return the list of books matching it. The nltk library is used to process the text.
- Conjunctive query
- Conjunctive query & Ranking score
- Build a new metric that ranks books for a user's query, using a custom scoring function
- The output must contain:
- bookTitle
- Plot
- Url
- The similarity score of the documents with respect to the query
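Both engines sit on top of an inverted index. The sketch below is a simplified stand-in for `index_creation.py` and `search_engine.py`, not their actual code. It replaces the nltk preprocessing with a regex tokenizer, indexes only the `Plot` field (an assumption), and keeps in memory what the `data/` files presumably store: the vocabulary (term to term_id), the simple index (term_id to doc ids), the TF-IDF index (term_id to `(doc_id, weight)` pairs), the idf values and the document magnitudes. The conjunctive engine intersects posting lists. The ranked engine scores those matches by cosine similarity between TF-IDF vectors, using length-normalised term frequency (one common variant; the notebook may use another), and returns the top k with the four output fields.

```python
import heapq
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Stand-in for the nltk pipeline: lowercase alphabetic tokens (no stemming/stopwords).
    return re.findall(r"[a-z]+", text.lower())


class BookSearchEngine:
    def __init__(self, books):
        # books: {doc_id: {"bookTitle": ..., "Plot": ..., "Url": ...}}
        self.books = books
        self.vocabulary = {}                  # term -> term_id
        self.simple_index = defaultdict(set)  # term_id -> {doc_id}
        self.tfidf_index = defaultdict(list)  # term_id -> [(doc_id, tf-idf weight)]
        term_counts = {}
        for doc_id, book in books.items():
            counts = Counter(tokenize(book["Plot"]))
            term_counts[doc_id] = counts
            for term in counts:
                term_id = self.vocabulary.setdefault(term, len(self.vocabulary))
                self.simple_index[term_id].add(doc_id)
        n_docs = len(books)
        # idf = log(N / df); a term present in every document gets weight 0.
        self.idf = {t: math.log(n_docs / len(ids)) for t, ids in self.simple_index.items()}
        self.magnitude = {}                   # doc_id -> Euclidean norm of its TF-IDF vector
        for doc_id, counts in term_counts.items():
            total = sum(counts.values())
            squares = 0.0
            for term, count in counts.items():
                term_id = self.vocabulary[term]
                weight = count / total * self.idf[term_id]
                self.tfidf_index[term_id].append((doc_id, weight))
                squares += weight * weight
            self.magnitude[doc_id] = math.sqrt(squares)

    def conjunctive(self, query):
        """Engine 1: ids of the documents that contain every query term."""
        term_ids = [self.vocabulary.get(t) for t in set(tokenize(query))]
        if not term_ids or None in term_ids:
            return set()  # an unknown term means no document can contain all terms
        return set.intersection(*(self.simple_index[t] for t in term_ids))

    def ranked(self, query, k=10):
        """Engine 2: conjunctive matches ranked by TF-IDF cosine similarity."""
        matches = self.conjunctive(query)
        if not matches:
            return []
        counts = Counter(tokenize(query))
        total = sum(counts.values())
        query_vec = {self.vocabulary[t]: c / total * self.idf[self.vocabulary[t]] for t, c in counts.items()}
        query_norm = math.sqrt(sum(w * w for w in query_vec.values()))
        dot = defaultdict(float)
        for term_id, q_weight in query_vec.items():
            for doc_id, d_weight in self.tfidf_index[term_id]:
                if doc_id in matches:
                    dot[doc_id] += q_weight * d_weight
        scored = []
        for doc_id in matches:
            denom = query_norm * self.magnitude[doc_id]
            scored.append((dot[doc_id] / denom if denom else 0.0, doc_id))
        results = []
        for score, doc_id in heapq.nlargest(k, scored):
            book = self.books[doc_id]
            results.append({"bookTitle": book["bookTitle"], "Plot": book["Plot"],
                            "Url": book["Url"], "score": score})
        return results
```

If the indexes are saved with `json.dump`, integer keys come back as strings and tuples as lists, so they need converting when the JSON files are reloaded.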
- Here the goal is to quantify and visualize the writers' production.
- Given a string written in English capital letters, find the maximum length of a subsequence of characters that is in alphabetical order.
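This is the longest increasing subsequence problem over letters. A sketch of the standard O(n log n) approach, assuming that "alphabetical order" means strictly increasing, so a repeated letter does not extend the sequence. Swapping `bisect_left` for `bisect_right` would allow repeats.

```python
from bisect import bisect_left


def longest_alphabetical_subsequence(s):
    # tails[k] is the smallest letter that can end an increasing subsequence of length k + 1.
    tails = []
    for ch in s:
        # bisect_left keeps the order strict: an equal letter replaces a tail instead of extending.
        i = bisect_left(tails, ch)
        if i == len(tails):
            tails.append(ch)
        else:
            tails[i] = ch
    return len(tails)
```

`tails` stays sorted, so each letter costs one binary search, and its final length is the answer. The list itself is not necessarily a valid subsequence; recovering one requires also storing predecessor links.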
- ADM-HW3.ipynb
- Jupyter notebook that contains the solutions to the assignment
- data/ :
- vocabulary.json : vocabulary
- inverted_index_2_1_1.json : simple inverted index
- inverted_index_2_2_1.json : TF-IDF inverted index
- url_list.txt
- precomputed/ : doc_magnitude.json, idf.json
- scripts/ :
- build_tsv.py
- data_collection.py
- index_creation.py
- search_engine.py
- utilities.py
- main_notebook.ipynb