The project code is in the Twitter-Information-Retrieval-Project directory.
The report is in the Paper directory.
### To run this code, the following libraries are needed:
- stanford-corenlp-3.7.0-models.jar
This project comes with two example data sets for the tweets and the indexes: tweetsEN and tweetsEN20 for the tweets, and Indexes and Indexes20 for the indexes.
### There are test classes for all the main modules, as follows:
- TestTokenizer.java
- TestStopWordsRemover.java
- TestEnglishLemmatisation.java
- TestPreprocessEnglish.java
- TestVocabulary.java
- TestCrawler.java
- TestIndexer.java
- TestSearch.java
Link to the todo list