Personalized Search Project
nicolaasmatthijs/AlterEgo
// Nicolaas Matthijs - nm417 - CSTIT
// AlterEgo :: Personalized Web Search
-----------------------------------

This code is the result of a research project conducted in Easter Term 2010 in order to obtain the MPhil in Computer Speech, Text and Internet Technology at the University of Cambridge. It investigates profile-based personalized web search built on a user's browsing history, with thorough offline and online evaluation, as a complete and self-contained research project. The project covers the full research cycle: pre-research investigation, proposing and implementing a new personalization strategy, capturing data, recruiting participants, two different evaluations, parameter investigation, comparison to previous best systems, and making the developed Firefox add-on available for real-life usage.

This README file first gives an overview of all of the files available in this repository (http://github.com/nicolaasmatthijs/AlterEgo). Next, a manual explains the steps required to get the project up and running. All of the code is highly configurable and extensively documented.

///////////////////
// File Overview //
///////////////////

*****************************
** 01_ProjectSpecification **
*****************************

++ Project Proposal
   Original project proposal as submitted before the start of the project.

++ Workplan
   Workplan outlining the proposed ideas, goals and timeline.

*********************
** 02_FirefoxAddOn **
*********************

++ Add-On
   Code for the AlterEgo Firefox add-on.

++ DataStorage
   Scripts that are responsible for saving a user's browsing history into a database.

++ Website
   Files for the website that was used to present the project and recruit people to participate in the experiments.

******************************
** 03_PersonalizationServer **
******************************

++ Data Extraction
   All scripts necessary for processing a user's browsing history and extracting some of its structure into a large XML user document.

++ Term Extraction
   Re-implementation of the Vu term extraction algorithm, served using the web.py Python web server.

++ JAR Files
   JAR files used by the AlterEgo personalization server.

++ Personalization
   Main project output. This server, written in Java, does data processing, profile generation, searching, result re-ranking, evaluation and evaluation processing. This code also includes the MIT WordNet API.

*********************
** 04_3rdPartyCode **
*********************

This folder contains references to the different 3rd party tools that were used for the implementation of this project.

++ ccparser
   The Clark & Curran statistical natural language parser is used to parse the text present on web pages in order to extract noun phrases.

++ google-n-gram-corpus
   The Google N-Gram corpus is used to estimate the number of occurrences of an N-Gram on the web.

++ memcached-1.4.5
++ redis-1.2.6
   Redis is an in-memory key-value store, installed here alongside Memcached, that is used to hold the Google N-Gram corpus in memory and perform extremely fast retrieval of entries from the corpus.

++ opennlp-tools-1.3.0
   The OpenNLP Tools are used for sentence splitting and tokenization to prepare text for parsing.

++ wordnet-3.0
   The lexical database WordNet is used to filter a list of terms by part of speech.

***********************
** 05_DataAndResults **
***********************

++ Usage Data
   Summary of the data collected by the Firefox add-on.

++ Relevance Judgements
   All documents related to the relevance judgements evaluation. This contains the document that was given to the participants in the offline relevance judgements sessions and the processed results of those sessions.

++ Interleaving
   All of the results of the online interleaved evaluation session.

*****************************
** 06_ProjectDocumentation **
*****************************

This folder contains all of the files that document the overall project and its outcomes.

++ Presentations
   PowerPoint presentations that were used for interim project presentations.

++ Thesis
   The Masters thesis that was produced as output of this project.

++ Conference Paper
   LaTeX files that contain the first draft of the conference paper derived from the thesis. This paper, once finished, will be submitted to the WSDM 2011 Conference in Hong Kong.

/////////////////////////
// Installation Manual //
/////////////////////////

***************************
** Checking out the code **
***************************

1) Install GIT (http://git-scm.com/)
2) Check out the Personalized Search code from git@github.com:nicolaasmatthijs/AlterEgo.git

********************************
** Installing 3rd Party Tools **
********************************

3) Install Memcached (see 04_3rdPartyCode/memcached-1.4.5)
4) Install the Redis caching server (see 04_3rdPartyCode/redis-1.2.6)
5) Start the Redis server using redis.sh
6) Install the Clark & Curran Statistical Natural Language Parser (see 04_3rdPartyCode/ccparser)
7) Download the models file required for the Clark & Curran Statistical Natural Language Parser
8) Download the latest version of the lexical database WordNet (see 04_3rdPartyCode/wordnet-3.0)

********************
** Firefox add-on **
********************

9) Install MySQL
10) Put the files found in 02_FirefoxAddOn/Website on an Apache server
11) Update the configuration parameters in 02_FirefoxAddOn/Website/config.php
12) Put the files found in 02_FirefoxAddOn/DataStorage on an Apache server
13) Update the configuration parameters in 02_FirefoxAddOn/DataStorage/config.php

*********************
** Data processing **
*********************

14) Put the files found in 03_PersonalizationServer/Data Extraction onto an Apache server
15) Put the term extraction script found in 03_PersonalizationServer/Term Extraction on a server that supports Python
16) Start the term extraction script by calling Term Extraction/web.py

***********************
** Re-ranking server **
***********************

17) Install the main personalization server found at 03_PersonalizationServer/Personalization on a Tomcat 6 web server
18) Add the JAR files found in 03_PersonalizationServer/JAR Files to the Tomcat shared/lib folder
19) Update the configuration file at AlterEgoServer/build/src/org/cl/nm417/config/config.properties to match the local installation
20) Start the Personalization server
21) The following pages provide the main functionality:
    * index.html
      Main personalization user interface
    * indexEvaluate.html
      Relevance judgements experiment user interface
    * generateProfiles.jsp
      Generates all profiles for the relevance judgements experiment
    * results.jsp
      Relevance judgements experiment result processing
    * profilegenerator.jsp
      Script used to continuously keep user profiles up to date
    * interleavingresults.jsp
      Processes the results from the online interleaved evaluation
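
The personalization server described above generates a user profile from browsing history and uses it to re-rank search results. As a rough illustration of that general idea only, the Python sketch below scores each result snippet by the summed weights of the profile terms it contains and sorts the results by that score. The profile format (a term-to-weight mapping), the function names `tokenize`, `score` and `rerank`, and the scoring rule are all assumptions made for this example; the actual Java implementation in 03_PersonalizationServer/Personalization may work quite differently.

```python
import re
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(snippet: str, profile: Dict[str, float]) -> float:
    """Sum the profile weights of the distinct terms found in the snippet.

    Hypothetical scoring rule, chosen for illustration only.
    """
    return sum(profile.get(term, 0.0) for term in set(tokenize(snippet)))


def rerank(
    results: List[Tuple[str, str]], profile: Dict[str, float]
) -> List[Tuple[str, str]]:
    """Order (url, snippet) pairs by descending profile score.

    sorted() is stable, so results with equal scores keep their
    original search-engine order.
    """
    return sorted(results, key=lambda r: score(r[1], profile), reverse=True)


if __name__ == "__main__":
    # Hypothetical profile and results, not data from the project.
    profile = {"python": 2.0, "programming": 1.5, "snake": 0.1}
    results = [
        ("http://example.org/zoo", "The python is a large snake"),
        ("http://example.org/code", "Python programming tutorial"),
        ("http://example.org/other", "Unrelated page"),
    ]
    for url, _snippet in rerank(results, profile):
        print(url)
```

With this made-up profile, the programming page moves ahead of the zoo page because its snippet matches more heavily weighted profile terms, while the unrelated page stays last.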