Online retailers like Home Depot want to give their customers a better search experience. To improve a search engine, human raters are typically asked to score how relevant each search result is to the search term. In a Kaggle challenge issued by Home Depot (https://www.kaggle.com/c/home-depot-product-search-relevance), Kagglers were asked to build a model that acts as an automatic rater, reducing the manual effort this task requires.
In this project, we formulate this task as a regression problem in machine learning. Instead of relying on off-the-shelf machine learning models alone, we develop a two-phase framework based on topic modelling that aims to improve their accuracy. We use Gaussian Process Regression and Random Forest Regression in our experiments to study the performance of our framework, in particular its sensitivity to parameters, its accuracy, and the improvement gained by using LDA (Latent Dirichlet Allocation) as the topic model.