
% vim:ft=tex

\section{Task Description (Semester 2)}
The first part of the second semester is dedicated to finishing the remaining tasks of the previous semester. This includes finishing the Django web application as well as implementing an appropriate algorithm to fill the distance database.\\\\
The main goal of the second semester is to implement prediction algorithms that forecast the usage of rental bikes in London on both a daily and an hourly basis. To this end, data profiling tasks need to be carried out in order to investigate the provided data. Furthermore, additional features such as weather data should be researched and prepared so that they can be added to the feature matrix. At the same time, several algorithms need to be tested to find the one that best fits our needs. For this purpose, two different libraries (Scikit-Learn and Spark) should be analyzed and tested.\\
A large part of the work is data profiling, in which the data is processed for later use in prediction. This includes data cleansing as well as data preparation. Once the data is ready, a model can be implemented and first predictions can be made on a daily basis. After that, the algorithms need to be evaluated and improved. To do so, a suitable rating as well as meaningful plots need to be found in order to get a better idea of the prediction results.\\
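One candidate for such a rating, given here only as an illustrative assumption and not as the metric chosen in this project, is the root mean squared error (RMSE) between the observed rental counts $y_i$ and the predicted counts $\hat{y}_i$ over $n$ days:
\[
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}
\]
A lower value indicates a closer fit, and the value is expressed in the same unit as the rental counts.\\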
The same steps then need to be carried out for the prediction on an hourly basis, which also requires its own data preparation steps.\\
After the first predictions, the features should be evaluated to learn which of them are meaningless and which improve the prediction accuracy. A further task is to find suitable additional features that could also improve the prediction. This requires research as well as further data preparation and evaluation steps.\\\\
Another goal is to configure a Hadoop cluster at the university, which should provide more computing capacity for making predictions from very large data sets.