Unsupervised and supervised learning, association rule mining, and ARIMA prediction on web log data, plus web crawling of citation information from Google Scholar.
Part I - Data Analytics: Web Log Data
- Data ETL
  - Load Data
  - Feature Selection
- Unsupervised Learning
- Supervised Learning
  - Data Preparation
  - Logistic Regression
  - K-fold Cross Validation
- Association Rule Mining
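
As a rough illustration of the Association Rule Mining step listed above, the sketch below computes support and confidence for single-item rules over a handful of sessions. The session data, the function names, and the thresholds are all made up for this example and are not taken from the project; a real run would use the sessions extracted from the web log and most likely a dedicated library.

```python
from itertools import combinations

# Toy, made-up sessions: each set holds the pages visited in one session.
sessions = [
    {"home", "products", "cart"},
    {"home", "products"},
    {"home", "about"},
    {"products", "cart"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """List single-item rules lhs -> rhs that pass both (assumed) thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, transactions)
        if pair_support < min_support:
            continue  # the pair is too rare to yield a useful rule
        for lhs, rhs in ((a, b), (b, a)):
            conf = confidence({lhs}, {rhs}, transactions)
            if conf >= min_confidence:
                rules.append((lhs, rhs, pair_support, conf))
    return rules


for lhs, rhs, sup, conf in pair_rules(sessions):
    print(f"{lhs} -> {rhs}: support={sup:.2f}, confidence={conf:.2f}")
```

This brute-force enumeration only handles pairs; algorithms such as Apriori extend the same support and confidence definitions to larger itemsets while pruning infrequent candidates.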
Part II - Web Crawling
- Crawl Professor Gang Li's citation information from 2003 to 2021
- Train ARIMA to predict the 2018 to 2020 citations
  - Train the ARIMA model
  - Predict the citations and calculate the RMSE
  - Draw a visualization to compare predicted and actual citations
- Run a grid search for parameter selection, then predict 2021 and 2022
  - Grid search
  - Select the best parameter values and predict for 2021 and 2022
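
The RMSE and grid search steps listed above can be sketched as a small harness: hold out the last few years, forecast them for every candidate (p, d, q) order, and keep the order with the lowest RMSE. This is a minimal sketch under stated assumptions, not the project's actual code. The names `rmse`, `grid_search`, and `forecast_fn` are hypothetical, and the model fitting is left as a pluggable callable so the harness itself needs only the standard library.

```python
import math
from itertools import product


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def grid_search(series, n_test, forecast_fn, p_values, d_values, q_values):
    """Return (best_order, best_rmse) over all (p, d, q) combinations.

    forecast_fn(train, order, steps) is a hypothetical callable that fits a
    model of the given order on `train` and returns `steps` forecasts.
    """
    if not 0 < n_test < len(series):
        raise ValueError("n_test must be between 1 and len(series) - 1")
    # Hold out the last n_test observations for evaluation.
    train, test = series[:-n_test], series[-n_test:]
    best_order, best_score = None, math.inf
    for order in product(p_values, d_values, q_values):
        try:
            predictions = forecast_fn(train, order, n_test)
        except Exception:
            continue  # some orders may fail to fit; skip them
        score = rmse(test, predictions)
        if score < best_score:
            best_order, best_score = order, score
    return best_order, best_score
```

Assuming statsmodels is the ARIMA implementation in use, `forecast_fn` could wrap `ARIMA(train, order=order).fit().forecast(steps=steps)` from `statsmodels.tsa.arima.model`. Once the best order is selected, the model would typically be refit on the full series before forecasting 2021 and 2022.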