Learned Linear Regression through Prof. Hung-Yi Lee's OCW (OpenCourseWare) and its homework "PM2.5 prediction".
Data source
"Air quality monitoring data for Pingzhen District, Taoyuan City, 2019" (Environmental Protection Administration, 環境保護署)
Choose Model
- Let "y'" be the predicted PM2.5,
  "x" be the features from 9 hours of sensor readings (15 sensor types, e.g. CO, NO, PM10, PM2.5, rainfall),
  "w" be the weights of the features
- Function 1: y' = w * x
- Function 2: y' = w1 * x + w2 * x^2
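The two candidate functions can be sketched in NumPy as below; the toy `x` and `w` values are illustrative assumptions, standing in for the full 9 h × 15-sensor feature vector:

```python
import numpy as np

# Function 1: linear model, y' = w * x (dot product over all features)
def predict_linear(w, x):
    return np.dot(w, x)

# Function 2: adds squared-feature terms, y' = w1 * x + w2 * x^2
def predict_quadratic(w1, w2, x):
    return np.dot(w1, x) + np.dot(w2, x ** 2)

# Toy example with 2 features instead of the full 9 h x 15 sensor types
x = np.array([1.0, 2.0])
w = np.array([0.5, 0.25])
print(predict_linear(w, x))        # 1.0
print(predict_quadratic(w, w, x))  # 1.0 + (0.5*1 + 0.25*4) = 2.5
```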
Loss function
RMSE (root-mean-square error)
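A minimal sketch of the RMSE computation; the toy PM2.5 arrays are assumptions for illustration, not monitoring data:

```python
import numpy as np

def rmse(y_true, y_pred):
    # Root-mean-square error: sqrt of the mean squared difference
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# Illustrative values only
y_true = np.array([10.0, 20.0, 30.0])
y_pred = np.array([12.0, 18.0, 33.0])
print(rmse(y_true, y_pred))  # sqrt((4 + 4 + 9) / 3) ~ 2.38
```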
Gradient descent
Vanilla (basic), SGDM, MBSGD, AdaGrad, RMSProp, Adam
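A minimal sketch of the vanilla variant on the linear model, minimizing the mean-squared error (which has the same minimizer as RMSE); the toy `X`, `y`, and hyperparameters are assumptions:

```python
import numpy as np

def vanilla_gd(X, y, lr=0.1, iters=2000):
    # Plain (vanilla) gradient descent on MSE for the model y' = X @ w
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # d(MSE)/dw
        w -= lr * grad
    return w

# Toy data generated from true weights [2, -1]
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = X @ np.array([2.0, -1.0])
w = vanilla_gd(X, y)
print(w)  # approaches [2, -1]
```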
Data preprocessing
- Using Python rather than Excel
- Invalid and null values filled with the mean of each sensor type
- Split training dataset: 20 days/month
- Split testing dataset: remaining days/month (8~11 days)
- Data extraction: every 9 hours with 15 sensor types/hour as the features, predicting the PM2.5 of the 10th hour
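The extraction step above can be sketched as a sliding window; the PM2.5 row index (`pm25_row=9`) and the random toy data are assumptions for illustration:

```python
import numpy as np

def extract_windows(data, pm25_row, window=9):
    # data: sensors x hours. Each 9-hour slice of all sensor types
    # becomes one flattened feature vector; the label is the PM2.5
    # reading of the hour right after the window (the "10th hour").
    n_sensors, n_hours = data.shape
    X, y = [], []
    for start in range(n_hours - window):
        X.append(data[:, start:start + window].flatten())
        y.append(data[pm25_row, start + window])
    return np.array(X), np.array(y)

# Toy data: 15 sensor types x 24 hours; assume PM2.5 sits in row 9
rng = np.random.default_rng(0)
data = rng.random((15, 24))
X, y = extract_windows(data, pm25_row=9)
print(X.shape)  # (15, 135): 24-9 samples, 15 sensors * 9 hours of features
```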
- Model: power of one (the linear Function 1)
- Gradient: Vanilla gradient descent
- Iteration: 1,000 times
- Training set number: 4,521
- Validation set number: 1,131
- Iteration: 10,000 times
- Gradient: AdaGrad
- Z-Score (Standard Score): (x – μ) / σ
- Max-Min: (x - min(x)) / (max(x) - min(x))
- Training set number: 5,652
- Testing set number: 1,446
- Iteration: 150,000 times
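The two normalization formulas above can be sketched as follows; the toy `x` is an assumption:

```python
import numpy as np

def z_score(x):
    # Z-Score (Standard Score): (x - mu) / sigma
    return (x - x.mean()) / x.std()

def max_min(x):
    # Max-Min scaling: (x - min(x)) / (max(x) - min(x))
    return (x - x.min()) / (x.max() - x.min())

x = np.array([1.0, 2.0, 3.0, 4.0])
print(z_score(x))  # zero mean, unit standard deviation
print(max_min(x))  # scaled into [0, 1]
```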
- DataPreprocession.py: Data cleaning, dataset splitting
- TestDataPreprocessing.py: Feature extraction, shuffle
- LinearRegression.py: Feature extraction, shuffle, normalization, splitting the validation set, training, prediction
- ReDesignModel_Seq.py: Same as LinearRegression.py but uses the "square function model"
- getHT.py: Raspberry Pi reads temperature and humidity data from a DHT11 sensor
- CSVDownloader_GoogleSheet.py: Automatically downloads data from an online Google Sheet
- getUbidotsData.gs: Gets data from Ubidots into a Google Sheet; "gs" means "Google Apps Script"
- What happens when adjusting the learning rate?
- If the learning rate is too large, the loss does not converge (it may even grow larger and larger)
- If the learning rate is too small, the loss converges inefficiently
- Only when the learning rate fits just right does the loss converge efficiently
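The three regimes can be demonstrated on a toy one-dimensional loss L(w) = w^2 (gradient 2w); the specific learning-rate thresholds below apply only to this toy example:

```python
def final_w(lr, iters=50, w=1.0):
    # Gradient descent on L(w) = w^2, whose gradient is 2w
    for _ in range(iters):
        w -= lr * 2 * w
    return w

print(abs(final_w(1.5)))    # too large: |w| doubles each step, loss diverges
print(abs(final_w(0.001)))  # too small: w barely moves toward the minimum at 0
print(abs(final_w(0.3)))    # just right: w shrinks quickly toward 0
```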
- Why does removing some features improve the performance?
- Maybe the other features do not affect PM2.5 that quickly
- Why does Z-Score give better convergence?
- Z-Score is appropriate when the maximum and minimum are unknown
- How to improve the model?
- Shuffle the training and testing data
- Use AdaGrad rather than vanilla gradient descent
- Select only PM10 and PM2.5 as features
- Final improvement of the model: the converged loss decreased by 0.0225
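A minimal sketch of the AdaGrad update used in the improvement; the toy `X`, `y`, and hyperparameters are assumptions for illustration, not the PM2.5 data:

```python
import numpy as np

def adagrad(X, y, lr=1.0, iters=2000, eps=1e-8):
    # AdaGrad: divide each step by the root of the accumulated squared
    # gradients, giving every weight its own decaying learning rate
    w = np.zeros(X.shape[1])
    acc = np.zeros_like(w)
    for _ in range(iters):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # d(MSE)/dw
        acc += grad ** 2
        w -= lr * grad / (np.sqrt(acc) + eps)
    return w

# Toy data generated from true weights [3, -2]
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = X @ np.array([3.0, -2.0])
w = adagrad(X, y)
print(w)  # approaches [3, -2]
```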