
2020/11/02 HOMEWORK 1 - FINAL REPORT

ARCHITECTURE
Architecture




PROBLEM DESCRIPTION

Learning linear regression through Prof. Hung-Yi Lee’s OCW (Open Course Ware) and the homework “PM2.5 prediction”.

  • Data source
    2019 air quality monitoring data for Pingzhen District, Taoyuan City, from the Environmental Protection Administration (環境保護署 2019 年桃園市平鎮區空氣品質監測資料)

  • Choose Model (see the sketch after this list)

    • Let "y'" be the predicted PM2.5,
      "x" the features from 9 hours of sensor readings (e.g. CO, NO, PM10, PM2.5, rainfall),
      "w" the weights of the features
    • Function 1: y' = w * x
    • Function 2: y' = w1 * x + w2 * x^2
  • Loss function
    RMSE (root-mean-square error)

  • Gradient descent
    Vanilla (basic), SGDM, MBSGD, AdaGrad, RMSProp, Adam

  • Flow chart:
    Linear regression flow chart
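
A minimal NumPy sketch of the two candidate functions and the RMSE loss. The array shapes, function names, and the absence of a bias term are assumptions for illustration; the report's actual code is not shown here.

```python
import numpy as np

def predict_linear(x, w):
    """Function 1: y' = w * x, with x a (n_samples, n_features) matrix."""
    return x @ w

def predict_quadratic(x, w1, w2):
    """Function 2: y' = w1 * x + w2 * x^2 (squared features get their own weights)."""
    return x @ w1 + (x ** 2) @ w2

def rmse(y_pred, y_true):
    """Loss: root-mean-square error."""
    return np.sqrt(np.mean((y_pred - y_true) ** 2))
```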


HIGHLIGHT

PRE-PROCESSING

  • Using Python rather than Excel
  • Invalid and null values
    Filled with the mean of the corresponding sensor type
  • Split of training dataset
    First 20 days of each month
  • Split of testing dataset
    Remaining days of each month (8~11 days)
  • Data extraction (see the sketch below)
    Every 9-hour window, with 15 sensor types per hour, forms the features; the PM2.5 of the 10th hour is the prediction target
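
A minimal sketch of the sliding-window extraction described above, assuming each month's data is already a (15 sensor types × hours) NumPy array; the names monthly_data and PM25_ROW and the row index are illustrative assumptions.

```python
import numpy as np

PM25_ROW = 9   # assumed row index of PM2.5 among the 15 sensor types
WINDOW = 9     # 9 consecutive hours form one sample

def extract_samples(monthly_data):
    """monthly_data: shape (15, hours). Slide a 9-hour window over the hours;
    the 15 x 9 readings are the features, the PM2.5 of the 10th hour is the label."""
    features, labels = [], []
    hours = monthly_data.shape[1]
    for start in range(hours - WINDOW):
        window = monthly_data[:, start:start + WINDOW]   # 15 sensor types x 9 hours
        features.append(window.flatten())                # 135 features per sample
        labels.append(monthly_data[PM25_ROW, start + WINDOW])
    return np.array(features), np.array(labels)
```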

ADJUST LEARNING RATE

Plot of adjusting learning rate

  • Model: Power of one
  • Gradient: Vanilla gradient descent
  • Iteration: 1,000 times
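
A minimal sketch of this experiment, assuming vanilla gradient descent on the first-order model with the mean-squared-error gradient; the candidate learning-rate values are illustrative, not the ones actually plotted.

```python
import numpy as np

def train_vanilla(X, y, lr, iterations=1000):
    """Vanilla gradient descent on y' = X @ w, recording the RMSE per iteration."""
    w = np.zeros(X.shape[1])
    history = []
    for _ in range(iterations):
        error = X @ w - y
        grad = 2 * X.T @ error / len(y)               # gradient of the mean squared error
        w -= lr * grad
        history.append(np.sqrt(np.mean(error ** 2)))  # RMSE at this iteration
    return w, history

# Sweep a few candidate learning rates on the same training data
# for lr in (1e-7, 1e-5, 1e-3):
#     _, losses = train_vanilla(X_train, y_train, lr)
```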

FEATURE SELECTION

Plot of feature selection

  • Training set number: 4,521
  • Validation set number: 1,131
  • Iteration: 10,000 times
  • Gradient: AdaGrad
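
A minimal sketch of the feature-selection step, assuming the 135 features are flattened in (15 sensor types × 9 hours) order as in the extraction sketch above; the PM10/PM2.5 row indices are illustrative assumptions.

```python
import numpy as np

SENSOR_TYPES = 15
WINDOW = 9
SELECTED_ROWS = [8, 9]   # assumed row indices of PM10 and PM2.5

def select_features(X):
    """X: shape (n_samples, 135). Keep only the PM10 and PM2.5 readings
    of every hour in the 9-hour window."""
    X_grid = X.reshape(-1, SENSOR_TYPES, WINDOW)
    return X_grid[:, SELECTED_ROWS, :].reshape(len(X), -1)   # (n_samples, 18)
```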

FEATURE SCALING

Plot of feature scaling

  • Z-Score (Standard Score): (x – μ) / σ
  • Max-Min: (x - min(x)) / (max(x) - min(x))
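
A minimal sketch of the two scaling methods, with statistics computed on the training set and reused on the other split (a common convention; the report's exact code may differ).

```python
import numpy as np

def zscore_scale(X_train, X_other):
    """Z-Score: (x - mu) / sigma, statistics taken from the training set."""
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    sigma[sigma == 0] = 1.0   # guard against constant columns
    return (X_train - mu) / sigma, (X_other - mu) / sigma

def minmax_scale(X_train, X_other):
    """Max-Min: (x - min(x)) / (max(x) - min(x)), statistics from the training set."""
    lo, hi = X_train.min(axis=0), X_train.max(axis=0)
    span = np.where(hi - lo == 0, 1.0, hi - lo)
    return (X_train - lo) / span, (X_other - lo) / span
```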

PERFORMANCE IMPROVEMENT

Plot of performance improvement

  • Training set number: 5,652
  • Testing set number: 1,446
  • Iteration: 150,000 times

LISTING


TEST AND RUN

Demo code running

Demo

Comparison of gradient descent

Plot of gradient descent comparison
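
A minimal sketch of three of the compared update rules (vanilla, AdaGrad, Adam) for a single weight vector; the hyperparameter defaults are the usual textbook values, not necessarily those used in the report.

```python
import numpy as np

def vanilla_step(w, grad, lr, state):
    """Plain gradient step: w <- w - lr * grad."""
    return w - lr * grad, state

def adagrad_step(w, grad, lr, state, eps=1e-8):
    """Accumulate squared gradients and shrink the step per dimension."""
    state["g2"] = state.get("g2", 0.0) + grad ** 2
    return w - lr * grad / (np.sqrt(state["g2"]) + eps), state

def adam_step(w, grad, lr, state, beta1=0.9, beta2=0.999, eps=1e-8):
    """Bias-corrected first and second moment estimates (Adam)."""
    t = state.get("t", 0) + 1
    m = beta1 * state.get("m", 0.0) + (1 - beta1) * grad
    v = beta2 * state.get("v", 0.0) + (1 - beta2) * grad ** 2
    state.update(t=t, m=m, v=v)
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), state
```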


DISCUSSION

  1. What happens when adjusting the learning rate?
    • If the learning rate is too large, the loss does not converge (it may even keep growing)
    • If the learning rate is too small, the loss converges inefficiently
    • Only when the learning rate fits just right does the loss converge efficiently
  2. Why does removing some features improve performance?
    • Possibly because the effect of the other features on PM2.5 is not that immediate
  3. Why does Z-Score give better convergence?
    • Z-Score is appropriate when the maximum and minimum are unknown
  4. How to improve the model?
    1. Shuffle the training and testing data
    2. Use AdaGrad rather than vanilla gradient descent
    3. Select only PM10 and PM2.5 as features
  5. Final improvement of the model:
    the converged loss value decreases by 0.0225