PROBLEM DEFINITION :
Air pollution costs Indian businesses $95 billion, or 3% of India’s GDP, every year.
These costs arise from:
- The opportunity cost of lost worker productivity (e.g., due to higher absenteeism)
- Revenue lost due to reduced consumer footfall
- The opportunity cost of premature mortality attributed to air pollution
VALUE ADDITION :
- Suggest health impact threshold limits for twelve parameters for which short-term air quality standards are prescribed.
- Suggest a qualitative description of air quality and the likely associated health impacts for different AQI values.
- Evaluate AQI with data from a few major cities and towns.
- Analyze available AQIs, including international practices.
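As a rough illustration of mapping AQI values to a qualitative description, the sketch below assumes the six category bands used by India's CPCB National Air Quality Index. The band table and the function name `aqi_bucket` are assumptions made for this example; the project's own thresholds are not stated here.

```python
# Assumed CPCB-style AQI categories; each upper bound is inclusive.
AQI_BUCKETS = [
    (50, "Good"),
    (100, "Satisfactory"),
    (200, "Moderate"),
    (300, "Poor"),
    (400, "Very Poor"),
    (500, "Severe"),
]


def aqi_bucket(aqi: float) -> str:
    """Return the qualitative category for a numeric AQI value."""
    if aqi < 0:
        raise ValueError("AQI cannot be negative")
    for upper, label in AQI_BUCKETS:
        if aqi <= upper:
            return label
    # Values above 500 are treated as Severe in this sketch.
    return "Severe"
```

For example, `aqi_bucket(150)` falls in the 101-200 band and returns "Moderate".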
ABOUT DATASET :
Independent Variables: City, Date, PM2.5, PM10, NO, NO2, NOx, NH3, CO, SO2, O3, Benzene, Toluene, Xylene & Air Quality
Dependent Variable: AQI
Data Preparation:
- Data Type Change
- Five Point Summary
- Null Value Detection & Treatment
- Outlier Detection
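The preparation steps above could look roughly like the following pandas sketch. The sample rows are made up, and the specific choices (per-city median imputation, the 1.5 x IQR outlier rule, the helper name `prepare`) are assumptions for illustration, not necessarily what the project used.

```python
import pandas as pd

# Tiny made-up sample standing in for the real dataset (illustrative values only).
raw = pd.DataFrame({
    "City": ["Delhi"] * 6,
    "Date": ["2019-01-01", "2019-01-02", "2019-01-03",
             "2019-01-04", "2019-01-05", "2019-01-06"],
    "PM2.5": [110.0, None, 95.0, 102.0, 900.0, 98.0],
    "AQI": [310.0, 295.0, 280.0, 300.0, 305.0, 290.0],
})


def prepare(df, column):
    """Return a cleaned copy of df and a boolean outlier mask for one column."""
    out = df.copy()
    # Data type change: parse date strings into datetime values.
    out["Date"] = pd.to_datetime(out["Date"])
    # Null treatment: fill missing readings with the per-city median (one possible choice).
    out[column] = out.groupby("City")[column].transform(lambda s: s.fillna(s.median()))
    # Outlier detection: flag values more than 1.5 * IQR beyond the quartiles.
    q1, q3 = out[column].quantile([0.25, 0.75])
    iqr = q3 - q1
    is_outlier = (out[column] < q1 - 1.5 * iqr) | (out[column] > q3 + 1.5 * iqr)
    return out, is_outlier


clean, outliers = prepare(raw, "PM2.5")

# Five-point summary: minimum, quartiles, and maximum.
five_point = clean["PM2.5"].describe()[["min", "25%", "50%", "75%", "max"]]
```

Here the missing PM2.5 reading is filled with the city median, and only the 900.0 reading is flagged as an outlier. Whether to drop, cap, or keep flagged values is a separate decision that this sketch leaves open.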
EXPLORATORY DATA ANALYSIS - Power BI :
- Distribution of parameters
- Covariance
- Industrial Pollution
- Vehicle Pollution
- Pre-COVID analysis
- Post-COVID analysis
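The pre/post comparison was done in Power BI; a rough pandas equivalent is sketched below for readers who want to reproduce it in code. The readings are made up, and the 1 January 2020 cutoff is an assumption.

```python
import pandas as pd

# Made-up daily AQI readings, for illustration only.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2019-03-01", "2019-11-15", "2020-04-10", "2020-06-20"]),
    "AQI": [250.0, 310.0, 120.0, 100.0],
})

# Assumed cutoff: everything from 1 January 2020 onward counts as the later period.
cutoff = pd.Timestamp("2020-01-01")
df["Period"] = df["Date"].ge(cutoff).map({False: "before 2020", True: "2020 onward"})

# Mean AQI per period.
mean_aqi = df.groupby("Period")["AQI"].mean()
```

The same grouping can be extended with "City" as a second key to compare periods per city.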
Models prepared on:
- OLS Model
- Stepwise Regression
- Stochastic Gradient Descent
- Regularization
- Decision Tree Regressor
- Random Forest Regressor
- AdaBoostRegressor
- Tuning with GridSearchCV
*Best model serialized with Pickle for deployment, which avoids the cost of retraining.
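A minimal sketch of the tuning and serialization steps, using scikit-learn's GridSearchCV on a Random Forest Regressor and Python's pickle module. The data is synthetic, and the parameter grid, the scoring choice, and the helper `rmse` are hypothetical; the project's actual grid and features are not given here.

```python
import pickle

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data: three "pollutant" features and a noisy linear "AQI" target.
rng = np.random.default_rng(0)
X = rng.uniform(0, 300, size=(300, 3))
y = 1.2 * X[:, 0] + 0.6 * X[:, 1] + 0.1 * X[:, 2] + rng.normal(0, 10, size=300)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Hypothetical, deliberately small grid.
param_grid = {"n_estimators": [50, 100], "max_depth": [4, 8]}
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    scoring="neg_root_mean_squared_error",
    cv=3,
)
search.fit(X_train, y_train)
best_model = search.best_estimator_


def rmse(model, features, target):
    """Root mean squared error of the model's predictions."""
    return float(np.sqrt(mean_squared_error(target, model.predict(features))))


# Comparing train and test RMSE gives a quick check for overfitting.
train_rmse = rmse(best_model, X_train, y_train)
test_rmse = rmse(best_model, X_test, y_test)

# Serialize the fitted model so it can be reloaded later without retraining.
payload = pickle.dumps(best_model)
restored = pickle.loads(payload)
```

In a deployment, `pickle.dump` to a file would replace the in-memory `pickle.dumps` call. Pickle files should only be loaded from trusted sources.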
ALL MODEL COMPARISON :
CONCLUSION :
- Delhi was found to be the most polluted city for vehicular pollution, and Ahmedabad for industrial emissions.
- AQI was high before 2020 and dropped significantly after 2020. The probable reason is COVID-19.
- AQI is most affected by the pollutants PM2.5 and PM10.
- The tuned Random Forest Regressor model shows little sign of underfitting or overfitting.
- The model predicts the test data with 89% accuracy.
- RMSE values are similar on the train and test datasets and are lower than those of the other models.