This repository contains a set of machine learning models that forecast pollutant levels in the Metropolitan Area of Mexico City. The models are optimized for a low false positive rate with respect to the levels of the environmental contingency program.
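As an illustration of the false positive criterion, the sketch below counts how often a forecast flags an exceedance that did not actually occur. The threshold value and the function name are hypothetical; the actual contingency levels and the evaluation code used in this project are not stated in this README.

```python
# Hypothetical threshold for illustration only; the real contingency
# levels are not given in this README.
ASSUMED_THRESHOLD = 150.0


def false_positive_rate(actual, predicted, threshold=ASSUMED_THRESHOLD):
    """Share of hours without a real exceedance that the forecast flagged anyway."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    negatives = 0        # hours where the observed value stayed below the threshold
    false_positives = 0  # of those, hours where the forecast reached the threshold
    for observed, forecast in zip(actual, predicted):
        if observed < threshold:
            negatives += 1
            if forecast >= threshold:
                false_positives += 1
    return false_positives / negatives if negatives else 0.0


# Made-up values, not project data.
actual = [120.0, 160.0, 140.0, 90.0, 155.0]
predicted = [130.0, 170.0, 152.0, 80.0, 149.0]
fpr = false_positive_rate(actual, predicted)
print(f"False positive rate: {fpr:.2f}")
```

A low value means the forecast rarely announces a contingency level that is not reached.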
The models forecast the following pollutants:
- PM10
- PM2.5 (in development)
- Ozone
The project has a dashboard, developed at the Repositories, Research and Prospective Coordination (CRIP) of the National Council of Science and Technology (CONACyT).
The dashboard aims to inform the population of the Valley of Mexico about the state of its air quality in a friendly and direct way. It shows the current value of the air quality index and is updated hourly. The index is obtained from the data shared by the Ministry of Environment (SEDEMA) of the Government of Mexico City, which can be found here. In addition, a machine learning model estimates the air quality index 24 hours ahead. The dashboard shows this estimate together with a line graph of the hour-by-hour estimated index for suspended particles smaller than 10 micrometers (PM10) and for ozone (O3).
Pollution and meteorological data are obtained from the CDMX air quality portal.
For each pollutant, models were developed to forecast its levels up to 24 hours in advance, with an error comparable to that reported in the literature.
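One common way to frame a forecast a fixed number of hours ahead is to pair recent observations with the value observed at the target horizon. The sketch below shows that framing with plain Python lists. The function name, the number of lags and the sample values are assumptions for illustration; this README does not describe the features the project's models actually use.

```python
def make_supervised(series, n_lags, horizon):
    """Pair the last `n_lags` hourly readings with the reading `horizon` hours later."""
    features, targets = [], []
    # t is the index of the most recent reading available when forecasting.
    for t in range(n_lags - 1, len(series) - horizon):
        features.append(series[t - n_lags + 1 : t + 1])
        targets.append(series[t + horizon])
    return features, targets


# Made-up hourly readings 0.0 .. 9.0, not project data.
hourly = [float(v) for v in range(10)]
X, y = make_supervised(hourly, n_lags=3, horizon=2)
print(X[0], "->", y[0])
```

With hourly data, `horizon=24` would correspond to the 24-hour-ahead case; any regression model can then be fitted on the resulting pairs.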
The following graph shows the actual and predicted values 12 hours in advance for PM10:
The following graph shows the actual and predicted values 12 hours in advance for ozone:
- PM10 (24-hour moving average):
The mean RMSE is about 11.59%. The next graph shows the RMSE by hour:
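The two quantities mentioned above, a 24-hour moving average and an RMSE broken down by hour, can be sketched as follows. The function names and sample values are hypothetical. The sketch returns the RMSE in the units of the data; how the project expresses it as a percentage is not stated in this README.

```python
import math
from collections import defaultdict


def moving_average(values, window=24):
    """Trailing mean over `window` samples; the first window - 1 entries are None."""
    averaged = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the sample that left the window
        averaged.append(running / window if i >= window - 1 else None)
    return averaged


def rmse_by_hour(hours, actual, predicted):
    """Root mean squared error grouped by the hour label of each sample."""
    squared_errors = defaultdict(list)
    for hour, observed, forecast in zip(hours, actual, predicted):
        squared_errors[hour].append((observed - forecast) ** 2)
    return {
        hour: math.sqrt(sum(errors) / len(errors))
        for hour, errors in sorted(squared_errors.items())
    }


# Made-up values, not project data.
smoothed = moving_average([1.0, 2.0, 3.0, 4.0], window=2)
errors = rmse_by_hour([0, 0, 1, 1], [10.0, 20.0, 30.0, 40.0], [13.0, 16.0, 30.0, 42.0])
print(smoothed)
print(errors)
```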
For more info about the performance of the models, don't hesitate to contact me.
Contributors:
- Paulina Pradel: visualization and web dashboard.
- Daniel Bustillos: data analysis and modelling.
Methods used:
- Inferential Statistics
- Machine Learning
- Data Visualization
- Predictive Modeling
Technologies:
- Python
- Pandas
- scikit-learn
- Plotly
- PostgreSQL
- Jupyter
- HTML
To access the forecast, visiting the dashboard directly is suggested (available soon). To compute the forecast yourself, follow these steps:
- Clone this repo (for help see this tutorial).
- Raw data is kept here within this repo.
- The forecast and data processing/transformation scripts are implemented in a data pipeline. To run it, run the following in a terminal:

      python pipeline_general/pipeline/4_predicción.ipynb
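Notebook files are JSON documents, so they are usually executed through Jupyter tooling rather than passed straight to the Python interpreter. The sketch below shows one possible way to run the pipeline step from Python with `jupyter nbconvert --execute`. It assumes Jupyter and nbconvert are installed and that the command is launched from the repository root. The helper names are hypothetical and not part of this repository.

```python
import subprocess

# Path taken from the step above.
NOTEBOOK = "pipeline_general/pipeline/4_predicción.ipynb"


def nbconvert_command(notebook_path):
    """Build the command line that executes a notebook in place."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",   # run every cell
        "--inplace",   # write the outputs back into the same file
        notebook_path,
    ]


def run_notebook(notebook_path=NOTEBOOK):
    """Execute the notebook; raises CalledProcessError if a cell fails."""
    subprocess.run(nbconvert_command(notebook_path), check=True)


print(" ".join(nbconvert_command(NOTEBOOK)))
```

Calling `run_notebook()` would then execute the prediction notebook. The printed command can also be pasted into a terminal directly.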
![air quality dashboard](assets/tablero_scr.png)
Team Leads (Contacts): Juan Daniel Bustillos Camargo ([email protected])

| Name | Role |
|---|---|
| Norberto Morales | Data Engineer |
- Feel free to contact team leads with any questions or if you are interested in contributing!