This project is part of the "Cloud Computing" course (RSO) and focuses on Machine Learning in the cloud for weather prediction.
The project includes the following components:
: Apache Airflow DAGs for ETL (Extract, Transform, Load), evaluation, and training.
: ETL DAG for data
: Evaluation DAG for model evaluation.
: Training DAG for model training.
: Data files obtained from the Open-Meteo weather API.
: Trained machine learning models in .h5 format.
extension indicates the best-performing model.
: A Jupyter Notebook containing code for fetching and processing data from the Open-Meteo API and loading it into BigQuery.
: A Jupyter Notebook that includes exploratory data analysis and training different types of machine learning models (naive, autoregressor, neural network).
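As an illustration of the simplest model type listed above, the following is a minimal sketch of a naive (persistence) baseline, which predicts that the next value equals the last observed one. The function names and the choice of mean absolute error are assumptions made for this example; the notebook's actual baseline and metric may differ.

```python
from typing import List, Sequence


def persistence_forecast(history: Sequence[float], horizon: int = 1) -> List[float]:
    """Naive baseline: repeat the last observed value for every future step."""
    if not history:
        raise ValueError("history must contain at least one observation")
    return [history[-1]] * horizon


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute difference between actual and predicted values."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def evaluate_persistence(series: Sequence[float]) -> float:
    """One-step-ahead evaluation: predict series[t] using series[t - 1]."""
    if len(series) < 2:
        raise ValueError("series must contain at least two observations")
    return mean_absolute_error(series[1:], series[:-1])
```

A baseline like this gives the autoregressor and the neural network an error level to beat: a model that does not improve on persistence adds little value.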
To set up Apache Airflow for this project, follow these steps:
Create a virtual environment and activate it:
python3 -m venv airflow-venv
source airflow-venv/bin/activate
Install the required Python packages from requirements.txt:
pip install -r requirements.txt
Initialize the Airflow database:
airflow db init
Create an admin user for Airflow:
airflow users create \
    --username admin \
    --firstname FIRST_NAME \
    --lastname LAST_NAME \
    --role Admin \
    --email EMAIL_ADDRESS
Set the Google Cloud service account key file path as an environment variable:
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account-key.json"
Start the Airflow webserver on port 8080:
airflow webserver -p 8080
Start the Airflow scheduler:
airflow scheduler -S .
If needed, you can forcefully kill the Airflow webserver and its processes on port 8080:
sudo fuser -k 8080/tcp
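The credentials step above can be sanity-checked from Python before the DAGs run. The sketch below only confirms that GOOGLE_APPLICATION_CREDENTIALS is set and points at an existing file; it does not validate the key itself. The helper name resolve_credentials_path is hypothetical and not part of this project.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_credentials_path(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the service account key path, or raise if it is missing.

    `env` defaults to the process environment; passing a dict makes the
    check easy to exercise without touching real environment variables.
    """
    env = os.environ if env is None else env
    raw = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not raw:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    path = Path(raw).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"service account key file not found: {path}")
    return path
```

Calling resolve_credentials_path() in the same shell session that starts the scheduler helps catch a missing export early, since the variable must be visible to the Airflow processes.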
Important: Before running the DAGs, set the necessary Airflow variables via the Airflow Web UI. Specifically, add the following variable:
gcp_info = {
  "project_id": "balmy-apogee-404909",
  "bucket_name": "europe-central2-rso-ml-airf-05c3abe0-bucket",
  "dataset_id": "weather_prediction",
  "weather_table_id": "weather_history_LJ",
  "predict_table_id": "weather_predictions",
  "model_table_id": "weather_models"
}
This variable is essential for the correct functioning of the DAGs, as it provides necessary configuration information.
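A minimal sketch of how the gcp_info value could be validated before use is shown below. It relies only on the standard json module; the helper names parse_gcp_info and weather_table_ref are hypothetical. Inside a DAG, Airflow's Variable.get("gcp_info", deserialize_json=True) returns the parsed dictionary directly, though whether this project's DAGs read the variable that way is an assumption.

```python
import json

# Keys taken from the gcp_info variable described above.
REQUIRED_KEYS = {
    "project_id",
    "bucket_name",
    "dataset_id",
    "weather_table_id",
    "predict_table_id",
    "model_table_id",
}


def parse_gcp_info(raw: str) -> dict:
    """Parse the JSON value of gcp_info and check that no key is missing."""
    info = json.loads(raw)
    if not isinstance(info, dict):
        raise TypeError("gcp_info must be a JSON object")
    missing = REQUIRED_KEYS - info.keys()
    if missing:
        raise KeyError(f"gcp_info is missing keys: {sorted(missing)}")
    return info


def weather_table_ref(info: dict) -> str:
    """Build a BigQuery table reference in project.dataset.table form."""
    return f"{info['project_id']}.{info['dataset_id']}.{info['weather_table_id']}"
```

Checking the keys up front turns a misconfigured variable into one clear error at the start of a run rather than a failure partway through a task.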
Now, you should have Apache Airflow set up and running for your project.