IMPA creating pipelines using poetry #6

Open
wants to merge 107 commits into
base: main
9036429
saving this to compare after
patriciacatandi Jan 31, 2024
07b0e2f
alertario optimizer transform working
patriciacatandi Feb 6, 2024
62487c5
first functioning version
patriciacatandi Jun 4, 2024
c196651
working version of pluviometers: put all transformation in one class,…
patriciacatandi Jun 4, 2024
c9791de
version adding latlon with rain_gauge_station inside the dataset_prep…
patriciacatandi Jun 5, 2024
6ecbe45
version running both rain_gauge and weather_station
patriciacatandi Jun 6, 2024
85b8d61
first try to run code on prefect
patriciacatandi Jun 27, 2024
cc2ec19
changin dockerfile, gitignore and etl_alertario
patriciacatandi Jun 27, 2024
dd2ed99
trying new dockerfile
patriciacatandi Jun 27, 2024
faa9bbe
changing poetry.lock
patriciacatandi Jun 27, 2024
7b0a717
trying new dockerfile 2
patriciacatandi Jun 27, 2024
e1d421e
trying new dockerfile com ignore do hadolint
patriciacatandi Jun 27, 2024
9bdfb72
lint etl_alertario.py
patriciacatandi Jun 27, 2024
d39f93f
Delete /utils/logging.py
patriciacatandi Jun 27, 2024
d35c75b
linting utils files
patriciacatandi Jun 27, 2024
5833452
changing .gitignore and removing logging file
patriciacatandi Jun 27, 2024
fb637ac
removing logging file
patriciacatandi Jun 27, 2024
5553c87
fixing kinting
patriciacatandi Jun 27, 2024
6b5a77f
fixing kinting
patriciacatandi Jun 27, 2024
9df3884
Updated poetry.lock with the new dependencies
patriciacatandi Jun 27, 2024
4f4da0c
Merge branch 'main' into staging/rionowcast
patriciacatandi Jun 27, 2024
7fc3bc8
trying to solve poetry problem
patriciacatandi Jun 27, 2024
6f448fd
adding new poetry.lock
patriciacatandi Jun 27, 2024
f68d2dc
adding parameters
patriciacatandi Jun 27, 2024
5177c7c
Update poetry in lint.yaml
patriciacatandi Jun 27, 2024
9804619
Update code-tree-analysis.yaml
patriciacatandi Jun 27, 2024
8840904
Update cd_staging.yaml
patriciacatandi Jun 27, 2024
1d850a6
Update cd.yaml
patriciacatandi Jun 27, 2024
82ba0ab
fixing lint
patriciacatandi Jun 27, 2024
f328a2d
Merge branch 'staging/rionowcast' of github.com:prefeitura-rio/pipeli…
patriciacatandi Jun 27, 2024
8813e5e
fixing lint
patriciacatandi Jun 27, 2024
5e35167
fixing lint tasks.py
patriciacatandi Jun 27, 2024
a8ee650
fixing lint tasks.py
patriciacatandi Jun 27, 2024
e111924
fixing flow import
patriciacatandi Jun 27, 2024
40939ab
removing code_owners from Flow
patriciacatandi Jun 28, 2024
1f93ce2
removing sentry
patriciacatandi Jun 28, 2024
303da5c
fixing secret path for infisical
patriciacatandi Jun 28, 2024
458b561
fixing secret path for infisical
patriciacatandi Jun 28, 2024
de836d4
fixing secret path for infisical
patriciacatandi Jun 28, 2024
f1abac3
trying another way to get billing_project_id
patriciacatandi Jul 1, 2024
e083cb9
forcing billing project to rj-cor
patriciacatandi Jul 1, 2024
83b5129
forcing billing project to rj-cor
patriciacatandi Jul 1, 2024
6e66c12
trying to download data from bigquery
patriciacatandi Jul 1, 2024
02feb42
trying to download data from bigquery
patriciacatandi Jul 1, 2024
51a3b44
trying to download data from bigquery
patriciacatandi Jul 1, 2024
0957f8b
trying to download data from bigquery
patriciacatandi Jul 1, 2024
c886adc
trying to download data from bigquery
patriciacatandi Jul 1, 2024
0211ebe
adding predict and try except to download data
patriciacatandi Jul 3, 2024
5fd97af
adding @task on predict
patriciacatandi Jul 3, 2024
b7ee932
changing try except to if
patriciacatandi Jul 3, 2024
3197bb6
fix
patriciacatandi Jul 3, 2024
18741e8
removing wait_task_run from flows
patriciacatandi Jul 3, 2024
eb50a0a
fix
patriciacatandi Jul 3, 2024
19da4f1
changing details on flows
patriciacatandi Jul 4, 2024
6e85d54
first integrator running on command line + changes in etl_alertario t…
patriciacatandi Jul 5, 2024
bd04411
minor fix
patriciacatandi Jul 5, 2024
270cadf
adding integrator
patriciacatandi Sep 6, 2024
3b5d539
removing exemplo pipeline
patriciacatandi Sep 23, 2024
03247e0
first version of prediction flow
patriciacatandi Sep 25, 2024
e8e4ec6
changing pre-treatment flow
patriciacatandi Sep 25, 2024
f16bf26
Merge branch 'main' into staging/rionowcast_dataflow
mergify[bot] Sep 25, 2024
6e023ef
changing tasks names
patriciacatandi Oct 2, 2024
f010424
Merge branch 'staging/rionowcast_dataflow' of github.com:prefeitura-r…
patriciacatandi Oct 2, 2024
971f0d0
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 2, 2024
1d1be87
addind state_handlers on flows
patriciacatandi Oct 4, 2024
19b5218
adding impa files
patriciacatandi Oct 4, 2024
97a07f5
changing to uv
patriciacatandi Oct 10, 2024
c2c96ef
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 10, 2024
6d8bc16
adding impa flow on flows.py
patriciacatandi Oct 11, 2024
26881df
Merge branch 'staging/impa' of github.com:prefeitura-rio/pipelines_we…
patriciacatandi Oct 11, 2024
d7702c1
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 11, 2024
b0080f6
migrating to uv
patriciacatandi Oct 14, 2024
690a802
Merge branch 'staging/impa' of github.com:prefeitura-rio/pipelines_we…
patriciacatandi Oct 14, 2024
16d1a5a
changing pyproject.toml
patriciacatandi Oct 14, 2024
e64d585
changing pyproject.toml
patriciacatandi Oct 14, 2024
d259c46
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 14, 2024
e107236
testing uv
patriciacatandi Oct 15, 2024
5977ad3
testing antonio pyproject
patriciacatandi Oct 15, 2024
772f999
adding ci libs on pyproject.toml
patriciacatandi Oct 15, 2024
9b3241a
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 15, 2024
7fa8742
adding prefeitura-rio
patriciacatandi Oct 15, 2024
f074486
fixing flake8
patriciacatandi Oct 15, 2024
5942c95
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 15, 2024
5cb4c4d
fixing flake8
patriciacatandi Oct 15, 2024
022ad6f
Merge branch 'staging/impa_poetry' of github.com:prefeitura-rio/pipel…
patriciacatandi Oct 15, 2024
713c6eb
fixing poetry
patriciacatandi Oct 15, 2024
de6d2b6
changing pysteps version
patriciacatandi Oct 16, 2024
949e1d5
adding gcc and build-essential on dockerfile
patriciacatandi Oct 16, 2024
23d2592
adding pipelines.precipitation_model.impa. in front of src
patriciacatandi Oct 16, 2024
87fe56c
removing **/data/* from gitignore
patriciacatandi Oct 16, 2024
b6e0165
removing description from Parameter on flows.py
patriciacatandi Oct 16, 2024
21b31e4
saving predictions on gcp
patriciacatandi Oct 16, 2024
07b6853
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 16, 2024
d0712b2
adding init on utils folder
patriciacatandi Oct 16, 2024
86ecf31
changing utils name
patriciacatandi Oct 16, 2024
5d5d310
changing log
patriciacatandi Oct 16, 2024
b5073be
removing np.numpy
patriciacatandi Oct 16, 2024
673f221
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 16, 2024
8aa99fe
fixing rionowcast
patriciacatandi Oct 16, 2024
5f5ab27
changing hours_from_past
patriciacatandi Oct 16, 2024
5fc8a48
adding plot images
patriciacatandi Oct 16, 2024
c5199eb
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 16, 2024
d68b5bc
changing paths to inclide pipelines/precipitation_model/impa/src
patriciacatandi Oct 16, 2024
0f109b3
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 16, 2024
b156e5d
changing other paths
patriciacatandi Oct 16, 2024
97920c6
Merge branch 'staging/impa_poetry' of github.com:prefeitura-rio/pipel…
patriciacatandi Oct 16, 2024
a9b987d
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 16, 2024
7 changes: 6 additions & 1 deletion .flake8
@@ -1,2 +1,7 @@
[flake8]
max-line-length = 100
max-line-length = 120

exclude =
**/rionowcast/gypscie/*
**/rionowcast/test_flow/*
script.py
2 changes: 1 addition & 1 deletion .github/workflows/cd_staging.yaml
@@ -40,7 +40,7 @@ jobs:

- name: Install Python dependencies for deploying
run: |-
pip install -U pip poetry
pip install -U pip "poetry<1.8"
poetry config virtualenvs.create false
poetry install --with dev --with ci

4 changes: 2 additions & 2 deletions .github/workflows/code-tree-analysis.yaml
@@ -17,7 +17,7 @@ jobs:

- name: Install Python dependencies for deploying
run: |-
pip install -U pip poetry
pip install -U pip "poetry<1.8"
poetry config virtualenvs.create false
poetry install --with dev --with ci

@@ -46,4 +46,4 @@ jobs:
uses: thollander/actions-comment-pull-request@v1
with:
message: "${{ steps.code-tree-analysis.outputs.pr-message }}"
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
4 changes: 2 additions & 2 deletions .github/workflows/lint.yaml
@@ -23,12 +23,12 @@ jobs:

- name: Set up Poetry and upgrade pip
run: |
pip install -U pip poetry
pip install -U pip "poetry<1.8"

- name: Install dependencies
run: |
poetry config virtualenvs.create false && poetry install --with dev --with ci

- name: Lint with black, isort and flake8
run: |
task lint
task lint
93 changes: 86 additions & 7 deletions .gitignore
@@ -1,7 +1,17 @@
# user defined
# Folders that Git should ignore
*.ipynb_checkpoints/
pipelines/precipitation_model/impa/models/data/
pipelines/precipitation_model/impa/models/docs/
pipelines/precipitation_model/impa/models/eval/
pipelines/precipitation_model/impa/models/models/
pipelines/precipitation_model/impa/models/personal/
pipelines/precipitation_model/impa/models/radar_cal/
pipelines/precipitation_model/impa/models/lightning_logs/
**/test_flow/** */

# User-defined
.replit
replit.nix
**/data/
test_local.py
pylint.txt
test.py
@@ -10,18 +20,68 @@ test/*.ipynb
test/*.csv
setup.py
.vscode/*
*.hdf


# File extensions that Git should ignore

# tex
*.aux
*.bbl
*.blg
*.fdb_latexmk
*.fls
*.out
*.synctex.gz

# images
*.pdf
*.png

# data files
*.csv
*.xls
*.pkl
*.npy
*.Rdata
*.Rds
*.hdf
**/gypscie/**

# Byte-compiled / optimized / DLL files
# Python
*.pyc
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# OS generated files
.DS_Store
.DS_Store?
._*
.Spotlight-V100
.Trashes
ehthumbs.db
Thumbs.db

# Compiled source
*.com
*.class
*.dll
*.exe
*.o
*.so

# Packages
*.7z
*.dmg
*.gz
*.iso
*.jar
*.rar
*.tar
*.zip

# Distribution / packaging
.Python
build/
@@ -43,6 +103,13 @@ share/python-wheels/
*.egg
MANIFEST

# Logs and databases
*.log
*.sql
*.sqlite
pip-log.txt
pip-delete-this-directory.txt

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
@@ -52,6 +119,8 @@ MANIFEST
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
*.manifest
*.spec

# Unit test / coverage reports
htmlcov/
@@ -108,7 +177,7 @@ ipython_config.py
# install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
# PEP 582
__pypackages__/

# Celery stuff
@@ -146,4 +215,14 @@ dmypy.json
.pyre/

# VSCode project settings
.vscode/
.vscode/

# MLflow
**/mlruns/**
*.csv
**/gypscie/processors/**
!/gypscie/**/MLproject
!/gypscie/**/etl_alertario.py
!/gypscie/**/conda.yaml
/gypscie/utils/logging.py
!/gypscie/processors/utils/*
5 changes: 3 additions & 2 deletions Dockerfile
@@ -4,9 +4,10 @@ ARG PYTHON_VERSION=3.10-slim
# Start Python image
FROM python:${PYTHON_VERSION}

# Install git
# Install git, gcc, build-essential, and other dependencies
# hadolint ignore=DL3008
RUN apt-get update && \
apt-get install -y git && \
apt-get install -y --no-install-recommends git ffmpeg libsm6 libxext6 build-essential gcc && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*

4 changes: 3 additions & 1 deletion pipelines/constants.py
@@ -16,9 +16,11 @@ class constants(Enum):
######################################
# Agent labels
######################################
# EXAMPLE_AGENT_LABEL = "example_agent"
WEATHER_FORECAST_AGENT_LABEL = "weather-forecast"

######################################
# Other constants
######################################
# EXAMPLE_CONSTANT = "example_constant"
INFISICAL_USERNAME = "USERNAME"
INFISICAL_PASSWORD = "PASSWORD"
2 changes: 0 additions & 2 deletions pipelines/exemplo/__init__.py

This file was deleted.

20 changes: 0 additions & 20 deletions pipelines/exemplo/nome_do_objetivo/flows.py

This file was deleted.

8 changes: 0 additions & 8 deletions pipelines/exemplo/nome_do_objetivo/tasks.py

This file was deleted.

2 changes: 1 addition & 1 deletion pipelines/flows.py
@@ -2,4 +2,4 @@
"""
Imports all flows for every project so we can register all of them.
"""
from pipelines.exemplo import * # noqa
from pipelines.precipitation_model import * # noqa
3 changes: 3 additions & 0 deletions pipelines/precipitation_model/__init__.py
@@ -0,0 +1,3 @@
# -*- coding: utf-8 -*-
from pipelines.precipitation_model.impa.flows import * # noqa
from pipelines.precipitation_model.rionowcast.flows import * # noqa
134 changes: 134 additions & 0 deletions pipelines/precipitation_model/impa/README.md
@@ -0,0 +1,134 @@
# Rio Rain

This project provides a pipeline for real-time rain prediction in Rio de Janeiro. It contains a script that automatically downloads the most recent satellite data, processes it, and makes predictions for the next 3 hours.

## Table of Contents

- [Rio Rain](#rio-rain)
- [Table of Contents](#table-of-contents)
- [Introduction](#introduction)
- [Features](#features)
- [Installation](#installation)
- [Usage](#usage)
- [Main script](#main-script)
- [Evaluation](#evaluation)
- [File structure](#file-structure)

## Introduction

This project contains the necessary code for nowcasting with 5 different models. It also includes a script that plots the predictions.

## Features

- Real-time prediction
- Nowcasting visualization
- Performance of each model in the last 3 hours

## Installation

Before using the scripts contained in the provided zip file, install the required libraries. There are several ways to do this:

- Using `poetry`:
To install the requirements with `poetry`, open a terminal in the root directory and run `poetry install`. One required library cannot be installed through `poetry`; install it with `pip` instead by running `pip install --no-use-pep517 mamba-ssm`.

- Using `pip`:
To install the dependencies using `pip`, you can use the provided file `requirements.txt` by calling `pip install -r requirements.txt`.

- Using other project managers:
You may also use another environment manager, such as `conda`, and install the libraries listed in `requirements.txt` directly.

## Usage

### Main script

To produce predictions in real time, run `python src/eval/update-real_time.py` from the root directory with the project environment activated. The script then runs the following tasks, in order:

- Download GOES-16 data from AWS
- Process the data locally
- Build a dataset appropriate for making predictions
- Make predictions for each of the provided models

Once the script stops running, you can find the output files as described in the [file structure](#file-structure) section.

Note that there are optional arguments for this command:

`python src/eval/update-real_time.py [--cuda] [--num_workers] [--datetime]`

Pass `--cuda` to make predictions on the GPU. Omit it to run all calculations on the CPU; this is slower, and some models may not work in this mode.

`--num_workers` sets the number of processes that may run in parallel. It must be an integer greater than zero. Generally, the larger this number, the sooner the script finishes.
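
As a rough illustration of how flags like these are commonly declared, the sketch below uses `argparse` from the standard library. It is not the project's actual parser: the names `positive_int` and `build_parser` and the default of one worker are assumptions made for this example.

```python
import argparse


def positive_int(text: str) -> int:
    """Parse an integer and reject values below 1 (hypothetical helper)."""
    value = int(text)
    if value < 1:
        raise argparse.ArgumentTypeError("--num_workers must be greater than zero")
    return value


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the two flags described above (illustrative only)."""
    parser = argparse.ArgumentParser(description="Real-time prediction (sketch)")
    # Boolean switch: present means GPU, absent means CPU.
    parser.add_argument("--cuda", action="store_true")
    # Assumed default of 1 worker; the real script's default is not documented here.
    parser.add_argument("--num_workers", type=positive_int, default=1)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--cuda", "--num_workers", "4"])
    print(args.cuda, args.num_workers)
```

Validating the worker count in the `type` callback lets `argparse` report a bad value as a normal usage error before any work starts.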

Finally, `--datetime` makes predictions starting from the given time, which must be in UTC and formatted as '%Y-%m-%d %H:%M:%S'. For example, for predictions from 13/01/2024 14:00:00 BRT (UTC-3), call `bash src/eval/update-real_time.sh --datetime '2024-01-13 17:00:00'`.
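
The BRT-to-UTC conversion in that example can be done with the standard library. The sketch below assumes a fixed UTC-3 offset for BRT (Brazil has not observed daylight saving time since 2019); the helper name `brt_to_utc_argument` is hypothetical and not part of the project.

```python
from datetime import datetime, timedelta, timezone

# Assumption: BRT is treated as a fixed UTC-3 offset.
BRT = timezone(timedelta(hours=-3), name="BRT")
DATETIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def brt_to_utc_argument(local_text: str) -> str:
    """Convert a BRT timestamp string into a UTC string in the --datetime format."""
    local = datetime.strptime(local_text, DATETIME_FORMAT).replace(tzinfo=BRT)
    return local.astimezone(timezone.utc).strftime(DATETIME_FORMAT)


if __name__ == "__main__":
    # Prints 2024-01-13 17:00:00, matching the example above.
    print(brt_to_utc_argument("2024-01-13 14:00:00"))
```

Times late in the evening roll over to the next UTC date, so it is safer to convert the full timestamp than to add three hours by hand.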

To make predictions with only some of the models, edit the file `src/eval/real_time_config.json` and delete the dictionary entries associated with each model to be excluded. Note that the `EVONET` model is required for predicting with `NowcastNet`.
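
If the configuration is edited programmatically rather than by hand, the dependency between the two models can be checked at the same time. The sketch below assumes the JSON file is a single object keyed by model name, which this README does not confirm; `drop_models`, `DEPENDENCIES`, and the sample entries are made up for illustration.

```python
import json

# Dependency stated above: NowcastNet cannot predict without EVONET.
DEPENDENCIES = {"NowcastNet": {"EVONET"}}


def drop_models(config: dict, excluded: set) -> dict:
    """Return a copy of config without the excluded models, checking dependencies."""
    kept = {name: entry for name, entry in config.items() if name not in excluded}
    for model, required in DEPENDENCIES.items():
        missing = required - set(kept)
        if model in kept and missing:
            raise ValueError(f"{model} requires {sorted(missing)} to stay in the config")
    return kept


if __name__ == "__main__":
    # Made-up entries; the real file's fields are not documented here.
    config = {"EVONET": {}, "NowcastNet": {}, "OtherModel": {}}
    print(json.dumps(drop_models(config, {"OtherModel"}), indent=2))
```

Returning a filtered copy leaves the loaded configuration untouched, so a failed dependency check does not leave a half-edited dictionary behind.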

### Evaluation

Scripts for evaluating model predictions for the last three hours are also made available. Once the predictions are made through the main script, you may call `python src/eval/viz/plot_real_time.py [--num_workers]` to produce plots or `python src/eval/metrics/calc-metrics.py [--num_workers]` to calculate metrics.


## File structure

```
rio-rain
│   README.md
│   poetry.lock
│   pyproject.toml
├───data
│   ├───dataframe_grids
│   │       ...
│   ├───dataframes
│   │       ...
│   ├───processed
│   │       ...
│   └───raw
│           ...
├───eval
│       ...
├───models
│       ...
├───predictions
│       ...
└───src
        ...
```

The `data` folder holds, each in its own subfolder, the coordinate grids for points on Earth's surface, the raw data downloaded from AWS, the locally processed data, and the dataframes ready for prediction.

The `eval` folder holds the output metrics and prediction plots, each in its own subfolder.

The `models` folder holds the parameters and other information required by each model.

In the `predictions` folder, the output predictions are saved in `.hdf` files which contain predictions for each model. In these files, the data is organized as follows:

```
root
├───date1
│   ├───time1
│   │   ├───datetime1
│   │   ├───datetime2
│   │   └───...
│   └───time2
│       └───...
├───date2
│   └───...
└───...
```

In this hierarchical structure, `date1`, `date2`, etc. refer to the date of the last observation fed to the model. `time1`, `time2`, etc. refer to the time of that last observation. `datetime1`, `datetime2`, etc. represent the date and time of the prediction being made. All of these dates and times are in UTC.

The leaf nodes `datetime1`, `datetime2`, etc. hold 256x256 matrices that represent the predictions. The coordinates of these pixels are given by the `.npy` file at `data/dataframe_grids/rio_de_janeiro-res=2km-256x256.npy`.
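
The date / time / datetime nesting can be traversed with three nested loops. The sketch below works on any nested mapping and is demonstrated with plain dictionaries; whether an opened `.hdf` file exposes exactly this interface depends on the library used to read it, which this README does not state. The function `iter_predictions` and the placeholder keys are hypothetical.

```python
from typing import Any, Iterator, Mapping, Tuple


def iter_predictions(root: Mapping[str, Any]) -> Iterator[Tuple[str, str, str, Any]]:
    """Yield (date, time, datetime, matrix) for every leaf of a date/time/datetime tree."""
    for date_key, times in root.items():
        for time_key, leaves in times.items():
            for datetime_key, matrix in leaves.items():
                yield date_key, time_key, datetime_key, matrix


if __name__ == "__main__":
    # Placeholder keys and values; real leaves would be 256x256 matrices.
    demo = {"date1": {"time1": {"datetime1": [[0.0]], "datetime2": [[0.1]]}}}
    for date_key, time_key, datetime_key, matrix in iter_predictions(demo):
        print(date_key, time_key, datetime_key, len(matrix))
```

Yielding the three keys alongside each matrix keeps the last-observation date and time attached to every prediction, which is what distinguishes one forecast run from another.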

Finally, the `src` folder contains all source code: the code that downloads and processes data, the scripts that load the saved models and make predictions, and the code that evaluates the predictions.