Forecast riverine flooding. Part of IBF-system.
The pipeline roughly consists of three steps:
- Extract data on river discharge from an external provider, both in specific locations (stations) and over pre-defined areas (administrative divisions).
- Forecast floods by determining if the river discharge is higher than pre-defined thresholds; if so, calculate flood extent and impact (affected people).
- Send this data to the IBF app.
The pipeline stores data in:
- ibf-cosmos (Azure Cosmos DB): river discharge per station / administrative division, flood forecasts, and trigger thresholds
- 510ibfsystem (Azure Storage Account): raw data from GloFAS and other geospatial data
The pipeline depends on the following services:
- GloFAS: provides river discharge forecasts
- Glofas data pipeline in IBF-data-factory (Azure Data Factory): extracts GloFAS data and stores it in 510ibfsystem
- IBF-app
For more information, see the functional architecture diagram.
To run the pipeline locally:
- Fill in the secrets in `.env.example` and rename the file to `.env`; this way, they will be loaded as environment variables.
- Install the requirements:
```
pip install poetry
poetry install --no-interaction
```
- Run the pipeline with `python flood_pipeline.py`:
```
Usage: flood_pipeline.py [OPTIONS]

Options:
  --country TEXT        country ISO3
  --prepare             prepare discharge data
  --extract             extract discharge data
  --forecast            forecast floods
  --send                send to IBF
  --save                save to storage
  --datetimestart TEXT  datetime start ISO 8601
  --datetimeend TEXT    datetime end ISO 8601
  --debug               debug mode: process only one ensemble member from yesterday
  --help                show this message and exit.
```
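For example, a full end-to-end run for Kenya might look like the command below; KEN is just an illustrative ISO3, and you can drop any of the step flags you don't need:
```
python flood_pipeline.py --country KEN --prepare --extract --forecast --send --save
```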
You can run the pipeline with dummy data that simulates different scenarios using `python run_scenario.py`:
```
Usage: run_scenario.py [OPTIONS]

Options:
  -s, --events TEXT       list of events
  -c, --country TEXT      country
  -d, --upload_time TEXT  upload datetime [optional]
  --help                  show this message and exit
```
The scenario is entirely defined by the `events` parameter, which is a list of dictionaries. Each dictionary represents a flood event and has the following keys:
- `station-code`: the unique ID of the GloFAS station
- `type`: the type of the event, either `trigger`, `low-alert`, `medium-alert`, or `no-trigger`
- `lead-time`: the number of days between the forecast and the event
Example of a scenario for Kenya with two events:
```
python run_scenario.py -c "KEN" -s '[{"station-code": "G5305", "type": "trigger", "lead-time": 5}, {"station-code": "G5142", "type": "medium-alert", "lead-time": 7}]'
```
**Tip**: Scenarios can be run remotely and loaded to ibf-test through the Logic App river-flood-pipeline-scenario. Just make a POST request to the Logic App with the necessary payload[^1].
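A minimal sketch of such a request, assuming the payload simply mirrors the `run_scenario.py` options (`country` and `events`); the URL is a placeholder and the exact field names should be checked against the Logic App definition:
```python
import requests

# placeholder URL: copy the real trigger URL of river-flood-pipeline-scenario
url = "https://prod-XX.westeurope.logic.azure.com:443/workflows/..."

# assumed payload structure, mirroring the run_scenario.py CLI options
payload = {
    "country": "KEN",
    "events": [
        {"station-code": "G5305", "type": "trigger", "lead-time": 5},
        {"station-code": "G5142", "type": "medium-alert", "lead-time": 7},
    ],
}

response = requests.post(url, json=payload)
response.raise_for_status()
```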
- Identify the error in the most recent runs of river-flood-pipeline-prod:
  - Check the run time: it should be about 30-90 min; a much shorter or longer run usually indicates a problem
  - Check if any action doesn't have a green tick (failed action)
  - Check the logs in "Get logs from a container instance"
  - Find out which part of the code fails, based on the traceback and error messages in the logs
- Fix the bug on `dev` and test:
  - Checkout the `dev` branch and pull the latest changes (`git checkout dev && git pull`)
  - Fix the bug
  - [OPTIONAL] Test locally, if you have time and disk space (a few GBs are needed)
  - Commit and push the changes to the `dev` branch (`git add . && git commit -m "bug fix" && git push origin dev`)
  - Test remotely with river-flood-pipeline-dev, which will upload the output to ibf-test; just make a POST request to the Logic App with the necessary payload[^1]. Example in Python:
```python
import requests

url = "https://prod-79.westeurope.logic.azure.com:443/workflows/..."
response = requests.request("POST", url, json={"country": "KEN"})
```
- Deploy to `prod`:
  - Open a pull request (PR) from `dev` to `main` and inform the IBF team that it needs to be merged
  - Wait for the PR to be merged and the code deployed
- Re-submit the failed run of river-flood-pipeline-prod or wait for it to run again the next day.
- Check that the administrative boundaries are in the IBF system; if not, ask IBF developers to add them
- Add country-specific configuration in `config/config.yaml`
- Create historical flood extent maps: `python data_updates\add_flood_maps.py --country <country ISO3>`
- Compute trigger and alert thresholds: `python data_updates\add_flood_thresholds.py --country <country ISO3>`
- Update the `Glofas data pipeline` in IBF-data-factory so that it will trigger a pipeline run for the new country
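As a worked example, the two data-preparation commands above, run in this order for a new country (using KEN purely as a placeholder ISO3), would be:
```
python data_updates\add_flood_maps.py --country KEN
python data_updates\add_flood_thresholds.py --country KEN
```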
You don't. The pipeline is designed to work in the same way for all countries. If you need to change the pipeline's behavior for a specific country, please discuss your needs with your fellow data specialist; they will try their best to accommodate your request.
GloFAS should update river discharge data in a backward-compatible way, i.e. without changing the data model. If that is not the case, you need to have a look at `floodpipeline/extract.py` and change what's needed.
What will probably change with the new GloFAS version are the trigger/alert thresholds. To update the trigger/alert thresholds... [TBI].