This repository contains code to create stores of cloud-optimized GeoTIFFs (COGs) from input raster data. Data is ingested from various sources and stored in a private Azure Storage Container.
These seasonal forecasts contain 0.4-degree-resolution global data on precipitation rates across 0 to 6 month lead times. Historical data from as early as 1981 is accessed via ECMWF's Meteorological Archival and Retrieval System (MARS). See this User Manual for more details. Note: for more timely access than MARS provides, recent forecast data is populated from a private data order from ECMWF.
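As a rough illustration of what a MARS-style retrieval for this data might look like, the sketch below builds a request dictionary for monthly-mean precipitation-rate forecasts on the 0.4 degree grid across the 0-6 month lead times. The keyword values here (`stream`, `param`, `fcmonth`, etc.) follow general MARS request conventions and are assumptions for illustration; they are not necessarily the exact parameters this repository's pipeline submits.

```python
from datetime import date


def build_mars_request(pub_date: date, leadtime_months=range(1, 7)) -> dict:
    """Sketch of a MARS-style request for monthly-mean precipitation-rate
    forecasts. Keyword values are illustrative, not the pipeline's exact ones."""
    return {
        "class": "od",
        "stream": "msmm",  # monthly means of the seasonal forecast
        "expver": "1",
        "date": pub_date.strftime("%Y-%m-%d"),  # forecast publication date
        "levtype": "sfc",
        "param": "tprate",  # total precipitation rate
        # lead times expressed as forecast months, joined MARS-style
        "fcmonth": "/".join(str(m) for m in leadtime_months),
        "grid": "0.4/0.4",  # 0.4 degree global grid
    }


request = build_mars_request(date(2024, 1, 1))
```

A dictionary like this would typically be passed to an ECMWF API client's retrieve/execute call along with a target output path.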
The ERA5 reanalysis provides averaged monthly and hourly estimates of total precipitation across a 0.25 degree global grid. See these docs for more information on the full family of ERA5 datasets.
NASA's Integrated Multi-satellitE Retrievals for GPM (IMERG) generates estimated precipitation over the majority of Earth's surface based on information from the GPM satellite constellation. See this Technical Spec for more details.
Atmospheric and Environmental Research (AER) FloodScan's flood extent depiction products provide daily algorithmic delineation of temporarily flooded and unflooded areas from satellite remote sensing observations. See this Technical Spec for more details.
All pipelines can be run from the command line via the `run_pipeline.py` entrypoint. For detailed usage instructions and options, see our Pipeline Usage Guide.
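For a sense of what such an entrypoint looks like, here is a minimal hypothetical sketch using `argparse`. The flag names and choices below are assumptions for illustration only; the actual options are documented in the Pipeline Usage Guide.

```python
import argparse


def parse_args(argv=None):
    """Hypothetical run_pipeline.py-style argument parser (illustrative only)."""
    parser = argparse.ArgumentParser(description="Run a raster ingestion pipeline")
    parser.add_argument(
        "pipeline",
        choices=["seas5", "era5", "imerg", "floodscan"],
        help="which dataset pipeline to run",
    )
    parser.add_argument(
        "--mode",
        choices=["dev", "prod"],
        default="dev",
        help="which Azure storage environment to write to",
    )
    parser.add_argument(
        "--backfill",
        action="store_true",
        help="reprocess the full historical archive",
    )
    return parser.parse_args(argv)


args = parse_args(["era5", "--mode", "prod"])
```

Running the real script with `-h` will print its actual options.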
Pipelines are run in production as Jobs on Databricks. Please reach out if you require access.
- Clone this repository and create a virtual Python (3.12.4) environment:
git clone https://github.com/OCHA-DAP/ds-raster-pipelines.git
python3 -m venv venv
source venv/bin/activate
- Install Python dependencies:
pip install -r requirements.txt
pip install -r requirements-dev.txt
- If processing `.grib` files using `xarray`, the `cfgrib` engine also requires an ecCodes system dependency. This can be installed with
sudo apt-get install libeccodes-dev
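Because cfgrib fails at read time when ecCodes is missing, a small preflight check can surface the problem early. This is an optional convenience sketch, not part of the repository's code:

```python
import ctypes.util


def eccodes_available() -> bool:
    """Return True if the ecCodes shared library can be located on this system.

    Useful as a preflight check before opening GRIB files with xarray's
    cfgrib engine, which needs ecCodes installed at the system level.
    """
    return ctypes.util.find_library("eccodes") is not None


# Example: only select the cfgrib engine when the dependency is present
engine = "cfgrib" if eccodes_available() else None
```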
- Create a local `.env` file with the following environment variables:
# Connection to Azure blob storage
DSCI_AZ_SAS_DEV=<provided-on-request>
DSCI_AZ_SAS_PROD=<provided-on-request>
# MARS API requests
ECMWF_API_URL=<provided-on-request>
ECMWF_API_EMAIL=<provided-on-request>
ECMWF_API_KEY=<provided-on-request>
# ECMWF AWS bucket
AWS_ACCESS_KEY_ID=<provided-on-request>
AWS_SECRET_ACCESS_KEY=<provided-on-request>
AWS_BUCKET_NAME=<provided-on-request>
AWS_DEFAULT_REGION=<provided-on-request>
# CDS API credentials
CDSAPI_URL=<provided-on-request>
CDSAPI_KEY=<provided-on-request>
# IMERG Authentication
IMERG_USERNAME=<provided-on-request>
IMERG_PASSWORD=<provided-on-request>
# FloodScan access URLs
FLOODSCAN_SFED_URL=<provided-on-request>
FLOODSCAN_MFED_URL=<provided-on-request>
CONTAINER_RASTER='raster'
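Since pipelines fail in confusing ways when credentials are absent, it can help to verify the environment before running anything. The stdlib-only sketch below checks a subset of the variables from the `.env` template above; extend the list as needed for the pipelines you run.

```python
import os

# A subset of the variables from the .env template above; add the
# dataset-specific credentials required by the pipelines you run.
REQUIRED_VARS = [
    "DSCI_AZ_SAS_DEV",
    "DSCI_AZ_SAS_PROD",
    "CONTAINER_RASTER",
]


def missing_env_vars(required=REQUIRED_VARS) -> list:
    """Return the names of required environment variables that are unset or empty."""
    return [name for name in required if not os.environ.get(name)]
```

A call like `missing_env_vars()` returning a non-empty list means the `.env` file was not loaded or is incomplete.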
All code is formatted according to black and flake8 guidelines. The repo is set up to use pre-commit. Before you start developing in this repository, you will need to run
pre-commit install
You can run all hooks against all your files using
pre-commit run --all-files