This repo provides several tutorial notebooks showing how s2spy and lilio can facilitate your data-driven (sub)seasonal (S2S) forecasting workflow.
Here is an example of a basic data-driven S2S forecasting workflow for regression modelling with s2spy and lilio:
- Define calendar
- Prepare your data
  - Precursor: sea surface temperature (SST) from ERA5
  - Target: US surface temperature (T2M) from ERA5
- Map the calendar to the data
- Train-test split based on the anchor years: 70%/30% split (outer CV loop)
- Mask the data to get only full training years
- Fit (out of sample) preprocessing (incl. detrend, remove climatology, rolling mean) to the masked data
- Preprocess all data
- Resample all data to the calendar
- Train-test split based on the previous split (outer CV loop -> inner CV loop)
- Dimensionality reduction
- Fit the ML model (Ridge) on the training data and apply it to the test data
- Evaluate the results (skill metrics, visualization) and workflow (time and memory usage)
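
The core of the steps above (split by anchor year, fit the preprocessing on training years only, apply it to all data, then fit Ridge and evaluate) can be sketched as follows. This is a minimal NumPy-only illustration under stated assumptions, not the s2spy or lilio API: the data are synthetic with one sample per anchor year, the split is a simple chronological 70%/30%, Ridge is solved in closed form, and the helper names (`fit_detrend`, `apply_detrend`, `ridge_fit`) are hypothetical. Calendar resampling, dimensionality reduction and the inner CV loop are left out for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the ERA5 fields): one sample per anchor year.
years = np.arange(1980, 2020)                      # 40 anchor years
t = (years - years[0]).astype(float)               # time axis used for detrending
X = rng.normal(size=(years.size, 5)) + 0.02 * t[:, None]   # precursors with a trend
true_coef = np.array([1.0, -0.5, 0.0, 0.3, 0.0])   # assumed for illustration only
y = X @ true_coef + rng.normal(scale=0.1, size=years.size)

# Outer split on anchor years: first 70% for training, last 30% for testing.
n_train = int(round(0.7 * years.size))
train, test = slice(0, n_train), slice(n_train, None)


def fit_detrend(t_train, data_train):
    """Fit a per-column linear trend using training samples only."""
    slope, intercept = np.polyfit(t_train, data_train, deg=1)
    return slope, intercept


def apply_detrend(t_all, data, slope, intercept):
    """Remove the fitted trend (and mean) from any samples, train or test."""
    return data - (np.outer(t_all, slope) + intercept)


def ridge_fit(X_train, y_train, alpha=1.0):
    """Closed-form Ridge without intercept (inputs are centred on the training set)."""
    n_features = X_train.shape[1]
    gram = X_train.T @ X_train + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X_train.T @ y_train)


# Fit the preprocessing out of sample: parameters come from training years only.
X_params = fit_detrend(t[train], X[train])
y_params = fit_detrend(t[train], y[train, None])

# Preprocess all data with the training-derived parameters.
X_p = apply_detrend(t, X, *X_params)
y_p = apply_detrend(t, y[:, None], *y_params)[:, 0]

# Fit on the training years, predict the held-out years, and compute skill.
coef = ridge_fit(X_p[train], y_p[train], alpha=1.0)
y_pred = X_p[test] @ coef
corr = np.corrcoef(y_p[test], y_pred)[0, 1]
rmse = np.sqrt(np.mean((y_p[test] - y_pred) ** 2))
print(f"train years: {n_train}, test years: {years.size - n_train}")
print(f"test correlation: {corr:.2f}, test RMSE: {rmse:.2f}")
```

Fitting the trend on the training years only keeps information from the test years out of the preprocessing, which is the point of the masking step in the list above.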
This workflow is illustrated below:
Similarly, you can adapt this recipe to your deep learning workflow with a few changes. You can find several examples in the next section.
The tutorial notebooks include a case study in which we attempt to predict surface temperature over the US using SST over the Pacific. We use processed ERA5 fields to perform data-driven forecasts. More details about the data can be found in this README.md.
Before playing with these notebooks, please make sure that you have all the required packages installed. You can install the dependencies by going to the root of this repo and running the following command:
pip install .
Below are recipes with different machine learning techniques:
Predict surface temperature over the US with SST over the Pacific using s2spy and lilio: