scores_metrics

Jakob edited this page Sep 4, 2020 · 25 revisions

Machine Learning in Climate Sciences University of Tübingen

Jakob Schloer

Time-series data

Keyword sample

  • Quantile score

  • Cross validation error

    • Leave one out error
  • Correlation coefficients

  • Wasserstein metric

Comparing Time series:

  • Correlations

  • Euclidean Distance

  • Dynamic Time Warping: DTW finds an optimal match between two sequences of feature vectors which allows for stretched and compressed sections of the sequence.

  • Mutual Information: Entropy-based metric introduced by Shannon. There are quite a few papers by now applying it to time series, for example [this one](https://arxiv.org/abs/0904.4753).

  • iSAX: The final one I want to flag is the so-called “Motif Discovery” and the related [iSAX](http://www.cs.ucr.edu/~eamonn/iSAX.pdf) representation of time series (by Eamonn Keogh), which is very scalable.
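DTW from the list above can be sketched as a small dynamic program; a cell (i, j) holds the cost of the best alignment of the first i and j elements. This is a minimal sketch assuming NumPy; the function name `dtw_distance` is illustrative, not from a library.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic Time Warping distance between two 1-D sequences (illustrative sketch)."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # A step may advance either sequence or both, which is what
            # allows stretched and compressed sections to be matched.
            cost[i, j] = d + min(cost[i - 1, j],      # stretch a
                                 cost[i, j - 1],      # stretch b
                                 cost[i - 1, j - 1])  # match both
    return cost[n, m]

# The same shape at half the speed still aligns perfectly:
print(dtw_distance([0, 1, 2, 1, 0], [0, 0, 1, 1, 2, 2, 1, 1, 0, 0]))  # -> 0.0
```

The quadratic table is the textbook formulation; scalable variants (e.g. with warping-window constraints) restrict which cells are filled.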

Scale-dependent errors

The errors are on the same scale as the data, i.e. errors from different data sets cannot be compared directly.

Mean absolute error (MAE)

[\begin{aligned} MAE = mean(|y_i - \hat{y}_i|) = \frac{1}{N} \sum_{i=1}^{N} |y_i - \hat{y}_i| \end{aligned}]

Minimizing the MAE leads to predicting the median.

Mean square error (MSE)

[\begin{aligned} MSE = mean(|y_i - \hat{y}_i|^2) = \frac{1}{N} \sum_{i=1}^{N} |y_i - \hat{y}_i|^2 \end{aligned}]

Strongly penalizes large wrong predictions

Root mean square error (RMSE)

[\begin{aligned} RMSE = \sqrt{mean(|y_i - \hat{y}_i|^2)} \end{aligned}]

Minimizing the RMSE leads to predicting the mean.
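The three scale-dependent errors can be sketched with NumPy; the toy data below is illustrative and chosen so that one large miss shows how strongly MSE penalizes it compared to MAE.

```python
import numpy as np

def mae(y, y_hat):
    # Mean absolute error: average magnitude of the residuals.
    return np.mean(np.abs(y - y_hat))

def mse(y, y_hat):
    # Mean square error: squaring makes large residuals dominate.
    return np.mean((y - y_hat) ** 2)

def rmse(y, y_hat):
    # Root mean square error: back on the scale of the data.
    return np.sqrt(mse(y, y_hat))

y     = np.array([1.0, 2.0, 3.0, 10.0])
y_hat = np.array([1.0, 2.0, 3.0, 4.0])  # one large miss of 6

print(mae(y, y_hat))   # -> 1.5
print(mse(y, y_hat))   # -> 9.0  (the single outlier dominates)
print(rmse(y, y_hat))  # -> 3.0
```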

Percentage errors

Percentage errors are unit free and therefore allow comparison of different data sets.

Mean absolute percentage error (MAPE)

[\begin{aligned} MAPE = mean(\frac{|y_i - \hat{y}_i|}{y_i}) \end{aligned}]

Problems:

  • cannot be used if there are zero values

  • puts more weight on negative errors

Symmetric mean absolute percentage error (sMAPE) Used to overcome the problems of MAPE

[\begin{aligned} sMAPE = mean(\frac{|y_i - \hat{y}_i|}{(|y_i| + |\hat{y}_i|)/2}) \end{aligned}]
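A short NumPy sketch makes the asymmetry of MAPE concrete: the same absolute miss of 50 is penalized differently depending on its sign, while sMAPE scores both cases identically. The numbers are illustrative.

```python
import numpy as np

def mape(y, y_hat):
    # Mean absolute percentage error; undefined whenever some y_i == 0.
    return np.mean(np.abs(y - y_hat) / y)

def smape(y, y_hat):
    # Symmetric variant: scale by the mean magnitude of truth and forecast.
    return np.mean(np.abs(y - y_hat) / ((np.abs(y) + np.abs(y_hat)) / 2))

# Same absolute error of 50, opposite sign:
print(mape(np.array([100.0]), np.array([150.0])))   # -> 0.5
print(mape(np.array([150.0]), np.array([100.0])))   # -> 0.333...
print(smape(np.array([100.0]), np.array([150.0])))  # -> 0.4
print(smape(np.array([150.0]), np.array([100.0])))  # -> 0.4
```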

Scaled error

Alternative to percentage errors when comparing different datasets

Mean absolute scaled error (MASE)

[\begin{aligned} MASE = \frac{mean(|y_t - \hat{y}_t|)}{\frac{1}{N-1} \sum_{t=2}^{N} |y_t - y_{t-1}|} \end{aligned}]

  • scale invariance

  • symmetric

  • less than one if the forecast is better than the average one-step naïve forecast, and greater than one if it is worse
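The scaling can be sketched directly from the formula (NumPy; the series is illustrative): the denominator is the in-sample MAE of the naïve forecast (\hat{y}_t = y_{t-1}).

```python
import numpy as np

def mase(y, y_hat):
    # Scale by the in-sample MAE of the one-step naive forecast.
    naive_mae = np.mean(np.abs(np.diff(y)))
    return np.mean(np.abs(y - y_hat)) / naive_mae

y = np.array([1.0, 3.0, 2.0, 4.0, 3.0])  # naive MAE = 1.5

print(mase(y, y))        # -> 0.0  (perfect forecast, < 1)
print(mase(y, y + 3.0))  # -> 2.0  (worse than the naive forecast, > 1)
```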

https://otexts.com/fpp2/accuracy.html https://scikit-learn.org/stable/modules/model_evaluation.html#

Goodness of Fit

Check if a hypothesis is correct. Often referred to as explained variance scores.

Coefficient of determination ((R^2)) Describes the fraction of the variance of (y) that is explained by the model prediction.

[\begin{aligned} R^2 = 1 - \frac{\sum_{i=1}^{N}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{N}(y_i - \bar{y})^2} \end{aligned}]
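The two reference points of (R^2) can be sketched in a few lines of NumPy: a perfect fit gives 1, and predicting the mean of (y) everywhere gives exactly 0.

```python
import numpy as np

def r2_score(y, y_hat):
    # 1 minus the ratio of residual variance to total variance of y.
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

y = np.array([1.0, 2.0, 3.0, 4.0])

print(r2_score(y, y))                  # -> 1.0  (perfect fit)
print(r2_score(y, np.full(4, 2.5)))    # -> 0.0  (no better than the mean)
```

Note that (R^2) can be negative for a model that fits worse than the constant mean predictor.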

F-test The F-value expresses how much of the model has improved compared to the mean (null hypothesis) given the variance of the model and data. The F-test is obtained by

[\begin{aligned} F = \frac{ N (\bar{\hat{y}} - \bar{y})^2 }{\sum_{i=1}^{N} (\hat{y}_{i} - \bar{\hat{y}})^2 / (N-1)} \end{aligned}]

where (Y = \left[ y_{i} \right]) is the set of data points and (\hat{Y} = \left[ \hat{y}_{i} \right]) with (i = 1, .., N) is the set of predicted points.
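The statistic can be transcribed directly (a sketch of the formula exactly as written above, not a general F-test implementation): if the model mean equals the data mean, the numerator and hence F is zero.

```python
import numpy as np

def f_value(y, y_hat):
    # Direct transcription of the F-value formula above.
    n = len(y)
    numerator = n * (np.mean(y_hat) - np.mean(y)) ** 2
    denominator = np.sum((y_hat - np.mean(y_hat)) ** 2) / (n - 1)
    return numerator / denominator

y = np.array([1.0, 2.0, 3.0, 4.0])
print(f_value(y, y[::-1]))  # -> 0.0 (same mean as the data, no improvement)
```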

Chi-square test

Cross-validation

Leave-one-out error (LOOE)

Correlation/ Synchrony

Pearson correlation Pearson correlation measures how two continuous signals co-vary over time. The linear relationship between the signals is given on a scale from -1 (anticorrelated) through 0 (uncorrelated) to 1 (perfectly correlated).

The Pearson correlation coefficient for two random variables (X) and (Y) is:

[\begin{aligned} \rho_{X, Y} = \frac{cov(X, Y)}{\sigma_X \sigma_Y} = \frac{\mathbb{E}\left[ (X - \mu_X)(Y - \mu_Y)\right]}{\sigma_X \sigma_Y} \end{aligned}]

For time series one can calculate a

  • global correlation coefficient: a single value

  • local correlation coefficient: determine correlation in a rolling window over time
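Both variants can be sketched with pandas (the signals, noise level, and 30-step window are illustrative assumptions):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
t = np.arange(200)
# Two noisy observations of the same underlying oscillation:
s1 = pd.Series(np.sin(t / 10) + 0.2 * rng.standard_normal(200))
s2 = pd.Series(np.sin(t / 10) + 0.2 * rng.standard_normal(200))

# Global correlation coefficient: a single value for the whole series.
global_r = s1.corr(s2)

# Local correlation coefficients: Pearson r in a rolling 30-step window.
local_r = s1.rolling(window=30).corr(s2)

print(global_r)
```

The rolling result is NaN for the first `window - 1` steps, where the window is not yet full.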

Caution:

  1. outliers can skew the correlation

  2. assuming the data is homoscedastic, i.e. has constant variance

Time Lagged Cross Correlation (TLCC) TLCC is a measure of similarity of two series as a function of displacement. It captures directionality between two signals, i.e. leader-follower relationship. Idea: Similar to convolution of two signals, i.e. shifting one signal with respect to the other while repeatedly calculating the correlation.

[\begin{aligned} (f \star g)(\tau) \triangleq \int_{-\infty}^{\infty} f^*(t) \, g(t+\tau) \, dt \end{aligned}]

(Figures: cross correlation of f and g; windowed time lagged cross correlation.)

Windowed time lagged cross correlation (WTLCC) is an extension of TLCC where local correlation coefficients are computed for each lag, which are then plotted as a matrix.
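TLCC can be sketched by shifting one series and recomputing the Pearson correlation at each lag; the lag with the peak correlation reveals the leader-follower relationship. A sketch with pandas (the sign convention follows `Series.shift`, and the signals are illustrative):

```python
import numpy as np
import pandas as pd

def tlcc(s1, s2, lag):
    # Correlation of s1 with s2 displaced by `lag` samples.
    return s1.corr(s2.shift(lag))

leader = pd.Series(np.sin(np.arange(300) / 15))
follower = leader.shift(10)  # follower trails the leader by 10 steps

lags = list(range(-20, 21))
r = [tlcc(leader, follower, lag) for lag in lags]
best_lag = lags[int(np.argmax(r))]
print(best_lag)  # -> -10: shifting the follower back by 10 steps realigns it
```

`Series.corr` drops the NaN pairs created by shifting, so the correlation at each lag is computed over the overlapping samples only.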

Granger causality

Dynamic Time Warping (DTW)

Spatial Data

  • saliency maps: highlights which changes in the input would most affect the output.

  • heat maps: highlights which inputs are most important for the prediction

Saliency maps
