Prediction scores and metrics

Time-series data

Scale-dependent errors

These errors are on the same scale as the data, so they cannot be used to compare data sets with different scales or units.

Mean absolute error (MAE)

$MAE = mean(|y_i - \hat{y}_i|) = \frac{1}{N} \sum_{i=1}^{N} |y_i - \hat{y}_i|$

Minimizing MAE leads to prediction of median.

Mean square error (MSE)

$MSE = mean(|y_i - \hat{y}_i|^2)$

Strongly penalizes large wrong predictions

Root mean square error (RMSE)

$RMSE = \sqrt{mean(|y_i - \hat{y}_i|^2)}$

Minimizing RMSE leads to prediction of mean
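The three scale-dependent errors above can be sketched with NumPy as follows. The function names and the sample data are illustrative assumptions, not part of the original notes. The last part checks numerically, over a grid of constant predictions, that the median minimizes MAE and the mean minimizes RMSE.

```python
import numpy as np

def mae(y, y_hat):
    """Mean absolute error."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return np.mean(np.abs(y - y_hat))

def mse(y, y_hat):
    """Mean square error."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return np.mean((y - y_hat) ** 2)

def rmse(y, y_hat):
    """Root mean square error."""
    return np.sqrt(mse(y, y_hat))

# Made-up example data (median 3, mean 4).
y = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
y_hat = np.array([1.5, 2.0, 2.0, 4.0, 7.0])

print(mae(y, y_hat))   # 0.9
print(mse(y, y_hat))   # 2.05
print(rmse(y, y_hat))  # about 1.432

# Try constant predictions c on a grid and keep the best one per metric.
candidates = np.linspace(0.0, 10.0, 41)  # step 0.25
best_mae_const = candidates[np.argmin([mae(y, np.full_like(y, c)) for c in candidates])]
best_rmse_const = candidates[np.argmin([rmse(y, np.full_like(y, c)) for c in candidates])]

print(best_mae_const, np.median(y))  # both 3.0
print(best_rmse_const, np.mean(y))   # both 4.0
```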

Percentage errors

Percentage errors are unit free and therefore allow comparison of different data sets.

Mean absolute percentage error (MAPE)

$MAPE = mean(\frac{|y_i - \hat{y}_i|}{|y_i|})$

Problems:

undefined if any observed value $y_i$ is zero
puts more weight on negative errors ($y_i < \hat{y}_i$) than on positive errors

Symmetric mean absolute percentage error (sMAPE)

Used to overcome the problems of MAPE.

$sMAPE = mean(\frac{|y_i - \hat{y}_i|}{(|y_i| + |\hat{y}_i|)/2})$
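A minimal sketch of both percentage errors, assuming NumPy and illustrative function names. The example shows the asymmetry of MAPE: the same absolute error of 50 is penalized more when the forecast is too high than when it is too low, while sMAPE treats both cases equally.

```python
import numpy as np

def mape(y, y_hat):
    """Mean absolute percentage error (as a fraction, not multiplied by 100)."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    if np.any(y == 0):
        raise ValueError("MAPE is undefined when an observed value is zero")
    return np.mean(np.abs(y - y_hat) / np.abs(y))

def smape(y, y_hat):
    """Symmetric MAPE; still undefined where y and y_hat are both zero."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    denom = (np.abs(y) + np.abs(y_hat)) / 2
    return np.mean(np.abs(y - y_hat) / denom)

# Same absolute error of 50 in both directions.
print(mape([100.0], [150.0]))   # 0.5    (forecast too high)
print(mape([150.0], [100.0]))   # about 0.333 (forecast too low)
print(smape([100.0], [150.0]))  # 0.4
print(smape([150.0], [100.0]))  # 0.4
```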

Scaled error

Alternative to percentage errors when comparing different datasets

Mean absolute scaled error (MASE)

$MASE = \frac{mean(|y_i - \hat{y}_i|)}{\frac{1}{N-1} \sum_{t=2}^{N} |y_t - y_{t-1}|}$

scale invariance
symmetric
less than one if the forecast is better than the average naïve forecast, and greater than one if it is worse
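The MASE formula above can be sketched as follows. This follows the formula exactly as written here, scaling by the mean absolute one-step difference of the same series `y`. The function name and data are illustrative assumptions.

```python
import numpy as np

def mase(y, y_hat):
    """Mean absolute scaled error, scaled by the one-step naive forecast error."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    # 1/(N-1) * sum_{t=2}^{N} |y_t - y_{t-1}|
    naive_mae = np.mean(np.abs(np.diff(y)))
    if naive_mae == 0:
        raise ValueError("MASE is undefined for a constant series")
    return np.mean(np.abs(y - y_hat)) / naive_mae

# Made-up example data.
y = np.array([1.0, 3.0, 2.0, 5.0, 4.0])
y_hat = np.array([1.5, 2.5, 2.5, 4.5, 4.5])

score = mase(y, y_hat)
print(score)  # about 0.286, i.e. better than the naive forecast

# Scale invariance: rescaling data and forecast leaves MASE unchanged.
print(np.isclose(mase(10 * y, 10 * y_hat), score))  # True
```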

Literature:

https://otexts.com/fpp2/accuracy.html

https://scikit-learn.org/stable/modules/model_evaluation.html#

Goodness of Fit

Checks how well a hypothesis (the model) agrees with the observed data. These measures are often referred to as explained variance scores.

Coefficient of determination (R^2)

Describes the proportion of the variance of y that is explained by the model prediction.

$R^2 = 1 - \frac{\sum_{i=1}^{N}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{N}(y_i - \bar{y})^2}$
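A short sketch of the formula above, assuming NumPy. The function name `r_squared` and the data are illustrative. Predicting the mean of y for every point gives $R^2 = 0$, and a perfect prediction gives $R^2 = 1$.

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)     # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

# Made-up example data.
y = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
y_hat = np.array([1.5, 2.0, 2.0, 4.0, 7.0])

print(r_squared(y, y_hat))                        # about 0.795
print(r_squared(y, np.full_like(y, y.mean())))    # 0.0
print(r_squared(y, y))                            # 1.0
```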

F-test

The F-value expresses how much the model improves on the mean (the null hypothesis), given the variance of the model and the data. The F-statistic is obtained by

$F = \frac{ N (\bar{\hat{y}} - \bar{y})^2 }{\sum_{i=1}^{N} (\hat{y}_{i} - \bar{\hat{y}})^2 / (N-1)}$

where $Y = \left[ y_{i} \right]$ is the set of data points and $\hat{Y} = \left[ \hat{y}_{i} \right]$, with $i = 1, \dots, N$, is the set of predicted points; $\bar{y}$ and $\bar{\hat{y}}$ are their respective means.

Chi-square test

Correlation / Synchrony

Pearson correlation

Pearson correlation measures how two continuous signals co-vary over time. The strength of the linear relationship between the signals ranges from -1 (perfectly anticorrelated) through 0 (uncorrelated) to 1 (perfectly correlated).

The Pearson correlation coefficient for two random variables X and Y is:

$\rho_{X, Y} = \frac{cov(X, Y)}{\sigma_X \sigma_Y} = \frac{\mathbb{E}\left[ (X - \mu_X)(Y - \mu_Y)\right]}{\sigma_X \sigma_Y}$

For time series one can calculate a

global correlation coefficient: a single value
local correlation coefficient: determine correlation in a rolling window over time
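The difference between the global and the local coefficient can be sketched as follows, assuming NumPy. The function names, the window length and the synthetic signals are illustrative assumptions. The second signal follows the first for half of the time and mirrors it for the other half, so the single global value is close to zero while the rolling values are close to +1 and -1.

```python
import numpy as np

def pearson(x, y):
    """Sample Pearson correlation coefficient of two equally long series."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return np.sum(xc * yc) / np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2))

def rolling_pearson(x, y, window):
    """Pearson correlation in each window of the given length (step 1)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n_windows = len(x) - window + 1
    return np.array([pearson(x[i:i + window], y[i:i + window])
                     for i in range(n_windows)])

# Synthetic signals: y equals x in the first half and -x in the second half.
t = np.linspace(0.0, 4.0 * np.pi, 200)
x = np.sin(t)
y = np.where(np.arange(200) < 100, x, -x)

global_r = pearson(x, y)               # one value for the whole series
local_r = rolling_pearson(x, y, 20)    # one value per window

print(global_r)                  # close to 0
print(local_r[0], local_r[-1])   # close to +1 and -1
```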

Caution:

outliers can skew the correlation
Pearson correlation assumes the data are homoscedastic, i.e. have constant variance

Time Lagged Cross Correlation (TLCC)

TLCC is a measure of similarity of two series as a function of displacement. It captures directionality between two signals, i.e. leader-follower relationship.

Idea: Similar to convolution of two signals, i.e. shifting one signal with respect to the other while repeatedly calculating the correlation.

$(f \star g)(\tau)\ \triangleq \int_{-\infty}^{\infty} f^*(t) g(t + \tau)\,dt$

Windowed time lagged cross correlation (WTLCC) is an extension of TLCC in which local correlation coefficients are computed for each lag time and the results are plotted as a matrix.
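The shifting idea behind TLCC can be sketched for discrete series as follows, assuming NumPy. The function names and the sign convention are assumptions made for this example: the value at lag k correlates x[t] with y[t + k], so a peak at a positive lag means that y follows x.

```python
import numpy as np

def lagged_corr(x, y, lag):
    """Pearson correlation between x[t] and y[t + lag] on the overlapping part."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    if lag > 0:
        a, b = x[:-lag], y[lag:]
    elif lag < 0:
        a, b = x[-lag:], y[:lag]
    else:
        a, b = x, y
    return np.corrcoef(a, b)[0, 1]

def tlcc(x, y, max_lag):
    """Correlation for every integer lag in [-max_lag, max_lag]."""
    lags = np.arange(-max_lag, max_lag + 1)
    corrs = np.array([lagged_corr(x, y, int(k)) for k in lags])
    return lags, corrs

# Synthetic example: y is x delayed by 5 samples (x leads, y follows).
rng = np.random.default_rng(0)
x = rng.standard_normal(300)
y = np.roll(x, 5)

lags, corrs = tlcc(x, y, max_lag=20)
peak_lag = lags[np.argmax(corrs)]
print(peak_lag)  # 5
```

For WTLCC, the same `tlcc` call would be repeated on successive windows of the two series, and the resulting rows stacked into a matrix of windows by lags.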

Granger causality

Dynamic Time Warping (DTW)

Cross-validation

Leave-one-out error (LOOE)

Spatial Data

saliency maps: highlight which changes in the input would most affect the output
heat maps: highlight which inputs are most important for the prediction
