In this repository you'll learn how to analyse a time series and forecast its values using ARIMA, LSTM and multivariate time series.

SerenaTetart/Time_series_analysis

Time_series_analysis

General info

In this repository you'll learn how to analyse a time series and forecast its values.

Here we will try to forecast sales and cryptocurrencies using ARIMA, LSTM and multivariate time series.

Because cryptocurrencies are highly volatile and don't follow a specific pattern, the results for the last two projects won't be good.

Introduction - Quick overview of time series

For this introduction we'll use Bitcoin's daily closing market price since 2014 as the dataset.

A time series is composed of 3 components:

  • (y: the observed value of the series)
  • Trend: Long-term increasing or decreasing behavior of the series over time
  • Seasonality: Repeating patterns or cycles of behavior over time
  • Noise: Variability in the observations that cannot be explained by the model

Combining these 3 components gives back the time series. The combination is additive if:

y = trend + seasonality + noise

or multiplicative if:

y = trend * seasonality * noise

In our case the time series is multiplicative.

Bitcoin's Price since 2014

Bitcoin's Trend since 2014

We can clearly see that the trend is constantly rising, with a peak toward 2021.

The trend is computed as a moving average of the series over a sliding window of length L.
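
As a sketch, assuming a centered window of odd length $L = 2k + 1$ (the windowing convention is an assumption here; a trailing window is also common), the moving average at time $t$ can be written as:

$$
\text{trend}_t = \frac{1}{L} \sum_{i=-k}^{k} y_{t+i}
$$

With daily data, $L = 365$ averages each point with roughly the half year before and after it, which smooths out both the yearly cycle and the day-to-day noise.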

Bitcoin's Seasonality since 2014

The seasonality shows the price tending to rise toward January and June and to fall in between.

To get the seasonality and the noise, we start from the multiplicative formula above and divide by the trend:

y / trend = seasonality * noise

Then, to isolate the seasonality, we average this ratio over one year, L = 365 (it could also be one month):

seasonality = average of (y / trend), L = 365

Finally, we get the noise by dividing the ratio by the seasonal component:

noise = (y / trend) / seasonality
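
The three steps above can be sketched in plain Python. This is a minimal illustration, not the code used to produce the plots: the function names are hypothetical, the trend uses a centered window, and the seasonality is taken as the mean of the ratio at each position in the cycle (one reading of "averaging over one year", which is an assumption). The demo uses a synthetic weekly pattern instead of the Bitcoin data.

```python
from statistics import mean


def centered_moving_average(values, window):
    """Centered moving average; positions without a full window are None."""
    half = window // 2
    out = [None] * len(values)
    for t in range(half, len(values) - half):
        out[t] = mean(values[t - half : t + half + 1])
    return out


def multiplicative_decompose(values, period):
    """Split values so that y = trend * seasonality * noise.

    Needs at least two full periods of data. For an even period this
    sketch simply widens the trend window by one (an assumption).
    """
    window = period if period % 2 == 1 else period + 1
    trend = centered_moving_average(values, window)

    # Step 1: remove the trend, leaving seasonality * noise.
    ratio = [y / tr if tr is not None else None for y, tr in zip(values, trend)]

    # Step 2: average the ratio at each position in the cycle.
    by_position = [[] for _ in range(period)]
    for t, r in enumerate(ratio):
        if r is not None:
            by_position[t % period].append(r)
    seasonal_index = [mean(rs) for rs in by_position]
    seasonality = [seasonal_index[t % period] for t in range(len(values))]

    # Step 3: what is left after removing the seasonality is the noise.
    noise = [r / s if r is not None else None for r, s in zip(ratio, seasonality)]
    return trend, seasonality, noise


# Synthetic series: linear trend times a repeating 7-step pattern.
pattern = [0.9, 1.0, 1.1, 1.0, 0.9, 1.0, 1.1]
series = [(100 + 2 * t) * pattern[t % 7] for t in range(70)]
trend, seasonality, noise = multiplicative_decompose(series, period=7)
```

Wherever the trend is defined, multiplying the three components back together recovers the original series.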

Bitcoin's Noise since 2014

As we can see, the data is very noisy: this time series depends on many factors.

Project 1 - Sales forecasting with ARIMA

In this first project we will forecast the sales and the number of orders of a retail store with ARIMA, which stands for 'AutoRegressive Integrated Moving Average'.

The dataset is from Kaggle: Superstore Sales Dataset

We can notice that the number of orders and the sales income are not correlated.

To use ARIMA we need to make the time series stationary, meaning its statistical properties (such as its mean and variance) no longer depend on time. This is done by differencing it.
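
Differencing replaces each value with its change from the previous one. A minimal sketch (the helper name `difference` and the toy numbers are illustrative only):

```python
def difference(values, lag=1):
    """Return the series of changes values[t] - values[t - lag]."""
    return [values[t] - values[t - lag] for t in range(lag, len(values))]


series = [3, 5, 7, 9, 11]            # steadily rising, so not stationary
diffed = difference(series)          # [2, 2, 2, 2]: the trend is gone
diffed_twice = difference(diffed)    # [0, 0, 0]
```

Each pass shortens the series by one value. The number of passes needed before the series looks stationary is the d of the ARIMA model.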

An ARIMA model is defined by 3 terms: p, d, q where:

  • p is the order of the Auto Regressive term
  • q is the order of the Moving Average term
  • d is the number of differencing steps required to make the time series stationary
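
In standard notation (the symbols $c$, $\phi_i$, $\theta_j$ and $\varepsilon_t$ are introduced here for illustration), writing $y'_t$ for the series after differencing it $d$ times, an ARIMA(p, d, q) model reads:

$$
y'_t = c + \sum_{i=1}^{p} \phi_i \, y'_{t-i} + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j} + \varepsilon_t
$$

The first sum is the autoregressive part (past values), the second is the moving average part (past forecast errors), and $\varepsilon_t$ is the error at time $t$.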

We won't go into full details, but the core ideas on how to find these parameters will be shown.

Here the autocorrelation plot and the Dickey–Fuller test show that both series are already stationary: the test gives p < 0.05 for each of them.

So d = 0, and q = 1 because lag 1 is well above the significance line.
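
To make "above the significance line" concrete, here is a small sketch of the sample autocorrelation and the usual approximate 95% bound of 1.96 / sqrt(n). The function names and the order counts are made up for illustration; this is not the plotting code used above.

```python
import math


def autocorrelation(values, lag):
    """Sample autocorrelation of the series at the given lag."""
    n = len(values)
    mu = sum(values) / n
    denom = sum((v - mu) ** 2 for v in values)
    num = sum((values[t] - mu) * (values[t - lag] - mu) for t in range(lag, n))
    return num / denom


def significant_lags(values, max_lag):
    """Lags whose autocorrelation lies outside the approximate 95% band."""
    bound = 1.96 / math.sqrt(len(values))
    return [k for k in range(1, max_lag + 1)
            if abs(autocorrelation(values, k)) > bound]


# Made-up order counts, only to show the call.
orders = [12, 15, 11, 14, 13, 16, 12, 15, 11, 14, 13, 17]
print(significant_lags(orders, max_lag=5))
```

The lags returned here are the ones that would stick out of the shaded band on an autocorrelation plot.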

We then plot the partial autocorrelation and read off p = 1.

To finish, we build the model with a library called "pmdarima", which tries different combinations of these parameters to find the best model.

Finally we plot the forecasts:

Yes... It's a straight line because there is no "trend" or "seasonality" in our data!

ARIMA is useful only when the time series has a "trend" or a "seasonality". Here it only predicts the mean of the time series, which results in a straight line.

Furthermore, ARIMA is meant for short-term forecasts: long-term forecasts also end up as a straight line.
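
The straight line can be seen directly in the simplest case. For a stationary AR(1) model with mean `mean` and coefficient `phi` (both values below are hypothetical), the forecast h steps ahead is `mean + phi**h * (last_value - mean)`, so it decays toward the mean as h grows:

```python
def ar1_forecast(last_value, mean, phi, steps):
    """Forecasts of a stationary AR(1) model for 1..steps steps ahead."""
    return [mean + phi ** h * (last_value - mean) for h in range(1, steps + 1)]


forecasts = ar1_forecast(last_value=150.0, mean=100.0, phi=0.5, steps=10)
print(forecasts[0])    # 125.0
print(forecasts[-1])   # 100.048828125, already almost the mean
```

A moving average term of order 1 only affects the first forecast step, so an ARIMA(1, 0, 1) model flattens out in the same way.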

Project 2 - Bitcoin forecasting with LSTM

For this project we'll use Bitcoin's daily closing market price since 2014 as the dataset.

The network is made of two layers of bidirectional LSTM units, followed by a dense layer of 20 units that predicts the next 20 values of the time series.
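
To train a network that outputs the next 20 values, the series first has to be cut into (input window, next 20 values) pairs. A minimal sketch, where the helper name `make_windows` and the input length of 60 are assumptions:

```python
def make_windows(values, input_len, horizon=20):
    """Slice a series into (input window, following `horizon` values) pairs."""
    inputs, targets = [], []
    for start in range(len(values) - input_len - horizon + 1):
        split = start + input_len
        inputs.append(values[start:split])
        targets.append(values[split : split + horizon])
    return inputs, targets


prices = list(range(100))    # toy stand-in for the closing prices
X, y = make_windows(prices, input_len=60, horizon=20)
print(len(X))                # 21 training pairs
```

Each target starts right where its input window ends, so the network never sees the values it is asked to predict.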

LSTMs are good at learning long-term dependencies in sequences of data. When made bidirectional they also train on a reversed copy of the input sequence, which can give the network additional context and lead to faster and more complete learning.
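
A sketch of such an architecture in Keras. The choice of framework, the 64 units per layer, the input length of 60, the optimizer and the loss are all assumptions made for illustration; only the two bidirectional LSTM layers and the 20-unit dense output come from the description above.

```python
from tensorflow import keras
from tensorflow.keras import layers

INPUT_LEN = 60   # assumed number of past days fed to the network
HORIZON = 20     # number of future values to predict

model = keras.Sequential([
    layers.Input(shape=(INPUT_LEN, 1)),
    # First layer returns the full sequence so the second LSTM can read it.
    layers.Bidirectional(layers.LSTM(64, return_sequences=True)),
    layers.Bidirectional(layers.LSTM(64)),
    # One output per forecast step.
    layers.Dense(HORIZON),
])
model.compile(optimizer="adam", loss="mse")
model.summary()
```

The `Bidirectional` wrapper runs one copy of the LSTM forward and one backward over the input window and concatenates their outputs.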

Sadly the forecasts are quite imprecise: there is a lot of noise, and the price of stocks or cryptocurrencies depends on many factors that a time series alone cannot represent.

Still, the model did manage to learn a rising trend, so it is not totally wrong.

Project 3 - BTC and ETH forecasting with multivariate time series

This third project aims to forecast the price of cryptocurrencies using all the features in the dataset, and combines two cryptocurrencies to see whether the results improve.

The LSTM overfits so quickly that we need to limit training to 12 epochs (the batch size also strongly influences the number of epochs required). Furthermore, if the number of features is too large we need to switch the loss to MSLE (mean squared logarithmic error), so that errors on large values count about as much as errors on small ones.
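
A small sketch of why MSLE has that effect, using the usual definition (the mean of squared differences between log(1 + y) values). The function name and the numbers are illustrative: a 10% under-prediction costs about the same at a price of 100 as at a price of 40000, whereas the squared error would differ by several orders of magnitude.

```python
import math


def msle(y_true, y_pred):
    """Mean squared logarithmic error, computed on log(1 + y)."""
    squared = [(math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(squared) / len(squared)


small_scale = msle([100.0], [90.0])        # 10% under-prediction
large_scale = msle([40000.0], [36000.0])   # 10% under-prediction
print(small_scale, large_scale)            # both close to 0.011
```

Because the error is taken on logarithms, the loss reacts to relative differences rather than absolute ones.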

But no, the results are still poor.
