Modeling Stock Market Returns using Statistical, Classical Machine Learning and Deep Learning Methods
CNNPred Dataset: This dataset contains several daily features of the S&P 500, NASDAQ Composite, Dow Jones Industrial Average, Russell 2000, and NYSE Composite indices from 2010 to 2017.
-
Widely used for time series prediction, ARMA requires the series to be stationary, which means a constant mean, a constant variance and no seasonality. Since stock market prices do not have a constant mean, a transformed feature $y_i = \mathrm{price}_i - \mathrm{price}_{i-d}$ is used instead, where $d$ is the lag. The Auto-Regressive (AR) and Moving Average (MA) components are parameterized by their orders p and q, respectively; the partial auto-correlation function guides the choice of p and the auto-correlation function guides the choice of q. A search over the orders, up to given maximum values of p and q, is performed to find the best model.
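The differencing transform described above can be sketched in a few lines. This is a minimal illustration, not the project's code: the function names `difference` and `undifference` and the sample prices are made up for the example.

```python
def difference(prices, d=1):
    """Return y_i = prices[i] - prices[i - d] for i = d .. len(prices) - 1."""
    if d < 1 or d >= len(prices):
        raise ValueError("lag d must satisfy 1 <= d < len(prices)")
    return [prices[i] - prices[i - d] for i in range(d, len(prices))]


def undifference(first_values, diffs, d=1):
    """Invert difference(): rebuild prices from the first d prices and the differences."""
    prices = list(first_values[:d])
    for y in diffs:
        # The element d positions back from the end is price_{i-d}.
        prices.append(prices[-d] + y)
    return prices


prices = [100.0, 102.0, 101.0, 105.0, 107.0]  # hypothetical closing prices
y = difference(prices, d=1)
restored = undifference(prices[:1], y, d=1)
```

A model fitted on `y` forecasts price changes, so its forecasts have to be passed through the inverse step before they can be compared with actual prices.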
-
- An RMSE of 423.24 was observed. However, when the first test point is included in the training data to predict the next point, the RMSE drops to 49.72.
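One plausible reading of the two RMSE values is the difference between forecasting the whole test period from a fixed training set and forecasting one step at a time while revealing each observed test point. The sketch below illustrates that difference under loud assumptions: a naive last-value forecaster stands in for the fitted ARMA model, and the numbers are invented.

```python
import math


def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def static_forecast(train, horizon):
    """Forecast every test step from the training data only (naive: repeat the last value)."""
    return [train[-1]] * horizon


def walk_forward_forecast(train, test):
    """Forecast one step at a time, adding each observed test point to the history."""
    history = list(train)
    preds = []
    for actual in test:
        preds.append(history[-1])  # naive one-step-ahead forecast (stand-in for ARMA)
        history.append(actual)     # reveal the true value before the next step
    return preds


train = [100.0, 101.0, 103.0]  # hypothetical training prices
test = [104.0, 106.0, 109.0]   # hypothetical test prices
static_rmse = rmse(test, static_forecast(train, len(test)))
rolling_rmse = rmse(test, walk_forward_forecast(train, test))
```

On a trending series the static forecast drifts further from the truth at every step, while the walk-forward forecast is re-anchored each day, which is why its error is smaller.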
-
-
This model has been given in
https://jfin-swufe.springeropen.com/articles/10.1186/s40854-019-0138-0
for forecasting the daily return direction of the SPDR S&P 500 ETF. All 60 financial variables evaluated in that study are either present in, or analogous to, the 82 features of the chosen dataset, CNNPred. Zhong and Enke explored multiple data transformation techniques, including PCA and its variants, fuzzy robust principal component analysis (FR-PCA) and kernel-based principal component analysis (KPCA), among others. Their results showed that traditional PCA outperformed all non-linear techniques on real-world data. Thus, PCA is chosen as the data transformation technique, and the PCA-represented dataset with 82 principal components is used.
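As a rough illustration of the PCA representation, the sketch below fits PCA through a singular value decomposition and projects the data onto all 82 components. It assumes NumPy is available; the function names and the random matrix standing in for the 82 daily features are made up for the example.

```python
import numpy as np


def pca_fit(X_train):
    """Fit PCA on training rows; return the feature means and the components (one per row)."""
    mean = X_train.mean(axis=0)
    # SVD of the centered data: rows of Vt are principal directions, ordered by variance.
    _, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
    return mean, Vt


def pca_transform(X, mean, components, n_components=None):
    """Project rows of X onto the first n_components directions (all of them if None)."""
    return (X - mean) @ components[:n_components].T


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 82))            # synthetic stand-in for 82 daily features
mean, components = pca_fit(X)
Z = pca_transform(X, mean, components)    # shape (200, 82)
```

Keeping all 82 components rotates the data without discarding information; the resulting columns are uncorrelated and sorted by decreasing variance.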
- Outliers are detected with a classical statistical rule based on inter-quartile ranges and are then adjusted, similar to the method used in https://link.springer.com/article/10.1007/s005210170010. The cleaned data is split in a 70/15/15 ratio into training, validation and test sets, respectively. The data is standardized with the mean and variance of the training set.
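The preprocessing steps above can be sketched as follows, assuming NumPy. Several details are assumptions rather than facts from the text: outliers are clipped to fences at 1.5 times the inter-quartile range, the split is chronological, and all function names are hypothetical.

```python
import numpy as np


def clip_outliers_iqr(X, k=1.5):
    """Clip each column to [Q1 - k*IQR, Q3 + k*IQR] (the fence width k is an assumption)."""
    q1, q3 = np.percentile(X, [25, 75], axis=0)
    iqr = q3 - q1
    return np.clip(X, q1 - k * iqr, q3 + k * iqr)


def chronological_split(X, train_pct=70, val_pct=15):
    """Split rows in order into train/validation/test (assumes rows are sorted by date)."""
    n = len(X)
    i = n * train_pct // 100
    j = n * (train_pct + val_pct) // 100
    return X[:i], X[i:j], X[j:]


def standardize(train, *others):
    """Scale every split with the mean and standard deviation of the training split only."""
    mu, sigma = train.mean(axis=0), train.std(axis=0)
    sigma = np.where(sigma == 0, 1.0, sigma)  # guard against constant columns
    return [(a - mu) / sigma for a in (train, *others)]


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))   # synthetic stand-in for the feature matrix
X[10, 0] = 50.0                 # inject an obvious outlier
X_clean = clip_outliers_iqr(X)
train, val, test = chronological_split(X_clean)
train_s, val_s, test_s = standardize(train, val, test)
```

Using only training statistics for scaling keeps information from the validation and test periods out of the fitted model.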
-
- The PCA-represented dataset is classified using a neural network of 4 layers, with ReLU activations in the hidden layers and a sigmoid activation in the last layer. Dropout is used in the network to reduce overfitting. Binary cross-entropy is used as the loss criterion. The network is trained with the ADAM optimizer at an initial learning rate of 0.0001 for a maximum of 100 epochs.
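The forward pass and loss of such a classifier can be sketched in NumPy. This is an illustration under assumptions, not the trained model: the layer widths, the dropout rate of 0.5 and the random inputs are invented, and the training loop with the optimizer is left out.

```python
import numpy as np

rng = np.random.default_rng(0)


def init_mlp(sizes):
    """Random initialization; `sizes` lists the input, hidden and output widths."""
    return [(rng.normal(0, np.sqrt(2 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def forward(params, X, drop_p=0.0, training=False):
    h = X
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)                # ReLU on hidden layers
        if training and drop_p > 0:
            mask = rng.random(h.shape) >= drop_p      # inverted dropout, training only
            h = h * mask / (1.0 - drop_p)
    W, b = params[-1]
    return 1.0 / (1.0 + np.exp(-(h @ W + b)))         # sigmoid on the last layer


def bce(p, y, eps=1e-7):
    """Binary cross-entropy, with probabilities clipped for numerical safety."""
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


params = init_mlp([82, 64, 32, 16, 1])   # 4 weight layers; the widths are illustrative
X = rng.normal(size=(8, 82))             # 8 synthetic PCA-transformed samples
y = rng.integers(0, 2, size=(8, 1))      # synthetic up/down labels
probs = forward(params, X, drop_p=0.5, training=True)
loss = bce(probs, y)
```

At evaluation time the same `forward` is called with `training=False`, so dropout is disabled and the output is deterministic.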
-
- The model obtained an accuracy of 0.559 and an F1-score of 0.656 on the test set.
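For reference, the two reported metrics can be computed from binary predictions as in the sketch below. The helper name and the toy labels are made up for the example.

```python
def accuracy_and_f1(y_true, y_pred):
    """Accuracy and F1-score for binary labels, with 1 as the positive (up) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return correct / len(y_true), f1


y_true = [1, 0, 1, 1, 0, 1]      # toy labels
y_pred = [1, 1, 1, 0, 0, 1]      # toy predictions
acc, f1 = accuracy_and_f1(y_true, y_pred)

# A classifier that always predicts "up" on the same labels.
acc_up, f1_up = accuracy_and_f1(y_true, [1] * len(y_true))
```

When up days outnumber down days, a classifier that leans towards predicting "up" can reach an F1-score well above its accuracy, so the two numbers are best read together.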
-
-
This model given in
https://www.sciencedirect.com/science/article/abs/pii/S0957417419301915
classifies the change in the closing price of a market using only the data for the market under analysis. It takes as input all the features from the last 60 days and applies 2D convolution filters to build feature maps, from which the change in price is classified. Since CNNs are good at capturing short-range patterns and extract features hierarchically from day-wise data, they are a suitable method for predicting stock price movements. The initial learning rate for training this network was set to 0.001, and the ADAM optimizer was used for training over 100 epochs. Since the network is quite small, with only three convolution layers and one fully connected layer, a weight decay of 0.0001 was sufficient to regularize it. Each feature was normalized using statistics from the training set, and the same normalization was applied at validation and test time.
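The 60-day input described above can be built with a sliding window, as in this NumPy sketch. The label definition (1 if the next day's close is higher than the current day's close) is an assumption, and the function name and synthetic data are made up for the example.

```python
import numpy as np


def make_windows(features, close, window=60):
    """features: (n_days, n_features); close: (n_days,).

    Returns X of shape (n_samples, window, n_features) and binary labels y.
    """
    X, y = [], []
    # Each window ends on day t (inclusive); the label compares close[t + 1] with close[t].
    for t in range(window - 1, len(close) - 1):
        X.append(features[t - window + 1 : t + 1])
        y.append(1 if close[t + 1] > close[t] else 0)
    return np.stack(X), np.array(y)


rng = np.random.default_rng(0)
features = rng.normal(size=(100, 82))                 # synthetic daily features
close = np.cumsum(rng.normal(size=100)) + 100.0       # synthetic closing prices
X, y = make_windows(features, close)                  # X has shape (40, 60, 82)
```

Each sample is a 60 x 82 matrix, which a 2D convolutional network can treat like a single-channel image with days along one axis and features along the other.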
-
F-Scores on the test set using CNN-Pred2D trained for 100 Epochs on the CNNPred Stock market Dataset are given below:
| Market Name | Average | Maximum |
|---|---|---|
| NASDAQ | 0.438 | 0.552 |
| NYSE | 0.491 | 0.492 |
| S&P 500 | 0.434 | 0.573 |
| Russell | 0.497 | 0.691 |
| DJI | 0.471 | 0.696 |
-
-
-
This model classifies the change in the closing price of a market using data from several markets. It takes as input a 3D block of all the features from the 5 markets in the dataset over the last 60 days and applies 3D convolution filters to build feature maps, from which the change in price is classified using data across markets. 3D convolution filters work like 2D convolution filters, except that they extract features hierarchically and are temporally sensitive over an additional dimension. As with CNN-Pred2D, the initial learning rate for training this network was set to 0.001, and the ADAM optimizer was used for training over 100 epochs with a weight decay of 0.0001. Each feature was normalized using statistics from the training set, and the same normalization was applied at validation and test time.
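A sketch of how the multi-market 3D input might be assembled, assuming NumPy and date-aligned data for the 5 markets. The function names, the number of training days and the random data are made up for the example.

```python
import numpy as np


def normalize_with_train_stats(data, n_train):
    """data: (n_markets, n_days, n_features). Statistics use the first n_train days only."""
    mu = data[:, :n_train].mean(axis=1, keepdims=True)
    sigma = data[:, :n_train].std(axis=1, keepdims=True)
    return (data - mu) / np.where(sigma == 0, 1.0, sigma)


def make_blocks(data, window=60):
    """Return blocks of shape (n_samples, n_markets, window, n_features)."""
    n_days = data.shape[1]
    return np.stack([data[:, t - window:t] for t in range(window, n_days + 1)])


rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=(5, 120, 82))   # 5 markets, 120 days, 82 features
normed = normalize_with_train_stats(data, n_train=84)
blocks = make_blocks(normed)                               # shape (61, 5, 60, 82)
```

Each sample is then a markets x days x features block, which is the kind of input a 3D convolution filter slides over.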
-
-
F-Scores on the test set using CNN-Pred3D trained for 100 Epochs on the CNNPred Stock market Dataset are given below:
| Market Name | Average | Maximum |
|---|---|---|
| NASDAQ | 0.49 | 0.536 |
| NYSE | 0.432 | 0.566 |
| S&P 500 | 0.491 | 0.67 |
| Russell | 0.489 | 0.521 |
| DJI | 0.486 | 0.5 |
-
-
-
This model given in
https://www.hindawi.com/journals/complexity/2020/6622927
considers learning sequence-aware features: since the stock market evolves along the temporal dimension, the sequential information present in the latent embeddings is hard to ignore for the downstream tasks. This work takes the price prediction perspective, for more informed stock market trading, and is therefore a regression task. Convolutional layers were used to compute temporal features over the time stamps, and an LSTM was then used to capture the sequential features. An LSTM is designed so that its gates learn which information to discard and which to carry forward at each time step, and both of its states (hidden and cell) are passed on to the next time step. This additional parameterization delays the point at which the LSTM's metrics stabilize, but it can reach a better result in fewer iterations. The model gave sufficiently good results in predicting the closing price of a market when trained for 100 epochs with the ADAM optimizer at a learning rate of 0.001 and a weight decay of 0.0001.
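The gating behaviour of an LSTM can be made concrete with a single time step written in NumPy. This is the standard LSTM cell with randomly initialized, untrained weights; the input size of 8, the hidden size of 4 and the gate ordering are illustrative choices, not values from the model above.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W: (4H, D), U: (4H, H), b: (4H,); gates stacked as i, f, g, o."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])            # input gate: how much new information to write
    f = sigmoid(z[H:2 * H])        # forget gate: how much of the old cell state to keep
    g = np.tanh(z[2 * H:3 * H])    # candidate cell update
    o = sigmoid(z[3 * H:4 * H])    # output gate: how much of the cell state to expose
    c = f * c_prev + i * g
    h = o * np.tanh(c)
    return h, c                    # both states are passed to the next time step


rng = np.random.default_rng(0)
D, H = 8, 4                                    # illustrative feature and hidden sizes
W = rng.normal(0, 0.3, size=(4 * H, D))
U = rng.normal(0, 0.3, size=(4 * H, H))
b = np.zeros(4 * H)

h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(60, D)):             # 60 time steps of synthetic features
    h, c = lstm_step(x, h, c, W, U, b)
```

In a CNN-LSTM, `x` at each step would be the convolutional feature vector for that time stamp, and the final `h` would feed a regression head.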
-
Regression metrics on the test set using CNN-LSTM model trained for 100 Epochs on the CNNPred Stock market Dataset are given below:
| Market Name | MAE | RMSE | R2 | Max Closing Price |
|---|---|---|---|---|
| NASDAQ | 645.21 | 854.89 | 0.99 | 341682.12 |
| NYSE | 433.31 | 577.70 | 0.99 | 388508.06 |
| S&P 500 | 143.58 | 192.79 | 0.99 | 24420.32 |
| Russell | 162.90 | 205.95 | 0.99 | 12414.37 |
| DJI | 1899.07 | 2551.76 | 0.99 | 2980330.8 |
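For reference, the three regression metrics in the table can be computed as in the sketch below. The helper name and the toy values are made up for the example.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R2) for two equal-length sequences."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, r2


y_true = [3.0, 5.0, 7.0, 9.0]   # toy targets
y_pred = [2.5, 5.0, 8.0, 9.5]   # toy predictions
mae, rmse, r2 = regression_metrics(y_true, y_pred)
```

MAE and RMSE are in the units of the target, so they are easiest to interpret relative to the price scale of each market, while R2 is scale-free.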
-
-
-
This model uses the same inputs as the CNNPred models, with the same feature normalization. An implementation of the forward and backward passes for ODE solving is wrapped as a module inside a custom feature-map extraction architecture, with the ODE module taking the feature map as its input. The feature-map extractor is a simple 2D CNN that takes each instance as a 3D input. Training was faster, converging to a stable F-score in 15 epochs compared with the 100 epochs of CNNPred2D and CNNPred3D. Classification results also improved when this classifier was trained on both the CNNPred dataset and the CNNPred3D dataset. The optimizer used was ADAM with a learning rate of 0.001 for 15 epochs.
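The idea of an ODE module acting on a feature vector can be sketched with a fixed-step solver in NumPy. This covers the forward pass only, under assumptions: Euler integration is used here purely for simplicity (the solver for this classifier is not stated), the dynamics function and its weights are invented, and the backward pass is omitted.

```python
import numpy as np


def odeint_euler(f, h0, t0=0.0, t1=1.0, steps=20):
    """Integrate dh/dt = f(h, t) from t0 to t1 with fixed-step Euler; return h(t1)."""
    h, t = np.array(h0, dtype=float), t0
    dt = (t1 - t0) / steps
    for _ in range(steps):
        h = h + dt * f(h, t)
        t += dt
    return h


W = np.array([[0.0, -1.0], [1.0, 0.0]])   # hypothetical weights of the dynamics function


def dynamics(h, t):
    """A tiny stand-in for the learned network that defines dh/dt."""
    return np.tanh(W @ h)


h0 = np.array([0.5, -0.2])                          # stand-in for a flattened feature map
residual_like = odeint_euler(dynamics, h0, steps=1)  # one Euler step = one residual block
features = odeint_euler(dynamics, h0, steps=50)      # finer steps approximate continuous depth
```

With a single step of size 1 the update is `h + f(h)`, the same form as a residual block; increasing the number of steps is what motivates the "infinite depth" description.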
-
-
F-Scores on the test set using the infinite-depth classifier using Neural ODE, trained on the CNNPred Stock market Dataset with single-market data as input, are given below:
| Market Name | Average | Maximum |
|---|---|---|
| NASDAQ | 0.48 | 0.579 |
| NYSE | 0.531 | 0.567 |
| S&P 500 | 0.542 | 0.622 |
| Russell | 0.517 | 0.584 |
| DJI | 0.519 | 0.589 |
-
F-Scores on the test set using infinite depth classifier using Neural ODE trained for 15 Epochs on the CNNPred3D Stock market Dataset on all the market data as input are given below:
| Market Name | Average | Maximum |
|---|---|---|
| NASDAQ | 0.438 | 0.55 |
| NYSE | 0.567 | 0.624 |
| S&P 500 | 0.541 | 0.556 |
| Russell | 0.491 | 0.495 |
| DJI | 0.521 | 0.527 |
-
-
-
A GRU encoder encodes the sequence into a hidden state, and a Neural ODE serves as the decoder. The encoder-decoder networks were trained on sequences of 60 days. At inference time, data from 59 consecutive days was given to the model to predict the closing price for the 60th day, with the ODE decoder solved using Euler's method. Only two-dimensional data, consisting of the generated time steps and the closing price, was used for training and inference. A latent dimension of 4 was used for the GRU, and the model was trained for 100 epochs using the ADAM optimizer with a learning rate of 0.001. The results were inferior to those of the CNN-LSTM; these experiments suggest that the data was not sufficient for training the ODE-VAE model, since generative models such as VAEs require a large number of training samples. Furthermore, the ODE was built on the closing price alone, as that was the only price available in the dataset. Prior knowledge of the exact opening, closing and median prices would have been a better setting for training this model.
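The encode-then-decode flow can be sketched in NumPy as below. It is a structural illustration only: the weights are random and untrained, the variational (sampling) part of the VAE is omitted, biases are dropped, and the dynamics and readout functions are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H = 2, 4   # input: (time step, closing price); latent dimension 4 as in the text


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


# Randomly initialized, untrained weights (illustrative only).
Wz, Wr, Wh = (rng.normal(0, 0.3, size=(H, D + H)) for _ in range(3))
Wf = rng.normal(0, 0.3, size=(H, H))   # latent dynamics dh/dt = tanh(Wf h)
Wo = rng.normal(0, 0.3, size=(1, H))   # readout from latent state to closing price


def gru_encode(seq):
    """Run a GRU over the sequence and return the final hidden state."""
    h = np.zeros(H)
    for x in seq:
        xh = np.concatenate([x, h])
        z = sigmoid(Wz @ xh)                                  # update gate
        r = sigmoid(Wr @ xh)                                  # reset gate
        h_tilde = np.tanh(Wh @ np.concatenate([x, r * h]))    # candidate state
        h = (1 - z) * h + z * h_tilde
    return h


def euler_decode(h, dt=1.0, steps=1):
    """Advance the latent state with Euler steps, then read out one value."""
    for _ in range(steps):
        h = h + dt * np.tanh(Wf @ h)
    return float((Wo @ h)[0])


t = np.arange(59, dtype=float) / 59          # generated time steps for 59 days
close = np.linspace(0.0, 1.0, 59)            # synthetic, already-scaled closing prices
seq = np.stack([t, close], axis=1)           # shape (59, 2)
latent = gru_encode(seq)
pred_60 = euler_decode(latent)               # untrained prediction for the 60th day
```

With trained weights, the decoder would be integrated forward by one day from the encoded state to produce the 60th-day closing price.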
-
-
Regression metrics on the test set using ODE VAE model trained for 150 Epochs on the sequential closing price data of the CNNPred Stock market Dataset are given below:
| Market Name | MAE | RMSE | Max Closing Price |
|---|---|---|---|
| NASDAQ | 2177.418 | 2251.247 | 341682.12 |
| NYSE | 2461.052 | 2538.778 | 388508.06 |
| S&P 500 | 799.763 | 815.964 | 24420.32 |
| Russell | 1350.78 | 1355.90 | 12414.37 |
| DJI | 7288.813 | 7455.721 | 2980330.8 |
-
-
-
The GAN model builds on the idea of predicting PCA-transformed features, as used previously, and is tested on the S&P 500 index. The data is first preprocessed with outlier adjustments similar to those described earlier and is then split in an 80/20 ratio into training and test sets, respectively. The data is also standardized with the mean and variance of the training set. The generator is a shallow single-layer LSTM network followed by fully connected layers with Tanh and ReLU activations along with batch normalization. The generator's aim is to take the features of the past 60 days and generate the features of the 61st day. The discriminator is a deep CNN that takes in the features of 61 days and predicts whether this trend is real or fake. The loss function used to train the discriminator is BCE loss. Finally, the output of the generator is used to predict the direction of the closing price of the index.
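The loss terms involved can be sketched in plain Python. Only the BCE loss for the discriminator comes from the text; the generator loss shown is the common non-saturating choice and is an assumption, and `predicted_direction` is a hypothetical helper that assumes a closing-price value can be recovered from the generated features.

```python
import math


def bce(p, target, eps=1e-7):
    """Binary cross-entropy for one probability p and a target of 0.0 or 1.0."""
    p = min(max(p, eps), 1 - eps)
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))


def discriminator_loss(d_real, d_fake):
    """d_real / d_fake: discriminator outputs for a real and a generated 61-day window."""
    return bce(d_real, 1.0) + bce(d_fake, 0.0)


def generator_loss(d_fake):
    """Assumed non-saturating loss: the generator wants its window to be judged real."""
    return bce(d_fake, 1.0)


def predicted_direction(last_close, generated_close):
    """Hypothetical helper: 1 if the generated 61st-day close is above the 60th-day close."""
    return 1 if generated_close > last_close else 0


d_loss = discriminator_loss(d_real=0.9, d_fake=0.1)
g_loss = generator_loss(d_fake=0.5)
direction = predicted_direction(last_close=2450.0, generated_close=2462.5)
```

In this setup the real window is the 60 input days plus the true 61st day, and the fake window is the same 60 days plus the generator's output.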
-
- The accuracies on the training set and the test set were 0.545 and 0.504, respectively.
-