This Jupyter Notebook analyzes whether financial indicators found in the 10-K filings of public companies can be used to predict stock performance at the end of the year. We compare the results using different machine learning models and techniques. The analysis uses the "200+ Financial Indicators of US stocks" dataset.
This analysis is a Jupyter Notebook that makes use of the following Python libraries:
- pandas
- NumPy
- AutoGluon Tabular: `TabularPredictor`, `TabularDataset`
- imbalanced-learn (imblearn): `RandomOverSampler`
- scikit-learn model selection: `train_test_split`
- scikit-learn preprocessing: `OneHotEncoder`, `StandardScaler`, `MinMaxScaler`
- scikit-learn imputation: `SimpleImputer`
- scikit-learn metrics: `confusion_matrix`, `classification_report`
- scikit-learn ensemble models: `RandomForestClassifier`, `AdaBoostRegressor`
- scikit-learn SVM: `SVC`
- scikit-learn linear models: `LogisticRegression`
- TensorFlow 2.0
- Keras
To use this notebook:
- Install Jupyter Lab Version 2.3.1 and Python 3.7.
- Install Autogluon
- Install imbalanced-learn
- Install scikit-learn
- Install Tensorflow (Keras is included in Tensorflow 2.0)
Open the notebook in Jupyter Lab to rerun the analysis.
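The install steps above can be run with pip; the package names below are the standard PyPI names for the libraries listed (versions other than Jupyter Lab's are left unpinned, as the original does not specify them):

```shell
pip install jupyterlab==2.3.1
pip install autogluon imbalanced-learn scikit-learn tensorflow
```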
Our dataset contains feature data in the form of financial indicators that are present in 10-K filings of public companies. We separated our feature dataset into "raw" and "key" datasets, where the raw dataset contained all the fields from the original dataset, and the "key" dataset contained 6 key financial values/ratios.
Our dataset contains a `2019 PRICE VAR [%]` column that gives the percent price variation for the year (from the first trading day to the last trading day of that year). This column is used for regression modeling.
Our dataset also contains a `Class` column, which is used for classification. For each stock, if the `PRICE VAR [%]` value is positive, the class is 1 and the stock went up for the year; if it is negative, the class is 0 and the stock went down for the year.
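The `Class` derivation can be sketched with pandas (hypothetical sample values):

```python
import pandas as pd

# Hypothetical sample of the price-variation column from the dataset
df = pd.DataFrame({"2019 PRICE VAR [%]": [12.5, -3.2, 0.8, -40.0]})

# Class is 1 when the yearly price variation is positive, else 0
df["Class"] = (df["2019 PRICE VAR [%]"] > 0).astype(int)
print(df["Class"].tolist())  # [1, 0, 1, 0]
```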
Because we are predicting stock prices, precision matters more than recall: the cost of acting is high, while the cost of not acting is low.
Furthermore, we expect to act only on Class 1, which represents stocks that go up. So we optimized our classification modeling for Precision of Class 1, meaning it is most important that when we predict that a stock is one that should go up, we are correct.
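Class 1 precision is the fraction of predicted winners that actually went up. It can be computed with scikit-learn (toy labels for illustration):

```python
from sklearn.metrics import precision_score

y_true = [1, 0, 1, 1, 0, 1]  # actual: did the stock go up?
y_pred = [1, 1, 1, 0, 0, 1]  # model's predictions

# Of the 4 stocks predicted to go up, 3 actually did
print(precision_score(y_true, y_pred, pos_label=1))  # 0.75
```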
- We tried various strategies for how much missing/invalid data we discarded.
- We tried replacing zeros with the mean for each column, and it had subtle/questionable effects. Later, when we tried it again after some of our other changes, it had almost no effect on most of our models, except for a noticeably bad effect on our Tensorflow models that used the raw dataset.
- We tried using the experimental scikit-learn `IterativeImputer` to impute NaN values. We tried it with various estimators:
  - BayesianRidge: regularized linear regression
  - DecisionTreeRegressor: non-linear regression
  - KNeighborsRegressor: comparable to other KNN imputation approaches

  The first two estimators had trouble converging, and the K-nearest neighbors estimator produced results similar to just using the column mean. So we stuck with the `SimpleImputer` using the column mean.
- We tried dropping some of the redundant columns, but in general we saw our Class 1 accuracy go down for multiple models, so we left in all the columns from the raw dataset.
- We also tried scaling data to be between 0 and 1 using the scikit-learn `MinMaxScaler`. This caused some of our models to predict only Class 1, but it had a positive effect on the TensorFlow models, especially with the raw dataset, so we used the 0-to-1 scaled data for those models.
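The IterativeImputer experiment can be sketched as follows (toy data; the K-nearest neighbors estimator shown is one of the three we tried):

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer, SimpleImputer
from sklearn.neighbors import KNeighborsRegressor

# Toy feature matrix with missing values
X = np.array([[1.0, 2.0], [3.0, np.nan], [5.0, 6.0], [np.nan, 8.0]])

# Iterative (model-based) imputation with a K-nearest neighbors estimator
it = IterativeImputer(estimator=KNeighborsRegressor(n_neighbors=2), random_state=0)
X_it = it.fit_transform(X)

# The simpler column-mean strategy we ultimately kept
X_mean = SimpleImputer(strategy="mean").fit_transform(X)
```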
- Removing rows and columns with over 25% NaN values.
- Removing rows with over 25% zeros and columns with over 50% zeros.
- Encoding categorical data using one-hot encoding.
- Replacing remaining NaN values by imputing them with the mean for each column.
- Scaling the feature data by removing the mean and scaling to unit variance.
- Creating a second scaled dataset with feature data scaled to the range of 0 to 1. This dataset is used for the TensorFlow models on the raw dataset.
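A condensed sketch of the preprocessing steps above (toy data; the zero-filtering and one-hot encoding steps are omitted for brevity):

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Toy numeric frame standing in for the 10-K indicator data
df = pd.DataFrame(np.random.default_rng(0).normal(size=(8, 4)),
                  columns=["roe", "eps", "debt_ratio", "margin"])
df.iloc[0, 1] = np.nan  # one missing value

# Drop rows and columns with over 25% NaN values
df = df.loc[df.isna().mean(axis=1) <= 0.25, df.isna().mean() <= 0.25]

# Impute remaining NaNs with the column mean
X = SimpleImputer(strategy="mean").fit_transform(df)

# Standardized copy (zero mean, unit variance) for most models
X_std = StandardScaler().fit_transform(X)

# Separate 0-to-1 scaled copy for the TensorFlow models
X_01 = MinMaxScaler().fit_transform(X)
```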
AutoGluon was used as a baseline model. Because AutoGluon has its own strategies for dealing with NaN data, we used the financial dataframe where rows and columns with too many NaNs and zeros were removed, but with no replacement of NaN values or encoding of categorical data. In general, AutoGluon handled the cleaning, encoding, and the creation and fitting of multiple models.
```
              precision    recall  f1-score   support

           0       0.60      0.36      0.45       280
           1       0.77      0.90      0.83       674

    accuracy                           0.74       954
   macro avg       0.69      0.63      0.64       954
weighted avg       0.72      0.74      0.72       954
```

```
              precision    recall  f1-score   support

           0       0.54      0.14      0.22       280
           1       0.73      0.95      0.82       674

    accuracy                           0.71       954
   macro avg       0.63      0.55      0.52       954
weighted avg       0.67      0.71      0.65       954
```
For oversampling we used the imblearn `RandomOverSampler`.
```
              precision    recall  f1-score   support

           0       0.46      0.72      0.56       280
           1       0.85      0.64      0.73       674

    accuracy                           0.67       954
   macro avg       0.65      0.68      0.64       954
weighted avg       0.73      0.67      0.68       954
```

```
              precision    recall  f1-score   support

           0       0.67      0.01      0.01       280
           1       0.71      1.00      0.83       674

    accuracy                           0.71       954
   macro avg       0.69      0.50      0.42       954
weighted avg       0.70      0.71      0.59       954
```

```
              precision    recall  f1-score   support

           0       0.62      0.38      0.47       280
           1       0.78      0.91      0.84       674

    accuracy                           0.75       954
   macro avg       0.70      0.64      0.65       954
weighted avg       0.73      0.75      0.73       954
```

```
              precision    recall  f1-score   support

           0       0.58      0.47      0.52       280
           1       0.80      0.86      0.83       674

    accuracy                           0.75       954
   macro avg       0.69      0.66      0.67       954
weighted avg       0.73      0.75      0.74       954
```

```
              precision    recall  f1-score   support

           0       0.67      0.01      0.01       280
           1       0.71      1.00      0.83       674

    accuracy                           0.71       954
   macro avg       0.69      0.50      0.42       954
weighted avg       0.70      0.71      0.59       954
```
```
Coefficient of determination of the prediction: -0.57193532905021
Coefficient of determination of the prediction: -1.6260137253640794
```
```
              precision    recall  f1-score   support

           0       0.56      0.25      0.35       280
           1       0.75      0.92      0.82       674

    accuracy                           0.72       954
   macro avg       0.66      0.58      0.59       954
weighted avg       0.69      0.72      0.68       954
```

```
              precision    recall  f1-score   support

           0       0.41      0.66      0.51       280
           1       0.81      0.61      0.69       674

    accuracy                           0.62       954
   macro avg       0.61      0.63      0.60       954
weighted avg       0.69      0.62      0.64       954
```

```
              precision    recall  f1-score   support

           0       0.43      0.69      0.53       280
           1       0.83      0.62      0.71       674

    accuracy                           0.64       954
   macro avg       0.63      0.65      0.62       954
weighted avg       0.71      0.64      0.66       954
```
Classification Shallow Neural Network using raw dataset scaled between 0 and 1, with oversampling of minority class
```
              precision    recall  f1-score   support

           0       0.42      0.81      0.55       280
           1       0.87      0.53      0.66       674

    accuracy                           0.61       954
   macro avg       0.64      0.67      0.60       954
weighted avg       0.74      0.61      0.63       954
```
Classification Deep Neural Network using raw dataset scaled between 0 and 1, with oversampling of minority class
```
              precision    recall  f1-score   support

           0       0.43      0.76      0.55       280
           1       0.85      0.58      0.69       674

    accuracy                           0.63       954
   macro avg       0.64      0.67      0.62       954
weighted avg       0.73      0.63      0.65       954
```
Although several of our models performed fairly well, with above 50% Precision for Class 1, the model that did the best was our shallow neural network classification model that used the raw dataset scaled between 0 and 1, with oversampling of the minority class.
This model gave a Precision for Class 1 of 0.87, meaning that if we use this model to predict that a stock will go up in a year, there is an 87% chance that our classification is correct. However, when we re-run this model, the Class 1 Precision sometimes varies slightly. We set the NumPy and TensorFlow seeds to minimize this, which reduced the variation to only 1-2%.
Our second best model was an SVM SVC classification model that used the raw dataset scaled by removing the mean and scaling to unit variance, again with oversampling of the minority class. This model gave a Precision for Class 1 of 0.85. When re-running the SVM SVC model fitting, it gave more consistent results than the TensorFlow models.
AutoGluon provided the most automated solution, as we could essentially feed the raw data into it and it handled the cleaning, encoding, creation, and fitting of multiple models. The AutoGluon model gave a Precision for Class 1 of 0.77.
We were not able to get any highly accurate regression models. A next step for our project is to look further into regression modeling, as it might provide interesting insights into the performance of individual stocks that went up or down; binary classification marks stocks as either "stocks to buy" or "stocks not to buy" without any nuance.
- Allen Wong
- Christine Guo
- LaNaya Johnson
- Michael Danenberg

MIT