Explainable Machine Learning Exploiting News and Domain-specific Lexicon for Stock Market Forecasting

0. Content

This repository contains the code, the data and the resources to execute the algorithm described in the paper "Explainable Machine Learning Exploiting News and Domain-specific Lexicon for Stock Market Forecasting" (S. Carta, S. Consoli, L. Piras, A.S. Podda, D. Reforgiato Recupero).

Here follows a short description of the content of the repository:

companies_by_industry: mappings from company codes used in SP500 dataset to codes used in DNA dataset; the companies are grouped by the 3 industries analized in the paper (namely, Information Technology, Financial and Industrials).
data: limited example news published in 2020 about the 3 industries, respectively.
lexicons: used when creating new lexicons. Each subfolder inside 2 classes is named according to the parameters passed to create the lexicons; similarly, a function in create_lexicons.py uses the same naming system to retrieve the lexicons based on the parameters. Each subfolder contains a list of csv files, each representing the lexicon created for the day indicated in the file name. Read documentation in "create_lexicons.py" for further reference on the saving system
samples: ready-to-use feature vectors, computed on the whole portion of DNA studied in the paper.
create_lexicons.py: functions used for creating new lexicons and for fetching previously created lexicons
evaluation.py: accuracy metrics and other utility functions used to evaluate the algorithm
explainable_ml.py: main script, containing the backbone of the algorithm.
utils.py: other utility functions.

Please note that, due to licensing constraints, we cannot publish any news document contained in the Dow Jones' Data, News and Analytics dataset, which was employed in the experimental framework of the paper. However, in order to allow the reproducibility of the experiments illustrated in the paper, we hereby make available a set of ready-to-use lexicons and feature vectors, based on the original data.

1. MongoDB configuration

Install mongodb community edition: https://www.mongodb.com/try/download/community We recommend also installing Robo3T, to manage databases and collections from graphic interface: https://robomongo.org/download

Once the installation is complete, create a database called "financial_forecast" (this operation is very easy from Robo3T interface).

Now let's create mongodb collections for news.

Go to directory Explainable-ML/data
run command >

mongoimport --db financial_forecast --collection Information Technology_news --file Information Technology_news_example.json --legacy

Do the same for the other two files. Please pay attention to the fact that the names of the mongodb collections must match the strings inserted in the code.

2. Python modules installation

In order to run the code, you must have the following Python modules installed and updated:

sklearn : https://scikit-learn.org/stable/install.html
numpy : https://numpy.org/install/
pymongo : https://pymongo.readthedocs.io/en/stable/installation.html

3. Executing the code

The main script to run is explainable_ml.py. You can simply run this command from shell: Explainable-ML > python event_detector.py

Please note that this script is configured to run with feature vectors that were pre-computed on the whole portion of dataset analyzed in the paper. If you intend to run the algorithm on personalized lexicons with custom parameters, you will need to use your own news dataset and adapt the code accordingly.

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
companies_per_industry		companies_per_industry
lexicons/2 classes		lexicons/2 classes
samples		samples
README.md		README.md
create_lexicons.py		create_lexicons.py
evaluation.py		evaluation.py
explainable_ml.py		explainable_ml.py
utils.py		utils.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Explainable Machine Learning Exploiting News and Domain-specific Lexicon for Stock Market Forecasting

0. Content

1. MongoDB configuration

2. Python modules installation

3. Executing the code

About

Releases

Packages

Languages

Artificial-Intelligence-Big-Data-Lab/Explainable-ML

Folders and files

Latest commit

History

Repository files navigation

Explainable Machine Learning Exploiting News and Domain-specific Lexicon for Stock Market Forecasting

0. Content

1. MongoDB configuration

2. Python modules installation

3. Executing the code

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages