
Code for the Streamlit interactive tool built for the final project of course CS_3924


Yanyu-Chen1010/CS_3924_Final_Yanyu.Chen

Explore the Stability of SHAP-Explained Feature Importance

Code for the Streamlit interactive tool built for the final project of course CS_3924 (Track 2)


This project is inspired by the following works:

Jabeur et al., "Forecasting gold price with the XGBoost algorithm and SHAP interaction values"

Kumar et al., "Problems with Shapley-value-based explanations as feature importance measures"

Barr et al., "Towards Ground Truth Explainability on Tabular Data"; CapitalOne-Github: "Create Synthetic Data by correlation matrix"

Renisha Chainani, "FACTORS INFLUENCING GOLD PRICES". (2016)

Correlation Matrix Heatmap Plot


The works of Jabeur et al. and Kumar et al. motivated me to test the following hypotheses.

How much does the SHAP-explained feature attribution change:

H1: when the interventional correlation between features changes

H2: when fully redundant features are added to the training data

H3: when the balance structure of the training data changes
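
To make H2 concrete, here is a minimal sketch; it is not the code in this repository. It assumes a plain least-squares linear model instead of a tree model such as XGBoost, and it uses the closed-form SHAP value for a linear model with features treated as independent, `w_i * (x_i - mean(x_i))`. The function name `linear_shap_importance` and the toy data are hypothetical.

```python
import numpy as np


def linear_shap_importance(X, y):
    """Mean absolute SHAP value per feature for a least-squares linear model.

    Assumption: features are treated as independent, so the SHAP value of
    feature i for one row is w_i * (x_i - mean(x_i)).
    """
    Xc = X - X.mean(axis=0)  # centering replaces an explicit intercept
    yc = y - y.mean()
    # lstsq returns the minimum-norm solution when columns are collinear
    w, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    shap_values = Xc * w  # one attribution per row and feature
    return np.abs(shap_values).mean(axis=0)


# Toy data (hypothetical): three independent features and a linear target
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.1, size=500)

base = linear_shap_importance(X, y)

# H2: append an exact copy of feature 0 as a fully redundant fourth column
X_dup = np.column_stack([X, X[:, 0]])
dup = linear_shap_importance(X_dup, y)

print("importance without redundancy:", base)
print("importance with redundancy:   ", dup)
```

In this toy setting the original column and its copy each receive half of the original attribution, so the feature ranking can change even though the model's predictions do not. The even split comes from the minimum-norm solution; other fitting procedures may divide the attribution differently.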


The folder "sample_data" contains code that generates clean original data from the collected raw data, and the folder "Interactive_tool" contains the files used to mathematically process the original data into different kinds of synthetic data. The file "web_data_upload.py" in "Interactive_tool" provides the Streamlit interactive website, where users can upload cleaned original data and test how the SHAP-explained feature importance values change across synthetic datasets with different structures and characteristics.
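
One common way to produce synthetic data with a chosen correlation structure is to multiply independent standard normal draws by the Cholesky factor of the target correlation matrix. The sketch below illustrates that idea only; it is an assumption about how such data could be generated, not a copy of the files in "Interactive_tool". The function name `sample_with_correlation` and the `target` matrix are hypothetical.

```python
import numpy as np


def sample_with_correlation(corr, n_rows, seed=0):
    """Draw Gaussian samples whose correlation matrix is approximately `corr`.

    Assumption: `corr` is a symmetric positive-definite correlation matrix.
    """
    corr = np.asarray(corr, dtype=float)
    chol = np.linalg.cholesky(corr)  # corr == chol @ chol.T
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_rows, corr.shape[0]))  # independent columns
    return z @ chol.T  # rows now have covariance equal to corr


# Hypothetical target: features 0 and 1 strongly correlated, 0 and 2 not
target = np.array([
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.3],
    [0.0, 0.3, 1.0],
])

samples = sample_with_correlation(target, n_rows=20000)
empirical = np.corrcoef(samples, rowvar=False)
print(np.round(empirical, 2))
```

Changing the off-diagonal entries of `target` and regenerating the data gives a family of datasets that differ only in their correlation structure, which is the kind of controlled variation H1 calls for.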


The sources of the raw feature data used to generate "Original Data of Gold Price" in the folder "sample_data" are:

Gold Price

Metal Commodities' Price and Dollar Index

Geopolitical and Economic Policy Uncertainty indicators


The source of the raw feature data used to generate "Original Data of census income" in the folder "sample_data" is:

UCI Census-Income (KDD) Data Set
