Understanding the fundamentals of a decision-making process is, for most purposes, an essential step in machine learning. In this context, analyzing predefined groups of features can provide important clues for understanding and improving a prediction. This repository extends the univariate permutation importance to a grouped version that evaluates the influence of whole feature subsets on a machine learning model. It is implemented as a slight modification of scikit-learn's permutation importance.
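The underlying idea: instead of shuffling one column at a time, all columns of a group are shuffled with the same row permutation, and the resulting drop in score is attributed to the group as a whole. The helper below is a minimal, hypothetical sketch of this idea for an already fitted model, not the package's actual implementation:

import numpy as np
from sklearn.metrics import get_scorer

def single_group_importance(model, X, y, group_idx, scoring="balanced_accuracy",
                            n_repeats=10, random_state=None):
    # Hypothetical illustration: permute all columns of one feature group with
    # the SAME row permutation and record how much the score drops.
    scorer = get_scorer(scoring)
    rng = np.random.default_rng(random_state)
    baseline = scorer(model, X, y)
    drops = []
    for _ in range(n_repeats):
        perm = rng.permutation(len(X))
        X_perm = X.copy()
        X_perm[:, group_idx] = X[perm][:, group_idx]  # joint shuffle of the group
        drops.append(baseline - scorer(model, X_perm, y))
    return np.mean(drops), np.std(drops)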
Install via pip:
pip install git+https://github.com/lucasplagwitz/grouped_permutation_importance
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

from grouped_permutation_importance import grouped_permutation_importance

data = load_breast_cancer()
feature_names = data["feature_names"].tolist()
X, y = data["data"], data["target"]

# Build one index group per feature-name prefix ("mean", "error", "worst").
idxs = []
columns = ["mean", "error", "worst"]
for key in columns:
    idxs.append([i for i, name in enumerate(feature_names) if key in name])

cv = RepeatedStratifiedKFold()
pipe = Pipeline([("MinMax", MinMaxScaler()), ("SVC", SVC())])

r = grouped_permutation_importance(pipe, X, y, idxs=idxs, n_repeats=50, random_state=0,
                                    scoring="balanced_accuracy", n_jobs=5, cv=cv,
                                    perm_set="test")
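If the returned object mirrors the result of scikit-learn's permutation_importance (a Bunch whose importances array here holds one row per feature group), the group importances can be inspected with a simple box plot; adapt the attribute access if the actual return type differs:

import matplotlib.pyplot as plt
import numpy as np

# Assumption: r.importances has one row of repeated score drops per group,
# analogous to scikit-learn's permutation_importance result.
sorted_idx = np.argsort([np.mean(imp) for imp in r.importances])
plt.boxplot([r.importances[i] for i in sorted_idx], vert=False,
            labels=[columns[i] for i in sorted_idx])
plt.xlabel("decrease in balanced accuracy")
plt.tight_layout()
plt.show()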
The file "examples/make_class.py" contains a small simulation that verifies correctness: based on scikit-learn's make_classification method, feature subsets with different degrees of informativeness are analyzed.
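The core of such a check can be sketched as follows (a minimal, hypothetical variant, not the actual example script): a dataset with a known informative feature block and a pure-noise block is generated, and the informative group should receive the clearly higher importance.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from grouped_permutation_importance import grouped_permutation_importance

# With shuffle=False, the first 5 features are informative, the last 5 are noise.
X, y = make_classification(n_samples=500, n_features=10, n_informative=5,
                           n_redundant=0, shuffle=False, random_state=0)
groups = [list(range(0, 5)), list(range(5, 10))]  # informative block vs. noise block

r = grouped_permutation_importance(LogisticRegression(max_iter=1000), X, y,
                                    idxs=groups, n_repeats=20, random_state=0,
                                    scoring="balanced_accuracy",
                                    cv=StratifiedKFold(), perm_set="test")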
The file "examples/brain_atlas.py" demonstrates a neuroimaging use case in which brain regions are rated with respect to different target variables (age, CDR, biological sex).
If you use the Grouped Permutation Importance in a scientific publication, we would appreciate citations to the following paper:
Lucas Plagwitz, Alexander Brenner, Michael Fujarski, and Julian Varghese. Supporting AI-Explainability by Analyzing Feature Subsets in a Machine Learning Model.
Studies in Health Technology and Informatics, Volume 294: Challenges of Trustable AI and Added-Value on Health. doi:10.3233/SHTI220406