Original data stored in interpret explainer classes #368
Comments
Hi @epetrovski -- It seems you are using the interpret-community package, because TabularExplainer is a class that only exists there. Transferring the issue to them for further response. -InterpretML team
@epetrovski thanks for raising the privacy concern here. I don't see the code in interpret-community where the customer's data is being cached in TabularExplainer. Could you maybe provide a code sample where we can see the caching of the raw dataset? I looked at the code for TabularExplainer. My hunch is that perhaps the shap explainers cache the raw dataset, which is something we don't control. Just a hunch. More may become clearer once you supply the code sample. Regards
@gaugup it is cached in the individual explainers (e.g. the mimic explainer, see:
Maybe we can add an option to remove it. However, without some data the visualization dashboard won't be useful at all. So I'm not sure what @epetrovski is suggesting we should do, since without the original dataset the explanation isn't very useful to the user. This is more of a PM question; maybe our PMs could take a look at this issue?
Couldn't you simply ask users to supply the entire dataset at the initialization of the dashboard, instead of caching all the data upfront before you even know whether the user is going to use a dashboard at all?
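To make the suggestion concrete, here is a minimal sketch of the proposed pattern. The class names (`SlimExplanation`, `Dashboard`) and parameters are purely illustrative assumptions, not the actual interpret-community API: the persisted explanation object holds only computed artifacts, and the raw data is handed over only when a dashboard is actually constructed.

```python
# Hypothetical sketch (not real interpret-community classes): the
# explanation object keeps only derived artifacts, so pickling or
# uploading it never carries raw customer rows.

class SlimExplanation:
    """Holds computed explanation artifacts, but no raw training data."""
    def __init__(self, feature_names, importances):
        self.feature_names = feature_names
        self.importances = importances


class Dashboard:
    """Raw data is supplied by the user at visualization time only."""
    def __init__(self, explanation, dataset):
        self.explanation = explanation
        self.dataset = dataset  # never cached inside the explanation itself


explanation = SlimExplanation(["age", "income"], [0.7, 0.3])
# Only `explanation` would be persisted; it contains no dataset rows.
dashboard = Dashboard(explanation, dataset=[[34, 52000], [29, 48000]])
```

Under this design, the object that gets pickled or uploaded is `explanation`, which carries no raw data; the GDPR-relevant dataset stays wherever the user already governs it.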
interpret is a very useful package for explaining ML using SHAP, thanks. But I have a legal issue that prohibits me from using it in a professional context. It seems that the explainer classes contain original datasets in obscure places. For instance, if I fit
`explainer = TabularExplainer(model, data)`
I end up with all my original data in `explainer.explainer.initialization_examples.original_dataset`.
This is a fact that I think most users are simply unaware of, and a big issue for professionals, like me, working under a GDPR regime. If asked, I need to be able to tell regulators exactly where my customers' data is stored, and that answer should always be a centralized and protected database, not some Python object that ends up getting uploaded to an Azure ML Workspace or pickled and saved to disk.
So my question is whether it is strictly necessary for interpret's explainer models to store the original data they were initialized on? If not, could you commit to stripping original data from explainer classes?
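Until the package offers such an option, one possible workaround is to scrub the cached data from the object graph before persisting an explainer. The sketch below is a best-effort assumption: the attribute name `original_dataset` is taken from the path quoted above, the dummy `Inner`/`Outer` classes merely stand in for the nested explainer structure, and whether deleting the attribute breaks downstream dashboard features is untested.

```python
def scrub(obj, attr_names=("original_dataset",), _seen=None):
    """Recursively walk an object's attributes and delete any whose
    name is in attr_names. A best-effort sketch: it does not descend
    into containers (lists/dicts) or __slots__-based classes."""
    if _seen is None:
        _seen = set()
    if id(obj) in _seen or not hasattr(obj, "__dict__"):
        return
    _seen.add(id(obj))
    for name in list(vars(obj)):
        if name in attr_names:
            delattr(obj, name)
        else:
            scrub(getattr(obj, name), attr_names, _seen)


# Demo with dummy objects standing in for the nested explainer layout:
class Inner:
    def __init__(self):
        self.original_dataset = [[1, 2], [3, 4]]  # sensitive raw rows


class Outer:
    def __init__(self):
        self.explainer = Inner()


exp = Outer()
scrub(exp)
print(hasattr(exp.explainer, "original_dataset"))  # → False
```

This only removes the references held by the object graph; any copies already pickled, uploaded, or cached elsewhere would still need to be handled separately.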