Original data stored in interpret explainer classes #368
Comments
Hi @epetrovski -- It seems you are using the interpret-community package, because TabularExplainer is a class that only exists there. Transferring the issue to them for further response. -InterpretML team
@epetrovski thanks for raising the privacy concern here. I don't see the code in interpret-community where the customer's data is being cached in TabularExplainer. Could you maybe provide a code sample where we can see the caching of the raw dataset? I looked at the code for TabularExplainer. My hunch is that perhaps the shap explainers cache the raw dataset, which is something we don't control. Just a hunch. More may become clearer once you supply the code sample. Regards
@gaugup it is cached in the individual explainers (e.g. the mimic explainer, see:
Maybe we can add an option to remove it. However, without some data the visualization dashboard won't be useful at all. So I'm not sure what @epetrovski is suggesting we should do, since without the original dataset the explanation isn't very useful to the user. This is more of a PM question; maybe our PMs could take a look at this issue?
Couldn't you simply ask users to supply the entire dataset at the initialization of the dashboard, instead of caching all the data upfront before you even know whether the user is going to use a dashboard at all?
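To make the suggestion concrete, here is a minimal sketch of the proposed pattern. The class names (`SlimExplanation`, `Dashboard`) and parameters are purely illustrative assumptions, not the actual interpret-community API: the persisted explanation object holds only computed artifacts, and the raw data is handed over only when a dashboard is actually constructed.

```python
# Hypothetical sketch (not real interpret-community classes): the
# explanation object keeps only derived artifacts, so pickling or
# uploading it never carries raw customer rows.

class SlimExplanation:
    """Holds computed explanation artifacts, but no raw training data."""
    def __init__(self, feature_names, importances):
        self.feature_names = feature_names
        self.importances = importances


class Dashboard:
    """Raw data is supplied by the user at visualization time only."""
    def __init__(self, explanation, dataset):
        self.explanation = explanation
        self.dataset = dataset  # never cached inside the explanation itself


explanation = SlimExplanation(["age", "income"], [0.7, 0.3])
# Only `explanation` would be persisted; it contains no dataset rows.
dashboard = Dashboard(explanation, dataset=[[34, 52000], [29, 48000]])
```

Under this design, the object that gets pickled or uploaded is `explanation`, which carries no raw data; the GDPR-relevant dataset stays wherever the user already governs it.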
interpret is a very useful package for explaining ML using SHAP, thanks. But I have a legal issue that prohibits me from using it in a professional context. It seems that the explainer classes contain original datasets in obscure places. For instance, if I fit
`explainer = TabularExplainer(model, data)`
I end up with all my original data in `explainer.explainer.initialization_examples.original_dataset`.
This is a fact that I think most users are simply unaware of, and a big issue for professionals, like me, working under a GDPR regime. If asked, I need to be able to tell regulators exactly where my customers' data is stored, and that answer should always be a centralized and protected database, not some Python object that ends up getting uploaded to an Azure ML Workspace or pickled and saved to disk.
So my question is whether it is strictly necessary for interpret's explainer models to store the original data they were initialized on? If not, could you commit to stripping original data from explainer classes?
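Until the package offers such an option, one possible workaround is to scrub the cached data from the object graph before persisting an explainer. The sketch below is a best-effort assumption: the attribute name `original_dataset` is taken from the path quoted above, the dummy `Inner`/`Outer` classes merely stand in for the nested explainer structure, and whether deleting the attribute breaks downstream dashboard features is untested.

```python
def scrub(obj, attr_names=("original_dataset",), _seen=None):
    """Recursively walk an object's attributes and delete any whose
    name is in attr_names. A best-effort sketch: it does not descend
    into containers (lists/dicts) or __slots__-based classes."""
    if _seen is None:
        _seen = set()
    if id(obj) in _seen or not hasattr(obj, "__dict__"):
        return
    _seen.add(id(obj))
    for name in list(vars(obj)):
        if name in attr_names:
            delattr(obj, name)
        else:
            scrub(getattr(obj, name), attr_names, _seen)


# Demo with dummy objects standing in for the nested explainer layout:
class Inner:
    def __init__(self):
        self.original_dataset = [[1, 2], [3, 4]]  # sensitive raw rows


class Outer:
    def __init__(self):
        self.explainer = Inner()


exp = Outer()
scrub(exp)
print(hasattr(exp.explainer, "original_dataset"))  # → False
```

This only removes the references held by the object graph; any copies already pickled, uploaded, or cached elsewhere would still need to be handled separately.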