Strong dependence on using kmeans background samples for SHAP #1
Hi Scott, excited to have you engage with this project! The thing here is that the adversarial attack is being trained with kmeans in mind. So if you switch to a different sampling strategy (e.g., a random sample), it's likely the attack won't work as well, because the OOD/IID detection is looking for substitutions that come from kmeans sampling and wouldn't be found in the data distribution. It's worth verifying whether random sampling alone is enough to break the attacks, however; it could be the case that random sampling produces enough IID perturbations that the attacks are ineffective. But when I run with random sampling for COMPAS, race shows up as the top feature 16% of the time while the uncorrelated column shows up 83% of the time. It's worth noting that fidelity falls to 83%. Additionally, I bumped up the adversarial classifier training to 4e4 draws from the data provided. Explicitly, my changes were:
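Something along these lines (a sketch from memory; the exact constructor and argument names in the repo may differ slightly):

```python
import shap
from adversarial_models import Adversarial_Kernel_SHAP_Model  # import path assumed

# Random background sample instead of kmeans centers (sizes illustrative)
background_distribution = shap.sample(xtrain, 100)

# Train the adversarial model on more substitution draws: 4e4 instead of the default 2e4
adv_shap = Adversarial_Kernel_SHAP_Model(racist_model_f(), innocuous_model_psi()).train(
    xtrain, ytrain,
    feature_names=features,
    background_distribution=background_distribution,
    n_samples=int(4e4))
```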
Further, a clever attacker could train the attack with a few plausible sampling strategies (sample, kmeans, zeros, etc.) to make the attacks resilient to sampling changes. So long as the substitutions SHAP produces aren't found in the data distribution, this attack is plausible to some degree. Please let me know if you have any other questions!
Thanks for getting back to me Dylan! It looks like it all has to do with how many samples we give the adversarial classifier. The default number 2e4 doesn't seem to work very well with a random background, but bumping it to 4e4 helps and 4e5 makes things work like in the paper. This is what I did: on line 2 of the SHAP experiment box I made your change (the random background), then computed how often race ranks as the top SHAP feature right after that.
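A minimal sketch of that measurement (variable names are approximate, not the notebook's exact code):

```python
import numpy as np
import shap

# Explain the adversarial model on the test set with the random background
explainer = shap.KernelExplainer(adv_shap.predict, background_distribution)
shap_values = explainer.shap_values(xtest)  # subsample xtest first if this is slow

# Fraction of test points where race receives the largest absolute SHAP value
race_idx = features.index('race')  # column name assumed
top_feature = np.abs(shap_values).argmax(axis=1)
print('race is the top feature on %.0f%% of test points' % (100 * (top_feature == race_idx).mean()))
```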
Then I increased the number of samples and looped over many sample sizes to get a general trend...but fidelity also drops (note the left side of this plot is the default # of samples in the code right now), and I am not sure how good 0.7-0.8 fidelity is since 0.5 is random (thoughts?). Since % match is hard to interpret, I reran this again using R-squared for fidelity, and it looks like we need to drop to about R^2 = 0.5 to consistently knock race out of the top feature importance rank.
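By R-squared fidelity I mean something like the variance in the biased model's predictions that the adversarial model reproduces, e.g.:

```python
from sklearn.metrics import r2_score

# R^2 between the biased model's predictions and the adversarial model's predictions
# on the test set (model/variable names are illustrative)
fidelity_r2 = r2_score(biased_model.predict(xtest), adv_shap.predict(xtest))
```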
This is cool! It definitely depends on the number of samples you're considering, because this is just the number of substitutions you're training on --- we'd expect a better OOD classifier if we see more combinations. The significance of the fidelity number really depends on the target application of the attack IMO. We were thinking about this in terms of presenting a biased classifier in the real world while going undetected. If we're saying that sacrificing 20-30% fidelity allows us to go "undetected", then we're still being perfectly discriminatory on 70-80% of our population. Assessing this from a demographic parity perspective, on the 4e4 classifier I'm seeing 20% DP, which is pretty discriminatory. I could see this being less meaningful in other applications, though.
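For reference, the DP number here is just the gap in positive prediction rates between the two groups, computed roughly like this (column and variable names are illustrative):

```python
import numpy as np

yhat = adv_shap.predict(xtest)
race = xtest[:, features.index('race')]  # assuming a binary 0/1 race column

# Demographic parity gap: difference in positive prediction rates across groups
dp_gap = abs(yhat[race == 1].mean() - yhat[race == 0].mean())
print('demographic parity gap: %.2f' % dp_gap)
```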
Makes sense. Though I would caution that you are not being "perfectly discriminatory" on 70-80% of the population, since if it was 50% it would be random guessing and hence could have no discriminatory effect. That's why I think R^2 is much easier to interpret, since it is the % of variance you are able to predict. To your point of being "undetected", I think the actual importance given to the feature is probably more informative than "it was the top feature". To that end I re-ran things (with a bit different range of sample sizes), and it looks like the actual weight of the race feature is dropping by about a factor of 2 (which is also roughly how much the R^2 drops by, so it might not be doing much more than moving weight from one feature to another). If we run the same experiment with the kmeans background then we get a much more effective attack (though still at a cost to fidelity), which seems to imply that, at least for the COMPAS example, kmeans is important for this attack (note the left end of these plots starts at n_samples = 1000).
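The "actual weight" above is the mean absolute SHAP value assigned to race; a sketch of how I'm reading that off for a given background (helper and variable names are illustrative, and in the experiment the adversarial model is also retrained for each background / sample size):

```python
import numpy as np
import shap

def mean_race_weight(model, background, xtest, race_idx):
    """Mean |SHAP| attribution on the race feature for one model/background pair."""
    sv = shap.KernelExplainer(model.predict, background).shap_values(xtest)
    return np.abs(sv[:, race_idx]).mean()

race_idx = features.index('race')
weight_random = mean_race_weight(adv_shap, shap.sample(xtrain, 100), xtest, race_idx)
weight_kmeans = mean_race_weight(adv_shap, shap.kmeans(xtrain, 30), xtest, race_idx)
```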
Yeah, I do think you raise a good point here about the distinction between using kmeans and random sampling. KernelSHAP does seem to be more robust given this analysis and data set! The only other thing I'd consider here is that by increasing n_samples, the IID/OOD data set is starting to become pretty unbalanced, given that there's only ~5,500 COMPAS training instances I believe. It's set up right now to address this by resampling the original instances (set through a training parameter).
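Concretely, the balancing I mean is something like this (a sketch, not the repo's exact code):

```python
import numpy as np

# Balance the in-distribution vs. OOD training set for the detector by
# oversampling the ~5,500 real COMPAS rows to match the perturbed (OOD) rows
n_perturbed = len(x_perturbed)  # OOD substitution samples (illustrative name)
idx = np.random.choice(len(xtrain), size=n_perturbed, replace=True)

x_detector = np.vstack([xtrain[idx], x_perturbed])
y_detector = np.concatenate([np.ones(n_perturbed),    # 1 = real / in-distribution
                             np.zeros(n_perturbed)])  # 0 = perturbed / OOD
```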
Hey! I finally got around to playing with the examples you have here, and I noticed that you were using `shap.kmeans` to get the background data. Since I typically use a random sample, not kmeans (unless I am really trying to play with run time optimization), I just swapped the kmeans summary for a random background sample:
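(Roughly; the exact sizes in the notebook may differ:)

```python
import shap

# background_distribution = shap.kmeans(xtrain, 30)  # kmeans summary of the training data
background_distribution = shap.sample(xtrain, 100)   # plain random background sample instead
```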
When I did this, all the adversarial results for SHAP seemed to fall apart for COMPAS...meaning 79% of the time race is still the top SHAP feature in the test dataset for the adversarial model.
This very strong dependence on using kmeans was surprising to me, since it seems to imply SHAP is much more robust to these adversarial attacks when using a typical random background sample. Have you noticed this before, or do you have any thoughts on this? I think it is worth pointing out, but I wanted to get your feedback before suggesting to users that a random sample provides better adversarial robustness.
Thanks!