
Causality Test #78

Open
ClashLuke opened this issue Sep 11, 2022 · 1 comment
Labels
core: Improves core model while keeping core idea intact
engineering: Software-engineering problems that don't require ML-Expertise

Comments

@ClashLuke
Member

Currently, we have to manually verify that a modification doesn't accidentally leak information, which is prone to errors. Especially in situations where only some tokens can see future tokens, a leak can be difficult to notice from the loss curves alone. That's why we should introduce a test that ensures our model cannot see future tokens, as access to them would make predicting those tokens trivially easy.

ClashLuke added the engineering and core labels on Sep 11, 2022
@ClashLuke
Member Author

My current best approach would be to initialize a regular model (or, for unit tests, optionally a single layer), compute the forward pass, and backpropagate through the loss at one specific position rather than the mean. This way, we can inspect the input's gradients and check whether any future token has a gradient != 0. In a separate test, we could also check that at least one past token has a gradient != 0, to ensure the model looks at its input at all (see the sketch below).
The issue with this approach is that leaks can be difficult to isolate: we've already seen multiple cases where a leak flowed from a few tokens to a few other individual tokens, but not from all to all. Catching those requires context_size separate backward passes, one per position, which can get expensive.
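A minimal sketch of that check, assuming a PyTorch-style model split into a hypothetical `embed` step (tokens to embeddings) and a differentiable `body` (embeddings to per-position logits); both names are placeholders for illustration, not the actual API:

```python
import torch


def check_causality(model, seq_len: int, vocab_size: int, position: int):
    """Backpropagate through the loss at one position and verify that no
    future input token receives a nonzero gradient (and at least one past
    or current token does)."""
    tokens = torch.randint(0, vocab_size, (1, seq_len))
    embeddings = model.embed(tokens)  # hypothetical: token ids -> embeddings
    embeddings.retain_grad()  # keep gradients on this non-leaf tensor
    logits = model.body(embeddings)  # hypothetical: -> (1, seq_len, vocab)
    # Loss at one specific position rather than the mean over all positions.
    logits[0, position].logsumexp(-1).backward()
    grad_norm = embeddings.grad[0].norm(dim=-1)  # one norm per input token
    assert (grad_norm[position + 1:] == 0).all(), f"future tokens leak into {position}"
    assert (grad_norm[:position + 1] != 0).any(), "model ignores all past tokens"
```

Isolating the partial (token-to-token) leaks described above would then mean one backward pass per position, which is where the cost comes from:

```python
for position in range(context_size):
    check_causality(model, context_size, vocab_size, position)
```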
