Allow graceful failover when one tunable panics #74

Merged: 4 commits, Aug 20, 2024

@wingertge wingertge commented Aug 17, 2024

Catches panics that occur during the benchmarking phase of autotuning and gracefully falls back to the next algorithm in line.

This allows selecting algorithms based on constraints such as available memory (without threading Result types through the whole memory allocation chain) or potentially unsupported GPU features. Note that only panics during benchmarking are caught, not panics during regular execution. An algorithm that fails at that point still gives error feedback to the user.
If all tunables panic, the first panic is propagated.
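
As a rough illustration of the behavior described above (not the actual code in `tuner.rs`), the fallback could be sketched with `std::panic::catch_unwind`. The `Candidate` type and the `pick_fastest` function are hypothetical names made up for this example, and the timing is a single wall-clock measurement rather than a real benchmark.

```rust
use std::any::Any;
use std::panic::{self, AssertUnwindSafe};
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a tunable operation; not a CubeCL type.
type Candidate = Box<dyn Fn()>;

/// Times each candidate once and skips the ones that panic.
/// Returns the index of the fastest surviving candidate, or re-raises
/// the first captured panic if every candidate panicked.
fn pick_fastest(candidates: &[Candidate]) -> usize {
    let mut best: Option<(usize, Duration)> = None;
    let mut first_panic: Option<Box<dyn Any + Send>> = None;

    for (index, candidate) in candidates.iter().enumerate() {
        let start = Instant::now();
        // The default panic hook has already printed its message by the
        // time the unwind is caught here.
        let outcome = panic::catch_unwind(AssertUnwindSafe(|| candidate()));
        match outcome {
            Ok(()) => {
                let elapsed = start.elapsed();
                if best.map_or(true, |(_, fastest)| elapsed < fastest) {
                    best = Some((index, elapsed));
                }
            }
            Err(payload) => {
                // Keep only the first panic payload for later propagation.
                if first_panic.is_none() {
                    first_panic = Some(payload);
                }
            }
        }
    }

    match best {
        Some((index, _)) => index,
        None => panic::resume_unwind(first_panic.expect("no candidates were provided")),
    }
}
```

Because the unwind is only caught around the timed call, a panic in the selected algorithm during regular execution would still reach the user unchanged.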

One thing worth mentioning: although autotuning gracefully continues to the next algorithm, it does not necessarily continue quietly. The error message is still printed to the console, which could confuse users. Unfortunately, I don't think the benchmark code can prevent that because of how panics are implemented: the message is printed before the unwind reaches the benchmark code where it is caught. A custom panic hook could suppress the message, but it would affect all panics everywhere, which is a bit too intrusive.
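
For reference, the panic-hook alternative mentioned above might look roughly like the sketch below. This is not something the PR does. `IN_BENCHMARK`, `install_quiet_hook`, and `run_quietly` are hypothetical names, and the thread-local flag is only one assumed way to limit the silencing to benchmark runs.

```rust
use std::cell::Cell;
use std::panic::{self, AssertUnwindSafe};
use std::sync::Once;

thread_local! {
    /// True while the current thread is inside a (hypothetical) benchmark run.
    static IN_BENCHMARK: Cell<bool> = Cell::new(false);
}

static INSTALL_HOOK: Once = Once::new();

/// Installs a process-wide hook that stays silent for panics raised while
/// `IN_BENCHMARK` is set and defers to the previous hook otherwise.
fn install_quiet_hook() {
    INSTALL_HOOK.call_once(|| {
        let previous = panic::take_hook();
        panic::set_hook(Box::new(move |info| {
            if !IN_BENCHMARK.with(|flag| flag.get()) {
                previous(info);
            }
        }));
    });
}

/// Runs `bench` with panic output suppressed; returns false if it panicked.
fn run_quietly(bench: impl FnOnce()) -> bool {
    install_quiet_hook();
    IN_BENCHMARK.with(|flag| flag.set(true));
    let result = panic::catch_unwind(AssertUnwindSafe(bench));
    IN_BENCHMARK.with(|flag| flag.set(false));
    result.is_ok()
}
```

Even with the flag, the hook itself replaces the process-wide one, so it would sit in front of any hook the application installed earlier and would be discarded if the application installs its own hook later. That is the intrusiveness concern raised above.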

Testing

All existing tuners work as expected with all tests passing, and the fallback works for im2col convolution when called with parameters that cause an out-of-memory error.


@nathanielsimard nathanielsimard left a comment

Thanks for the contribution, and yes, I think it makes a ton of sense to have this in autotune! I only have one comment.

crates/cubecl-runtime/src/tune/tuner.rs
@nathanielsimard nathanielsimard merged commit 23f0d33 into tracel-ai:main Aug 20, 2024
1 of 2 checks passed
@wingertge wingertge deleted the fallible-tune branch August 20, 2024 17:35