-
Notifications
You must be signed in to change notification settings - Fork 294
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[userbenchmark] Broader error catching in Torch-TRT userbenchmark
#1974
Conversation
- Failures on a recent sample run indicate that errors raised by the subprocess are not always Exceptions, but are still sometimes recoverable - Add `except` clause to catch all such errors, short of keyboard interrupts, so the compilation can complete despite such errors
cc34f3b
to
a609231
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@xuzhao9 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@xuzhao9 - thanks! If possible, could you start another benchmark run with this new change? |
Hi @xuzhao9 - I am wondering how we might be able to set up a weekly CI run of this userbenchmark? I noticed some of the other benchmarks have a platform: "gcp_a100"
schedule: "weekly" |
@gs-olive Currently, it is already running nightly: https://github.com/pytorch/benchmark/blob/main/userbenchmark/torch_trt/ci.yaml#L2 Changing it to weekly will work ootb - but we need to fix the existing CI errors first: https://github.com/pytorch/benchmark/actions/runs/6518636322/job/17704274032 |
I see - thanks for sharing this, it is likely because I am saving the error message as a string in the JSON dictionary when compilation or model building fails. Should I not save an entry at all in such situations instead? |
Yes, the metric value field in the JSON dictionary accepts float only. We suggest using a separate file or multiple separate files to save the complete error message for readability. In the JSON dictionary, if a metric fails, we suggest to use a special float value (e.g., -1.0) to indicate an error or skip having this metric Id. |
Thanks for the information! I have added the necessary changes to fix the string-values in the dictionary, here: #1998 |
except
clause to catch all such errors, short of keyboard interrupts, so the compilation can complete despite such errors