Report non-fatal errors, detect hangs #61

Open
mhofman opened this issue Feb 7, 2022 · 0 comments
mhofman commented Feb 7, 2022

#30 and Agoric/agoric-sdk#4114 are both examples of non-fatal errors that get printed on the output of the solo or chain but do not prevent the load cycles from completing. These are still believed to be regressions, however, and it would be good to surface them. One solution might be to count the errors printed on the output of the chain / solo and report that number.
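As a rough illustration of the counting idea, here is a minimal sketch. It assumes error lines can be recognized with a regular expression. The `summarizeErrors` helper, the `ErrorSummary` shape, and the default pattern are all hypothetical and not part of the existing runner.

```typescript
// Hypothetical helper: counts lines in captured chain / solo output that
// look like errors. The default pattern is an assumption, not a known format.
interface ErrorSummary {
  count: number; // total number of matching lines
  samples: string[]; // first few matching lines, for the report
}

function summarizeErrors(
  output: string,
  pattern: RegExp = /\berror\b/i,
  maxSamples: number = 5,
): ErrorSummary {
  const samples: string[] = [];
  let count = 0;
  for (const line of output.split(/\r?\n/)) {
    // Reset lastIndex so a caller-supplied global or sticky regex
    // does not carry state from one line to the next.
    pattern.lastIndex = 0;
    if (pattern.test(line)) {
      count += 1;
      if (samples.length < maxSamples) {
        samples.push(line);
      }
    }
  }
  return { count, samples };
}
```

A runner could call this on the output collected during each cycle and include `count` in the cycle summary, so that a jump in the number between runs points at a possible regression.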

Similarly, some errors do not cause an abnormal termination but do cause a hang. Since the runner has no way to know whether a task is simply still pending or actually hung, we could report some metrics about pending tasks at the end of the cycle, e.g. how many there are, how long they've been pending, or even how long before shutdown the last task was started (and if that's much longer than the maximum time any task took to complete, fail?).
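The pending-task metrics could be computed along these lines. This is only a sketch under assumptions: the `TaskRecord` shape, the `reportPending` function, and the `hangFactor` threshold (standing in for "much longer") are hypothetical, and timestamps are taken to be milliseconds.

```typescript
// Hypothetical record of one task as tracked by the runner.
interface TaskRecord {
  startedAt: number; // ms timestamp
  finishedAt?: number; // ms timestamp, undefined while still pending
}

interface PendingReport {
  pendingCount: number; // tasks still pending at shutdown
  oldestPendingMs: number; // longest time a pending task has waited
  sinceLastStartMs: number; // time between the last task start and shutdown
  maxCompletedMs: number; // longest duration of any completed task
  suspectedHang: boolean; // heuristic result, see below
}

function reportPending(
  tasks: TaskRecord[],
  shutdownAt: number,
  hangFactor: number = 3, // assumed meaning of "much longer"
): PendingReport {
  let pendingCount = 0;
  let oldestPendingMs = 0;
  let maxCompletedMs = 0;
  let lastStart = -Infinity;
  for (const task of tasks) {
    lastStart = Math.max(lastStart, task.startedAt);
    if (task.finishedAt === undefined) {
      pendingCount += 1;
      oldestPendingMs = Math.max(oldestPendingMs, shutdownAt - task.startedAt);
    } else {
      maxCompletedMs = Math.max(maxCompletedMs, task.finishedAt - task.startedAt);
    }
  }
  const sinceLastStartMs = tasks.length > 0 ? shutdownAt - lastStart : 0;
  // Flag a hang only when there is a completed task to compare against.
  const suspectedHang =
    maxCompletedMs > 0 && sinceLastStartMs > hangFactor * maxCompletedMs;
  return {
    pendingCount,
    oldestPendingMs,
    sinceLastStartMs,
    maxCompletedMs,
    suspectedHang,
  };
}
```

For example, with completed tasks that took at most 150 ms and a last task started 1600 ms before shutdown, the default factor of 3 would flag a suspected hang. Whether that should fail the run or only be reported is left open, as in the question above.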
