Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use overarching Error enum in agents #2878

Open
1 task
daniel-savu opened this issue Oct 31, 2023 · 2 comments
Open
1 task

Use overarching Error enum in agents #2878

daniel-savu opened this issue Oct 31, 2023 · 2 comments
Assignees

Comments

@daniel-savu
Copy link
Contributor

daniel-savu commented Oct 31, 2023

I'd like to have one error enum that includes ChainCommunicationError, HyperlaneCustomErrorWrapper, eyre errors, and so on. Wondering what the reasoning behind not doing this before was? cc @tkporter

Some work on this has already started in #2881

Useful subtasks:

  • write a small macro to implement From<Err> for ChainCommunicationError for all the sub-errors of HyperlaneCosmosError et al
@nambrot
Copy link
Contributor

nambrot commented Feb 29, 2024

@daniel-savu still relevant I assume?

@daniel-savu
Copy link
Contributor Author

It is yes. We've already started consolidating environment-specific errors into ChainCommunicationError using the From trait impl approach (example here for cosmos), but it's a work in progress and many errors are still standalone.

daniel-savu added a commit that referenced this issue Mar 6, 2024
### Description

- Makes validator tasks infallible by adding retries 
- This fixes a bug where certain tasks would return an error an shut
down, affecting liveness. This was desired in the relayer since we don't
want individual chain failures to affect the liveness of other chains.
Now validator tasks either terminate or panic, and panicking will be
propagated by `try_join_all`, causing the agent to shut down.
- A thing to keep in mind in general is that agents will only terminate
if a task panics. If it returns an `Err` but doesn't panic, the task
won't be respawned. We should consider the implications of this in the
scraper too. If this isn't desired, should consider using `select_all!`

### Drive-by changes

- retry logic is moved from
`rust/chains/hyperlane-cosmos/src/providers/rpc.rs` into
`rust/hyperlane-core/src/rpc_clients/retry.rs`, so we're even closer to
turning it into a retrying provider
- changes several fn signatures within the validator to now return
`ChainCommunicationError`s instead of `eyre::Report`s, for compatibility
with the retry logic. Also makes `DbError` convertible to
`ChainCommunicationError`, for the same reason. This achieves some
progress on
#2878
- allows the backfill checkpoint submitter task to terminate, since
`try_join_all` is tolerant of this (as described above)

### Related issues

- Fixes #3349


### Backward compatibility

<!--
Are these changes backward compatible? Are there any infrastructure
implications, e.g. changes that would prohibit deploying older commits
using this infra tooling?

Yes/No
-->

### Testing

<!--
What kind of testing have these changes undergone?

None/Manual/Unit Tests
-->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: No status
Development

No branches or pull requests

3 participants