[core] Alerting when services go down #110

Open
AquiGorka opened this issue May 22, 2024 · 1 comment

@AquiGorka

Figure out the best way for us to learn when the solver, job creator, or chain services stop working, whatever the cause.

@bgins

bgins commented Jul 1, 2024

We have started this effort with:

  • EC2 status checks
  • Cloudflare tunnel alerts when a tunnel goes down

Next steps:

  • Implement Cloudflare tunnel alerts in OpenTofu
  • Add a periodic cowsay job to check network liveness
  • Alerts from our observability stack once live
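The periodic liveness job from the next steps could look roughly like the sketch below. It runs a command on an interval and raises an alert after a run of consecutive failures. Everything named here is an assumption for illustration: `PLACEHOLDER_COMMAND` stands in for whatever command actually submits the cowsay job (the issue does not specify it), and `alert` is a hypothetical callback that would be wired to a real notification channel.

```python
import subprocess
import sys
import time
from typing import Callable, Optional, Sequence

# Hypothetical stand-in for the real cowsay job submission command,
# which is not specified in this issue.
PLACEHOLDER_COMMAND = [sys.executable, "-c", "print('moo')"]


def check_liveness(command: Sequence[str], timeout_s: float) -> bool:
    """Return True if the command exits with status 0 within the timeout."""
    try:
        result = subprocess.run(command, capture_output=True, timeout=timeout_s)
    except (subprocess.TimeoutExpired, OSError):
        # A hang or a command that cannot be started both count as "down".
        return False
    return result.returncode == 0


def monitor(
    command: Sequence[str],
    interval_s: float,
    timeout_s: float,
    failures_before_alert: int,
    alert: Callable[[str], None],
    max_checks: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Run the check periodically; alert once per run of consecutive failures."""
    consecutive_failures = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        if check_liveness(command, timeout_s):
            consecutive_failures = 0
        else:
            consecutive_failures += 1
            # Fire exactly when the threshold is reached, so a long outage
            # does not produce a new alert on every check.
            if consecutive_failures == failures_before_alert:
                alert(f"liveness check failed {consecutive_failures} times in a row")
        checks += 1
        sleep(interval_s)
```

Requiring several consecutive failures before alerting is one way to avoid paging on a single slow job; the threshold and interval would need tuning against how long a job normally takes on the network.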
