-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
provider 0.5.12
: 8443/status & 8444 grpc endpoints sporadically falling off
#214
Comments
FYI: Current workaroundCurrent livenessProbe checks helping in provider restart whenever this issue gets detected, so the manifest still can be sent to the provider whenever users want to deploy something. Liveness checks involved: Running the liveness check manually confirms the issue:
|
affected providersAdditionally attached the complete logs for the provider pods taken with SW versions:
h100.mon.obl.akash.pub.provider.previous.log
a100.mon.obl.akash.pub.provider.previous.log
provider-02.sandbox-01.aksh.pw.provider.previous.log
pdx.nb.akash.pub.provider.previous.log
ty.lneq.akash.pub.provider.previous.log
sg.lneq.akash.pub.provider.previous.log
au.lneq.akash.pub.provider.previous.log
akash.pro.provider.previous.log
hk.lneq.akash.pub.provider.previous.log no issues on these providers, yet 🤞
|
Providers 0.5.12 can downgrade to 0.5.11 using these commands:
|
I've noticed this primarily happening on
*.mon.obl.akash.pub
providers; most frequently on theh100.mon.obl.akash.pub provider
Additional observations while requests to
:8443/status
& 8444 grpc endpoints hang:Logs
h100.mon.obl.akash.pub.provider.log
h100.mon.obl.akash.pub.deployment-operator-inventory.log
The text was updated successfully, but these errors were encountered: