I was helping satp2 try to recover their HWP system this morning and found the supervisor agent in this state:
2024-09-26T17:09:50+0000 startup-op: launching monitor
2024-09-26T17:09:50+0000 start called for monitor
2024-09-26T17:09:50+0000 monitor:0 Status is now "starting".
2024-09-26T17:09:50+0000 startup-op: launching spin_control
2024-09-26T17:09:50+0000 start called for spin_control
2024-09-26T17:09:50+0000 spin_control:1 Status is now "starting".
2024-09-26T17:09:50+0000 monitor:0 Status is now "running".
2024-09-26T17:09:50+0000 spin_control:1 Status is now "running".
2024-09-26T17:09:55+0000 Error getting status: [0, 0, 0, 0, 'wamp.error.no_such_procedure', ['no callee registered for procedure <satp2.power-ups-az.ops>'], {}]
2024-09-26T17:09:55+0000 Could not connect to client: power-ups-az
2024-09-26T17:09:55+0000 Error getting status: [0, 0, 0, 0, 'wamp.error.no_such_procedure', ['no callee registered for procedure <satp2.power-iboot-hwp-2.ops>'], {}]
2024-09-26T17:09:56+0000 Error getting status: [0, 0, 0, 0, 'wamp.error.no_such_procedure', ['no callee registered for procedure <satp2.power-iboot-hwp-2.ops>'], {}]
2024-09-26T17:09:56+0000 Error getting status: [0, 0, 0, 0, 'wamp.error.no_such_procedure', ['no callee registered for procedure <satp2.acu.ops>'], {}]
2024-09-26T17:09:56+0000 Could not connect to client: power-iboot-hwp-2
2024-09-26T17:09:56+0000 monitor:0 CRASH: [Failure instance: Traceback: <class 'ValueError'>: Could not find upsOutputSource OID
/usr/lib/python3.10/threading.py:1016:_bootstrap_inner
/usr/lib/python3.10/threading.py:953:run
/opt/venv/lib/python3.10/site-packages/twisted/_threads/_threadworker.py:49:work
/opt/venv/lib/python3.10/site-packages/twisted/_threads/_team.py:192:doWork
--- <exception caught here> ---
/opt/venv/lib/python3.10/site-packages/twisted/python/threadpool.py:269:inContext
/opt/venv/lib/python3.10/site-packages/twisted/python/threadpool.py:285:<lambda>
/opt/venv/lib/python3.10/site-packages/twisted/python/context.py:117:callWithContext
/opt/venv/lib/python3.10/site-packages/twisted/python/context.py:82:callWithContext
/opt/venv/lib/python3.10/site-packages/ocs/ocs_agent.py:984:_running_wrapper
/opt/venv/lib/python3.10/site-packages/socs/agents/hwp_supervisor/agent.py:1374:monitor
/opt/venv/lib/python3.10/site-packages/socs/agents/hwp_supervisor/agent.py:442:update_ups_state
]
2024-09-26T17:09:56+0000 monitor:0 Status is now "done".
It seems like it wasn't able to connect to any of the clients, so when `monitor` goes to grab state info it hits the `raise` in `update_ups_state` (agent.py line 442 in the traceback above), which it doesn't handle.
Thanks for this. The correct behavior is probably to catch this in the monitor_state process and mark it as degraded... and also raise a flag to make sure none of the spin-up commands can run.
I think it might make sense to move the safety check logic from the control-update function into properties of the `HWPState` object, such as `spin_up_safe` and `grip_safe`, that check internal state variables like this and return a bool. (I don't think UPS state is currently checked anywhere.)
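To make the proposal concrete, here is a minimal sketch of the two ideas above: catching the `ValueError` inside the monitor loop so the process degrades instead of crashing, and exposing safety checks as boolean properties. This is not the actual socs code. `HWPStateSketch`, `monitor_step`, and the fields `monitor_degraded`, `ups_output_source`, and `last_error` are hypothetical names, and the conditions inside `spin_up_safe` and `grip_safe` are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HWPStateSketch:
    """Hypothetical stand-in for the supervisor's HWPState (field names are assumptions)."""
    ups_output_source: Optional[str] = None  # None until a UPS reading succeeds
    monitor_degraded: bool = False           # set when a state update fails
    last_error: Optional[str] = None         # message from the most recent failure

    def update_ups_state(self, ups_data: dict) -> None:
        # Mirrors the failure in the traceback: a missing OID raises ValueError.
        if "upsOutputSource" not in ups_data:
            raise ValueError("Could not find upsOutputSource OID")
        self.ups_output_source = ups_data["upsOutputSource"]

    @property
    def spin_up_safe(self) -> bool:
        # Refuse spin-up while the monitor is degraded or the UPS state is unknown.
        return not self.monitor_degraded and self.ups_output_source is not None

    @property
    def grip_safe(self) -> bool:
        # Assumption: in this sketch, gripping only requires a healthy monitor.
        return not self.monitor_degraded


def monitor_step(state: HWPStateSketch, ups_data: dict) -> None:
    """One monitor iteration that marks the state degraded instead of crashing."""
    try:
        state.update_ups_state(ups_data)
    except ValueError as err:
        state.monitor_degraded = True
        state.last_error = str(err)
    else:
        state.monitor_degraded = False
        state.last_error = None


if __name__ == "__main__":
    state = HWPStateSketch()
    monitor_step(state, {})  # no UPS data available, as in the log above
    print(state.monitor_degraded, state.spin_up_safe)
    monitor_step(state, {"upsOutputSource": "normal"})
    print(state.monitor_degraded, state.spin_up_safe)
```

With this shape, a spin-up command would only need to check `state.spin_up_safe` before running, and the monitor keeps polling so it can recover once the clients come back.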
Referenced code: socs/socs/agents/hwp_supervisor/agent.py, line 442 in 33b1e1d
EDIT: This was on socs image: v0.5.1-22-g7d2f158-dev