Timeout crashes HWP PID main process #786

Open
BrianJKoopman opened this issue Oct 28, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@BrianJKoopman
Member

Saw this on satp1 this morning:

2024-10-28 11:50:39.622 2024-10-28T11:50:39+0000 DecodedResponse(msg_type='measure', msg='Current frequency = 1.999', measure=1.999)
2024-10-28 11:50:39.622 2024-10-28T11:50:39+0000 Finding target CHWP Frequency
2024-10-28 11:50:40.123 2024-10-28T11:50:40+0000 ['R014007D0']
2024-10-28 11:50:40.123 2024-10-28T11:50:40+0000 DecodedResponse(msg_type='read', msg='Setpoint = 2.0', measure=2.0)
2024-10-28 11:50:40.123 2024-10-28T11:50:40+0000 Finding CHWP Direction
2024-10-28 11:50:42.626 2024-10-28T11:50:42+0000 Caught timeout waiting for response from PID controller. Trying again...
2024-10-28 11:50:44.290 2024-10-28T11:50:44+0000 ['R02400000']
2024-10-28 11:50:44.290 2024-10-28T11:50:44+0000 DecodedResponse(msg_type='read', msg='Direction = Forward', measure=0)
2024-10-28 11:50:44.491 2024-10-28T11:50:44+0000 Finding CHWP Frequency
2024-10-28 11:50:44.991 2024-10-28T11:50:44+0000 ['\x00\x00\x00\x00\x00\x00\x00\x00\x00X012.001']
2024-10-28 11:50:44.991 2024-10-28T11:50:44+0000 DecodedResponse(msg_type='error', msg='Unrecognized response', measure=None)
2024-10-28 11:50:44.991 2024-10-28T11:50:44+0000 Error reading freq: Unrecognized response
2024-10-28 11:50:44.991 2024-10-28T11:50:44+0000 ['\x00\x00\x00\x00\x00\x00\x00\x00\x00X012.001']
2024-10-28 11:50:44.992 2024-10-28T11:50:44+0000 DecodedResponse(msg_type='error', msg='Unrecognized response', measure=None)
2024-10-28 11:50:44.992 2024-10-28T11:50:44+0000 Error reading freq: Unrecognized response
2024-10-28 11:50:44.992 2024-10-28T11:50:44+0000 ['\x00\x00\x00\x00\x00\x00\x00\x00\x00X012.001']
2024-10-28 11:50:44.992 2024-10-28T11:50:44+0000 DecodedResponse(msg_type='error', msg='Unrecognized response', measure=None)
2024-10-28 11:50:44.992 2024-10-28T11:50:44+0000 Error reading freq: Unrecognized response
2024-10-28 11:50:44.993 2024-10-28T11:50:44+0000 main:104 CRASH: [Failure instance: Traceback: <class 'ValueError'>: Could not get current frequency
2024-10-28 11:50:44.993 /usr/lib/python3.10/threading.py:1016:_bootstrap_inner
2024-10-28 11:50:44.993 /usr/lib/python3.10/threading.py:953:run
2024-10-28 11:50:44.993 /opt/venv/lib/python3.10/site-packages/twisted/_threads/_threadworker.py:49:work
2024-10-28 11:50:44.993 /opt/venv/lib/python3.10/site-packages/twisted/_threads/_team.py:192:doWork
2024-10-28 11:50:44.993 --- <exception caught here> ---
2024-10-28 11:50:44.993 /opt/venv/lib/python3.10/site-packages/twisted/python/threadpool.py:269:inContext
2024-10-28 11:50:44.993 /opt/venv/lib/python3.10/site-packages/twisted/python/threadpool.py:285:<lambda>
2024-10-28 11:50:44.993 /opt/venv/lib/python3.10/site-packages/twisted/python/context.py:117:callWithContext
2024-10-28 11:50:44.993 /opt/venv/lib/python3.10/site-packages/twisted/python/context.py:82:callWithContext
2024-10-28 11:50:44.993 /opt/venv/lib/python3.10/site-packages/ocs/ocs_agent.py:984:_running_wrapper
2024-10-28 11:50:44.993 /opt/venv/lib/python3.10/site-packages/socs/agents/hwp_pid/agent.py:174:main
2024-10-28 11:50:44.993 /opt/venv/lib/python3.10/site-packages/socs/agents/hwp_pid/agent.py:121:_get_data_and_publish
2024-10-28 11:50:44.993 /opt/venv/lib/python3.10/site-packages/socs/agents/hwp_pid/agent.py:33:get_pid_state
2024-10-28 11:50:44.993 /opt/venv/lib/python3.10/site-packages/socs/agents/hwp_pid/drivers/pid_controller.py:248:get_freq
2024-10-28 11:50:44.993 ]
2024-10-28 11:50:44.993 2024-10-28T11:50:44+0000 main:104 Status is now "done".
2024-10-28 13:28:49.827 2024-10-28T13:28:49+0000 start called for main
2024-10-28 13:28:49.827 2024-10-28T13:28:49+0000 main:105 Status is now "starting".
2024-10-28 13:28:49.829 2024-10-28T13:28:49+0000 main:105 Status is now "running".
2024-10-28 13:28:49.830 2024-10-28T13:28:49+0000 Connected to PID controller
2024-10-28 13:28:49.831 2024-10-28T13:28:49+0000 Finding CHWP Frequency
2024-10-28 13:28:50.332 2024-10-28T13:28:50+0000 ['X011.999']
2024-10-28 13:28:50.332 2024-10-28T13:28:50+0000 DecodedResponse(msg_type='measure', msg='Current frequency = 1.999', measure=1.999)
2024-10-28 13:28:50.332 2024-10-28T13:28:50+0000 Finding target CHWP Frequency
2024-10-28 13:28:50.833 2024-10-28T13:28:50+0000 ['R014007D0']
2024-10-28 13:28:50.833 2024-10-28T13:28:50+0000 DecodedResponse(msg_type='read', msg='Setpoint = 2.0', measure=2.0)

Recovery only required restarting the main process, which you can see in the logs at 13:28:49. Resolving this will contribute to the effort in #721.
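
One possible direction, as a rough sketch rather than the actual socs driver code: in the log above, the reply that follows the timeout is a normal-looking measurement (`X012.001`) preceded by a run of NUL bytes, and a failed read raises a `ValueError` that takes down the main process. The sketch below strips leading NULs before parsing and returns `None` after repeated failures instead of raising. The names `parse_frequency`, `read_frequency`, and `MEASURE_PREFIX` are hypothetical, and the assumption that a measurement reply is `X01` followed by the value is inferred only from the `X011.999` line in the log.

```python
from typing import Callable, Optional

# Assumption: a measurement reply is this prefix followed by the value,
# inferred from the 'X011.999' -> 1.999 line in the log above.
MEASURE_PREFIX = "X01"


def parse_frequency(raw: str) -> float:
    """Parse a measurement reply, tolerating leading NUL bytes."""
    cleaned = raw.lstrip("\x00")
    if not cleaned.startswith(MEASURE_PREFIX):
        raise ValueError(f"Unrecognized response: {raw!r}")
    # float() also raises ValueError if the remainder is not numeric.
    return float(cleaned[len(MEASURE_PREFIX):])


def read_frequency(query: Callable[[], str], attempts: int = 3) -> Optional[float]:
    """Query the controller up to `attempts` times.

    `query` is a hypothetical callable that sends the request and returns
    the raw reply. Returns None if every attempt fails, so the caller can
    skip the sample instead of crashing.
    """
    for _ in range(attempts):
        try:
            return parse_frequency(query())
        except ValueError:
            continue
    return None
```

With something like this, the acquisition loop could treat a `None` result as a missed sample and carry on to the next iteration, so a single bad read would not require a manual restart.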

socs image tag: v0.5.1-22-g7d2f158-dev

BrianJKoopman added the bug label Oct 28, 2024