Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds grip and ungrip fns to hwp-supervisor with safety checks #708

Merged
merged 15 commits into from
Sep 9, 2024

Conversation

jlashner
Copy link
Collaborator

@jlashner jlashner commented Aug 2, 2024

Adds grip and ungrip fns to hwp-supervisor with safety checks.

This PR adds two additional operations to the HWP supervisor to grip/ungrip the HWP

Description

For these two actions, the HWP and ACU states are first checked based on kyohei's daq-discussion.
If checks pass, the gripper agent grip/ungrip operations will be called.

Motivation and Context

This is to centralize the gripping procedure and to implement safety checks.

How Has This Been Tested?

This has not been tested yet. @ykyohei any chance we can test at site sometime soon?

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.

@BrianJKoopman
Copy link
Member

Will take a look soon. In the meantime, merging in the latest main should fix the build error. See #709.

Copy link
Contributor

@ykyohei ykyohei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for doing this quickly, we will test this.

  • We would like to have --acu-min/max-roll (or boresight, bs?) for gripper operation.
  • We would like to monitor acu.roll_current/commanded_position for gripper operation.
  • Can current hwp-supervisor and ocs provide information to acu that the gripper functions are running, so that acu can restrict its operation?

@jlashner
Copy link
Collaborator Author

jlashner commented Aug 3, 2024

Hi Kyohei, I added the boresight checks and exposed the supervisor control state in several places so it can be checked by the ACU. I think the following is the one with the least amount of latency (~1 sec)...

# If this is GripHWP or UngripHWP, don't move the ACU
control_state_type = supervisor.spin_control.status().session['data']['current_state_type']

However I don't think this completely resolves the problem, since there is still a race condition if ACU movement and Grip/Ungrip are called at the same time (or within ~1 sec of each other)... there are ways that I can reduce the latency at which the control state session info is updated, but if this is something we really care about, we may want to thing about using a filesystem mutex to prevent simultaneous gripper/acu control (since I believe both agents are running on the same node). @mhasself do you think this may be worth doing or do you think that's overkill?

@mhasself
Copy link
Member

mhasself commented Aug 5, 2024

Hi Kyohei, I added the boresight checks and exposed the supervisor control state in several places so it can be checked by the ACU. I think the following is the one with the least amount of latency (~1 sec)...

# If this is GripHWP or UngripHWP, don't move the ACU
control_state_type = supervisor.spin_control.status().session['data']['current_state_type']

However I don't think this completely resolves the problem, since there is still a race condition if ACU movement and Grip/Ungrip are called at the same time (or within ~1 sec of each other)... there are ways that I can reduce the latency at which the control state session info is updated, but if this is something we really care about, we may want to thing about using a filesystem mutex to prevent simultaneous gripper/acu control (since I believe both agents are running on the same node). @mhasself do you think this may be worth doing or do you think that's overkill?

I would prefer to use an OCS-based handshake rather than some other system (e.g. filesystem). That would mean:

  • HWPSupervisor sets some flag "block_ACU_motion" flag, and waits...
  • When ACU "monitor" process sees the flag, it quickly assesses whether it is stationary (i.e. move_to / scan aren't running), and if so it (a) locks out further motion and (b) sets an acknowledgment bit, e.g. "motion_blocked_by_hwp_supervisor: true". If it can't block motion right now, it doesn't. But as long as the "block_motion" flag is true, it will keep assessing + trying to block out motion.
  • The HWPSupervisor waits for "motion_blocked_by_hwp_supervisor" flag to be true. That might take 2 seconds. It might never happen. HWPSupervisor decides whether to give up, or keep waiting.
  • When the HWPSupervisor is finished its activity, it lowers the "block_ACU_motion" flag.
  • When ACU sees block_ACU_motion flag lowered, it resumes normal operation.

That seem good?

The ACU agent modifications are not too difficult, but we will need a testing platform for a few hours.

@jlashner
Copy link
Collaborator Author

jlashner commented Aug 8, 2024

I think that seems good, but should probably go into a separate PR so that we can have some basic form of protection in the meantime. @ykyohei do you think we can test this version of the branch at site?

@mhasself
Copy link
Member

mhasself commented Aug 8, 2024

I think that seems good, but should probably go into a separate PR so that we can have some basic form of protection in the meantime. @ykyohei do you think we can test this version of the branch at site?

Fine with me but you could put in that "block_ACU_motion" flag right away, if it's not already in here somewhere. (But not insist on hand-shake completion.) Then ACU Agent can evolve + start implementing the block-out; then you can come back and implement the handshake fully to eliminate the race condition.

@jlashner
Copy link
Collaborator Author

Hi Matthew, in the latest commit I added the request_block_ACU_motion flag to the HWP state, which should be accessible from session data of the monitor process, which I think should be enough for you to develop with. In response we'll need an operation that we can query session data for a ACU_motion_blocked field. The ensure_grip_safety context manager should implement the HWP-side of the handshake, though it won't be used unless a site-config var is set and I add an update for the ACU_motion_blocked state var.

Hope this is enough for now!

@BrianJKoopman BrianJKoopman requested a review from mhasself August 13, 2024 00:21
@ykyohei ykyohei self-requested a review August 15, 2024 18:20
Copy link
Contributor

@ykyohei ykyohei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made minor debugs, and I believe the safety checks are working but I have some requests.

socs/agents/hwp_supervisor/agent.py Show resolved Hide resolved
socs/agents/hwp_supervisor/agent.py Show resolved Hide resolved
socs/agents/hwp_supervisor/agent.py Outdated Show resolved Hide resolved
Copy link
Member

@BrianJKoopman BrianJKoopman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks mostly OK to me, just a question about behavior when acu is None.

socs/agents/hwp_supervisor/agent.py Show resolved Hide resolved
@jlashner
Copy link
Collaborator Author

jlashner commented Sep 6, 2024

Hi @ykyohei and @BrianJKoopman, I think I addressed the remaining issues in the latest commits, with the exception of the force_grip option, which I want to double check this is something we actually need and want before implementing.

Now, the ACU instance ID will always default to "acu", and checks will only be skipped if the --no-acu argument is specified. This is to make it so that the default behavior for the SATs will be to run with safety checks without having to change anything in their config.

If we could test this on an instrument that would be great!

@ykyohei
Copy link
Contributor

ykyohei commented Sep 6, 2024

Thank you so much Jack, I tested this and safety checks look working.
For the acu blocking. if I add --use-acu-blocking, then I get RuntimeError: ACU motion was not blocked within timeout.
This is an expected behavior because hwp-acu handshake is not completed yet, right?
We should not add --use-acu-blocking until hwp-acu handshake is completed. Is this correct?

@jlashner
Copy link
Collaborator Author

jlashner commented Sep 6, 2024

Yes that's right, the ACU blocking will not work until its implemented on the ACU side. Thanks for testing!

Though testing worked successfully, I am realizing that there is not currently a good way to test this without risk of it not working and potentially putting the telescope in danger. I think for this reason I should add a "--dry" mode where gripper and hwp operations are not actually conducted, we just log statements. I think lets merge this PR and then I will implement this separately.

@BrianJKoopman BrianJKoopman merged commit f1c537a into main Sep 9, 2024
7 checks passed
@BrianJKoopman BrianJKoopman deleted the hwp_supervisor_grip branch September 9, 2024 15:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants