Retry on epics failure for cryocard commands #792

Open
jlashner opened this issue May 13, 2024 · 3 comments

Comments

@jlashner
Collaborator

Describe the problem

Commands to the cryocard sometimes return None, signaling that an epics timeout occurred. Unlike the epics cagets in the smurf_command module, the cryocard do_read function has no retry_on_fail option, so if there is an epics failure, due to either a hardware connection problem or server load, it just returns None instead of trying again.

Describe the solution you'd like

It would be nice if we could have the option to retry_on_fail for cryocard commands.
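
As a rough illustration of what such an option could look like, here is a minimal sketch of a generic wrapper that re-calls a function while it keeps returning None. The name `call_with_retry` and its parameters are hypothetical and are not part of pysmurf; the `retry_on_fail` flag only mirrors the name used above.

```python
import time


def call_with_retry(func, retry_on_fail=True, max_retries=5, delay=0.0):
    """Call func() until it returns something other than None.

    Hypothetical helper for illustration; None is treated as "timed out".
    Returns None if every attempt fails.
    """
    attempts = max_retries if retry_on_fail else 1
    for attempt in range(attempts):
        result = func()
        if result is not None:
            return result
        # Optionally wait before the next attempt (skipped after the last one).
        if delay and attempt < attempts - 1:
            time.sleep(delay)
    return None
```

With `retry_on_fail=False` the wrapper makes a single attempt, which matches the behavior described in the problem statement.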

@jlashner jlashner added the enhancement New feature or request label May 13, 2024
@tristpinsm tristpinsm self-assigned this May 15, 2024
@tristpinsm
Collaborator

Hi Jack,
I've been looking at CryoCard.do_read, and as written it already retries, up to 5 times by default, when reading from a given address. So I'm wondering whether this issue is coming from somewhere else. Do you have an example of a command that times out?

Also, looking at that code I'm not sure how it behaves if an epics timeout does occur...

#need double write to make sure buffer is updated
self.writepv.put(cmd_make(1, address, 0))
for self.retry in range(0, self.max_retries):
    self.writepv.put(cmd_make(1, address, 0))
    data = self.readpv.get(use_monitor=use_monitor)
    addrrb = cmd_address(data)
    if (addrrb == address):
        return (data)
return (0)

return (self.readpv.get(use_monitor=use_monitor))

My understanding is that a timeout would result in PV.get returning None, which should then raise an exception when cmd_address tries to interpret it as an int. (Also note the unreachable return statement at the end.)
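
To make that failure mode concrete, here is a minimal, self-contained sketch. Everything in it is a stand-in: `FakePV` is a hypothetical replacement for the epics PV objects, the bit layout in `cmd_make` and `cmd_address` is assumed for illustration rather than copied from pysmurf, and `do_read_with_retry` is one possible None-tolerant version of the loop, not the actual fix.

```python
# Assumed bit layout, for illustration only (not taken from pysmurf).
ADDR_SHIFT = 20
ADDR_MASK = 0x7FF


def cmd_make(read, address, data):
    """Pack a read flag, an address and a data word into one integer."""
    return (read << 31) | ((address & ADDR_MASK) << ADDR_SHIFT) | (data & 0xFFFFF)


def cmd_address(data):
    """Extract the address field; raises TypeError if data is None."""
    return (data >> ADDR_SHIFT) & ADDR_MASK


class FakePV:
    """Hypothetical PV: get() returns queued replies, None mimics a timeout."""

    def __init__(self, replies=()):
        self.replies = list(replies)
        self.puts = []

    def put(self, value):
        self.puts.append(value)

    def get(self, use_monitor=False):
        return self.replies.pop(0) if self.replies else None


def do_read_with_retry(writepv, readpv, address, max_retries=5, use_monitor=False):
    """Return the reply for `address`, or None if every attempt fails."""
    cmd = cmd_make(1, address, 0)
    writepv.put(cmd)  # extra write up front, as in the quoted code
    for _ in range(max_retries):
        writepv.put(cmd)
        data = readpv.get(use_monitor=use_monitor)
        if data is None:
            continue  # treat a timeout as a failed attempt and try again
        if cmd_address(data) == address:
            return data
    return None
```

Whether an exhausted retry should return None, return 0 as the quoted code does, or raise is a design choice this sketch does not settle; None is used here only so a failed read cannot be mistaken for a valid reply.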

@jlashner
Collaborator Author

Ya I think that's what I determined as well looking at this closer... the retry is failing because cmd_address cannot handle None inputs.

@jlashner
Collaborator Author

One such failure is documented here: https://github.com/simonsobs/daq-discussions/discussions/91
