Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LS240 Locking restructure #589

Open
wants to merge 32 commits into
base: main
Choose a base branch
from
Open

LS240 Locking restructure #589

wants to merge 32 commits into from

Conversation

jlashner
Copy link
Collaborator

@jlashner jlashner commented Dec 7, 2023

This PR provides an example of how agents can be re-written to avoid locking mechanisms, using the single main process discussed in daq-dev

Description

Refactors LS240 agent to use lock-free agent structure as discussed in daq-dev. This is not really ready for PR, but I think it is far enough for people to look at / we can discuss structural changes or what we may want to move over to ocs proper.

A few notes about how it works:

  • Actions are defined as dataclasses with a process function that takes in some hardware driver object and does things with it.
  • tasks that create new actions will yield until the action is processed, at which point, anything returned from process will be set to session.data of said task. If the process function raises an exception, the defered errback is called, in which case the yield statement for that action will throw an exception, so the task will crash, while the main process will catch the exception and keep going.
  • I opted to define a process function per action, instead of a single process function that takes in an action type, because I think that will be easier if we move boilerplate into the ocs repo, but I'm not super attached
  • Error handling / robustness needs to be checked with real hardware, I'm not entirely sure I have the correct exceptions for when the module loses connection since I've only tested with a simulator
  • Still need to add docs
  • "degraded" process status / feed info not included

Motivation and Context

Removes multithreading / locking requirements from hardware handling in ls240

How Has This Been Tested?

Tested using ls240 simulator

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.

@jlashner
Copy link
Collaborator Author

It's been a while since I started this, but I'd like to open this as a real PR since I think it would really improve the agent, and "robustifies" the agent as is described here: #538

In addition, I added a bunch of types and statically checked this module with mypy. I really liked this and it was useful for catching a number of cases in which optional types were not handled correctly, or the correct values weren't being returned from OCS operations. I think we should consider adding types and running OCS core through mypy, and encourage new agents/developers to use it.

@jlashner jlashner marked this pull request as ready for review August 14, 2024 01:08
Copy link
Member

@BrianJKoopman BrianJKoopman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this restructure! And sorry this took so long to review. I'm excited about some of the things this improves, like making error handling more straightforward, since it isolates communication with the hardware within the main process.

A lot of what I wrote is my opinion on the overall structure from the perspective of if we were to make this the new recommended way that contributors write agents. That said, since we have several other agents converted to this structure already, and since this likely solves a few of the open LS240 agent issues, I'd be fine merging this with minimal changes. I just wanted to take the opportunity to discuss the structure since that's how this PR was initially framed.

Comment on lines 73 to 77
set_values_params = LS240Actions.SetValues(
channel=1, sensor=1, auto_range=1, range=0, current_reversal=0, units=1,
enabled=1, name="Channel 01"
)
resp = client.set_values(**asdict(set_values_params))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this representative of how we expect users to use these new actions, by importing them and passing them to clients to pass parameters?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clients don't need to work this way, and using regular keyword args still works, however I think it offers a couple of benefits... Mainly it ensures that client keyword args are what the agent expects, and it allows for static type checking on the client-side. If there is an unexpected parameter, a missing required parameter, or a type mismatch it can be caught here statically as opposed to requiring an integration test.

Comment on lines 19 to 21
on_rtd = os.environ.get("READTHEDOCS") == "True"
if not on_rtd:
from ocs import ocs_agent, site_config
from ocs.ocs_twisted import TimeoutLock
log = txaio.make_logger() # pylint: disable=E1101
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be protected? We call txaio.make_logger() in other agents without this check.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely sure why this works in other agents, but without this guard we get this error on readthedocs:

https://readthedocs.org/projects/socs/builds/26043488/


return True, 'Lakeshore module initialized.'
# Register Operaionts
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Operaionts" -> "Operations"

channel = module.channels[self.channel - 1]
channel.load_curve(self.filename)
time.sleep(0.1)
return None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return None is redundant. Does mypy need this or something?

But also this makes me wonder about the ActionReturnType. I know it's an optional type, but we're just not returning anything here, why type it?

socs/agents/lakeshore240/agent.py Outdated Show resolved Hide resolved
Comment on lines 29 to 30
class Actions:
"Namespace to hold action classes for the Lakeshore240 agent."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd probably move the actions to a separate actions.py or tasks.py module as a namespace, rather than using a class as such. This example is particularly succinct given this agent is very short, but other agents will have many more actions and this'll become quite large, especially if docstrings move here (see one of my other comments). If communication with the devices gets entirely contained in actions (again, see other comments), even more reason to separate them.

socs/agents/lakeshore240/agent.py Show resolved Hide resolved
Comment on lines +180 to +182
def _get_and_pub_temp_data(
self, module: Module, session: ocs_agent.OpSession
) -> Dict[str, Any]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This very nearly looks like an Action, with the small exception that it publishes data to a Feed. I'm wondering if it'd be nice to pull out the Action bit and do the publishing elsewhere, like in the main process. This probably requires some modification to the Action structure, or how things are passed around at least. What do you think?

We've moved so much of the the actual interaction with the device out of the agent class, this seems like the last bit that's left over, and now feels a bit out of place.

"temperatures", record=True, agg_params=agg_params, buffer_time=1
)

def set_values(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have @ocs_agent.param decorators here (or on upload_cal_curve and main.) I think they'd still be good to include since they protect against incorrect types sent from clients.

Comment on lines 36 to 39
self.processed: bool = False
self.success: bool = False
self.traceback: Optional[str] = None
self.result: Any = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this structure ends up sticking around these should be private, else agent authors need to be sure not to collide any agent arguments with these (or any future) attributes within the BaseAction -- I imagine once this is more solidified BaseAction and similar machinery would move to ocs-core.

@jlashner
Copy link
Collaborator Author

@BrianJKoopman I just added a function to automatically register tasks (with proper ocs_params) based on the Action class. I tested with integration tests, and it works as expected (fails automatically for parameter mismatch, etc.). Do you think you can take another look and see if this seems reasonable to you and solves your concerns about duplication?

@BrianJKoopman BrianJKoopman self-requested a review September 19, 2024 18:23
Copy link
Member

@BrianJKoopman BrianJKoopman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BrianJKoopman I just added a function to automatically register tasks (with proper ocs_params) based on the Action class. I tested with integration tests, and it works as expected (fails automatically for parameter mismatch, etc.). Do you think you can take another look and see if this seems reasonable to you and solves your concerns about duplication?

Great, register_task_from_action() looks good! Reducing that boilerplate is definitely moving in the right direction. Comments below. If you could give some thoughts on the remaining comments from the initial review too that'd be great.

Comment on lines +20 to +22
on_rtd = os.environ.get("READTHEDOCS") == "True"
if not on_rtd:
log = txaio.make_logger() # pylint: disable=E1101
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I continue to wonder about this protection, as we call this unprotected in other agents.

socs/util.py Outdated Show resolved Hide resolved
Comment on lines 32 to 43
"""upload_cal_curve(channel, filename)

**Task** - Upload a calibration curve to a channel.

Args
------
channel (int):
Channel number, 1-8.
filename (str):
Filename for calibration curve.
"""

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The duplication and separation with having the functionality here and the docstring in a class method within the agent is solved, but I still find the "document this class as if it was a function" strange.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be ok if we adopted different docstring standards for this type of class? I.e. just document the class regularly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. The user of the documentation is most commonly someone who wants to interact on the client side, which is why I think it should present like a function. Those users don't need to know about the process() method at all (in this example).

Comment on lines -273 to +323
parser.add_argument('--fake-data', help=argparse.SUPPRESS)
parser.add_argument('--num-channels', help=argparse.SUPPRESS)
parser.add_argument("--fake-data", help=argparse.SUPPRESS)
parser.add_argument("--num-channels", help=argparse.SUPPRESS)

# Interpret options in the context of site_config.
args = site_config.parse_args(agent_class='Lakeshore240Agent',
parser=parser,
args=args)
args = site_config.parse_args(
agent_class="Lakeshore240Agent", parser=parser, args=args
)

if args.fake_data is not None:
warnings.warn("WARNING: the --fake-data parameter is deprecated, please "
"remove from your site-config file", DeprecationWarning)
warnings.warn(
"WARNING: the --fake-data parameter is deprecated, please "
"remove from your site-config file",
DeprecationWarning,
)

if args.num_channels is not None:
warnings.warn("WARNING: the --num-channels parameter is deprecated, please "
"remove from your site-config file", DeprecationWarning)

# Automatically acquire data if requested (default)
init_params = False
if args.mode == 'init':
init_params = {'auto_acquire': False}
elif args.mode == 'acq':
init_params = {'auto_acquire': True}
warnings.warn(
"WARNING: the --num-channels parameter is deprecated, please "
"remove from your site-config file",
DeprecationWarning,
)

if args.mode is not None:
warnings.warn(
"WARNING: the --init-mode parameter is deprecated, please "
"remove from your site-config file",
DeprecationWarning,
)

device_port = None
if args.port is not None:
device_port = args.port
else: # Tries to find correct USB port automatically

# This exists if udev rules are setup properly for the 240s
if os.path.exists('/dev/{}'.format(args.serial_number)):
if os.path.exists("/dev/{}".format(args.serial_number)):
device_port = "/dev/{}".format(args.serial_number)

elif os.path.exists('/dev/serial/by-id'):
ports = os.listdir('/dev/serial/by-id')
elif os.path.exists("/dev/serial/by-id"):
ports = os.listdir("/dev/serial/by-id")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of this reformatting, the replacement of ' with " and the changes in line breaks make me think there was some sort of auto-formatting run. Should we consider additional tooling in our pre-commit config?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes... sorry I did this in the PR, but I did run this through Ruff auto-formatting. Its very nice and I think would be a good addition to the pre-commit. Since this is a big rewrite I thought it would be ok to run on this file, but in general it can distract from actual changes, so I think we'd want to run on the whole repo before implementing the pre-commit.

Comment on lines -80 to -111
"""set_values(channel, sensor=None, auto_range=None, range=None,\
current_reversal=None, units=None, enabled=None, name=None)

**Task** - Set sensor parameters for a Lakeshore240 Channel.

Args:
channel (int):
Channel number to set. Valid choices are 1-8.
sensor (int, optional):
Specifies sensor type. See
:func:`socs.Lakeshore.Lakeshore240.Channel.set_values` for
possible types.
auto_range (int, optional):
Specifies if channel should use autorange. Must be 0 or 1.
range (int, optional):
Specifies range if auto_range is false. Only settable for NTC
RTD. See
:func:`socs.Lakeshore.Lakeshore240.Channel.set_values` for
possible ranges.
current_reversal (int, optional):
Specifies if input current reversal is on or off.
Always 0 if input is a diode.
units (int, optional):
Specifies preferred units parameter, and sets the units for
alarm settings. See
:func:`socs.Lakeshore.Lakeshore240.Channel.set_values` for
possible units.
enabled (int, optional):
Sets if channel is enabled.
name (str, optional):
Sets name of channel.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since task docstrings moved the docs page needs some updates to the API section.

socs/agents/lakeshore240/agent.py Show resolved Hide resolved
socs/util.py Outdated Show resolved Hide resolved
socs/util.py Outdated


def get_md5sum(filename):
m = hashlib.md5()

for line in open(filename, 'rb'):
for line in open(filename, "rb"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated reformatting.

socs/util.py Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of util.py, it'll just accrues a bunch of miscellaneous stuff. Can we put the new actions stuff in a module called action.py? I imagine this'll move upstream to ocs eventually. (get_md5sum only gets imported by one module, we should frankly move it and get rid of util.py, but not in this PR.)

socs/util.py Outdated
Comment on lines 69 to 113
def is_instanceable(t: Type) -> bool:
"""
Checks if its possible to run isinstance with a specified type. This is
needed because older version of python don't let you run this on subscripted
generics.
"""
try:
isinstance(0, t)
return True
except Exception:
return False


def get_param_type(t: Type) -> Optional[Type]:
"""
OCS param type variables require you to be able to run isinstance,
which does not work for subscripted generics like Optional in python3.8.
This function attempts to convert types to values that will be accepted
by ocs_agent.param, unwrapping optional types if we are unable to run
isinstance off the bat.

Other subscripted generics such as List[...] or Dict[...] are not currently
supported.

This function will return the unwrapped type, or None if it fails.
"""
origin_type = get_origin(t)

# Unwrap possible option type
if origin_type == Union:
sub_types = get_args(t)
if len(sub_types) != 2:
return None
if type(None) not in sub_types:
return None
for st in sub_types:
if st is not type(None):
if is_instanceable(st):
return st

elif is_instanceable(t):
# If this works, then it should work with ocs_agent.param
return t

return None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's mark these two functions as private with a leading _. They can always be made public later if needed.

@jlashner
Copy link
Collaborator Author

Hi Brian,

I went back through and addressed your comments, so if you can give it another look that would be great!

After thinking about the documentation issue, I think that the awkwardness stems from the fact that we are using docs of agent class members and supporting elements as the documentation of the client interface, even when the two don't match perfectly. I think this documentation method enforces pretty strict structural requirements for the agent class, which isn't always necessary or beneficial.

While using the agent method docstrings as the official "agent API" is nice for a lot of agents, I think we should also support agents that are structured differently, and figure out another way to document the API for these cases, even if its just writing it up manually in the docs page.

For now, replaced the LS240Agent autoclass in the docs page with the docs for the Action classes, and in the action class docstrings I put examples for how to run the action through a client. Hopefully this strikes a balance.

Thanks!

@BrianJKoopman BrianJKoopman self-requested a review October 31, 2024 15:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants