LS240 Locking restructure #589

jlashner · 2023-12-07T00:20:16Z

This PR provides an example of how agents can be re-written to avoid locking mechanisms, using the single main process discussed in daq-dev

Description

Refactors LS240 agent to use lock-free agent structure as discussed in daq-dev. This is not really ready for PR, but I think it is far enough for people to look at / we can discuss structural changes or what we may want to move over to ocs proper.

A few notes about how it works:

Actions are defined as dataclasses with a process function that takes in some hardware driver object and does things with it.
tasks that create new actions will yield until the action is processed, at which point, anything returned from process will be set to session.data of said task. If the process function raises an exception, the defered errback is called, in which case the yield statement for that action will throw an exception, so the task will crash, while the main process will catch the exception and keep going.
I opted to define a process function per action, instead of a single process function that takes in an action type, because I think that will be easier if we move boilerplate into the ocs repo, but I'm not super attached
Error handling / robustness needs to be checked with real hardware, I'm not entirely sure I have the correct exceptions for when the module loses connection since I've only tested with a simulator
Still need to add docs
"degraded" process status / feed info not included

Motivation and Context

Removes multithreading / locking requirements from hardware handling in ls240

How Has This Been Tested?

Tested using ls240 simulator

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist:

My code follows the code style of this project.
My change requires a change to the documentation.
I have updated the documentation accordingly.

for more information, see https://pre-commit.ci

…240-restructure

for more information, see https://pre-commit.ci

jlashner · 2024-08-14T01:08:16Z

It's been a while since I started this, but I'd like to open this as a real PR since I think it would really improve the agent, and "robustifies" the agent as is described here: #538

In addition, I added a bunch of types and statically checked this module with mypy. I really liked this and it was useful for catching a number of cases in which optional types were not handled correctly, or the correct values weren't being returned from OCS operations. I think we should consider adding types and running OCS core through mypy, and encourage new agents/developers to use it.

for more information, see https://pre-commit.ci

…240-restructure

for more information, see https://pre-commit.ci

BrianJKoopman

Thanks for working on this restructure! And sorry this took so long to review. I'm excited about some of the things this improves, like making error handling more straightforward, since it isolates communication with the hardware within the main process.

A lot of what I wrote is my opinion on the overall structure from the perspective of if we were to make this the new recommended way that contributors write agents. That said, since we have several other agents converted to this structure already, and since this likely solves a few of the open LS240 agent issues, I'd be fine merging this with minimal changes. I just wanted to take the opportunity to discuss the structure since that's how this PR was initially framed.

BrianJKoopman · 2024-09-04T14:10:23Z

tests/integration/test_ls240_agent_integration.py

+    set_values_params = LS240Actions.SetValues(
+        channel=1, sensor=1, auto_range=1, range=0, current_reversal=0, units=1,
+        enabled=1, name="Channel 01"
+    )
+    resp = client.set_values(**asdict(set_values_params))


Is this representative of how we expect users to use these new actions, by importing them and passing them to clients to pass parameters?

Clients don't need to work this way, and using regular keyword args still works, however I think it offers a couple of benefits... Mainly it ensures that client keyword args are what the agent expects, and it allows for static type checking on the client-side. If there is an unexpected parameter, a missing required parameter, or a type mismatch it can be caught here statically as opposed to requiring an integration test.

BrianJKoopman · 2024-09-04T14:16:39Z

socs/agents/lakeshore240/agent.py

+on_rtd = os.environ.get("READTHEDOCS") == "True"
 if not on_rtd:
-    from ocs import ocs_agent, site_config
-    from ocs.ocs_twisted import TimeoutLock
+    log = txaio.make_logger()  # pylint: disable=E1101


Does this need to be protected? We call txaio.make_logger() in other agents without this check.

I'm not entirely sure why this works in other agents, but without this guard we get this error on readthedocs:

https://readthedocs.org/projects/socs/builds/26043488/

BrianJKoopman · 2024-09-04T14:22:30Z

socs/agents/lakeshore240/agent.py


-        return True, 'Lakeshore module initialized.'
+        # Register Operaionts


"Operaionts" -> "Operations"

BrianJKoopman · 2024-09-04T19:03:16Z

socs/agents/lakeshore240/agent.py

+            channel = module.channels[self.channel - 1]
+            channel.load_curve(self.filename)
+            time.sleep(0.1)
+            return None


return None is redundant. Does mypy need this or something?

But also this makes me wonder about the ActionReturnType. I know it's an optional type, but we're just not returning anything here, why type it?

socs/agents/lakeshore240/agent.py

BrianJKoopman · 2024-09-06T20:34:56Z

socs/agents/lakeshore240/agent.py

+class Actions:
+    "Namespace to hold action classes for the Lakeshore240 agent."


I'd probably move the actions to a separate actions.py or tasks.py module as a namespace, rather than using a class as such. This example is particularly succinct given this agent is very short, but other agents will have many more actions and this'll become quite large, especially if docstrings move here (see one of my other comments). If communication with the devices gets entirely contained in actions (again, see other comments), even more reason to separate them.

socs/agents/lakeshore240/agent.py

BrianJKoopman · 2024-09-06T21:00:29Z

socs/agents/lakeshore240/agent.py

+    def _get_and_pub_temp_data(
+        self, module: Module, session: ocs_agent.OpSession
+    ) -> Dict[str, Any]:


This very nearly looks like an Action, with the small exception that it publishes data to a Feed. I'm wondering if it'd be nice to pull out the Action bit and do the publishing elsewhere, like in the main process. This probably requires some modification to the Action structure, or how things are passed around at least. What do you think?

We've moved so much of the the actual interaction with the device out of the agent class, this seems like the last bit that's left over, and now feels a bit out of place.

BrianJKoopman · 2024-09-10T21:25:12Z

socs/agents/lakeshore240/agent.py

+            "temperatures", record=True, agg_params=agg_params, buffer_time=1
+        )
+
+    def set_values(


We don't have @ocs_agent.param decorators here (or on upload_cal_curve and main.) I think they'd still be good to include since they protect against incorrect types sent from clients.

BrianJKoopman · 2024-09-11T16:22:08Z

socs/agents/lakeshore240/agent.py

+            self.processed: bool = False
+            self.success: bool = False
+            self.traceback: Optional[str] = None
+            self.result: Any = None


If this structure ends up sticking around these should be private, else agent authors need to be sure not to collide any agent arguments with these (or any future) attributes within the BaseAction -- I imagine once this is more solidified BaseAction and similar machinery would move to ocs-core.

for more information, see https://pre-commit.ci

jlashner · 2024-09-11T19:32:41Z

@BrianJKoopman I just added a function to automatically register tasks (with proper ocs_params) based on the Action class. I tested with integration tests, and it works as expected (fails automatically for parameter mismatch, etc.). Do you think you can take another look and see if this seems reasonable to you and solves your concerns about duplication?

…240-restructure

for more information, see https://pre-commit.ci

BrianJKoopman

@BrianJKoopman I just added a function to automatically register tasks (with proper ocs_params) based on the Action class. I tested with integration tests, and it works as expected (fails automatically for parameter mismatch, etc.). Do you think you can take another look and see if this seems reasonable to you and solves your concerns about duplication?

Great, register_task_from_action() looks good! Reducing that boilerplate is definitely moving in the right direction. Comments below. If you could give some thoughts on the remaining comments from the initial review too that'd be great.

BrianJKoopman · 2024-09-18T20:46:22Z

socs/agents/lakeshore240/agent.py

+on_rtd = os.environ.get("READTHEDOCS") == "True"
+if not on_rtd:
+    log = txaio.make_logger()  # pylint: disable=E1101


I continue to wonder about this protection, as we call this unprotected in other agents.

socs/util.py

BrianJKoopman · 2024-09-18T21:09:49Z

socs/agents/lakeshore240/agent.py

+    """upload_cal_curve(channel, filename)
+
+    **Task** - Upload a calibration curve to a channel.
+
+    Args
+    ------
+    channel (int):
+        Channel number, 1-8.
+    filename (str):
+        Filename for calibration curve.
+    """
+


The duplication and separation with having the functionality here and the docstring in a class method within the agent is solved, but I still find the "document this class as if it was a function" strange.

Would it be ok if we adopted different docstring standards for this type of class? I.e. just document the class regularly.

I don't think so. The user of the documentation is most commonly someone who wants to interact on the client side, which is why I think it should present like a function. Those users don't need to know about the process() method at all (in this example).

BrianJKoopman · 2024-09-19T16:06:50Z

socs/agents/lakeshore240/agent.py

-    parser.add_argument('--fake-data', help=argparse.SUPPRESS)
-    parser.add_argument('--num-channels', help=argparse.SUPPRESS)
+    parser.add_argument("--fake-data", help=argparse.SUPPRESS)
+    parser.add_argument("--num-channels", help=argparse.SUPPRESS)

    # Interpret options in the context of site_config.
-    args = site_config.parse_args(agent_class='Lakeshore240Agent',
-                                  parser=parser,
-                                  args=args)
+    args = site_config.parse_args(
+        agent_class="Lakeshore240Agent", parser=parser, args=args
+    )

    if args.fake_data is not None:
-        warnings.warn("WARNING: the --fake-data parameter is deprecated, please "
-                      "remove from your site-config file", DeprecationWarning)
+        warnings.warn(
+            "WARNING: the --fake-data parameter is deprecated, please "
+            "remove from your site-config file",
+            DeprecationWarning,
+        )

    if args.num_channels is not None:
-        warnings.warn("WARNING: the --num-channels parameter is deprecated, please "
-                      "remove from your site-config file", DeprecationWarning)
-
-    # Automatically acquire data if requested (default)
-    init_params = False
-    if args.mode == 'init':
-        init_params = {'auto_acquire': False}
-    elif args.mode == 'acq':
-        init_params = {'auto_acquire': True}
+        warnings.warn(
+            "WARNING: the --num-channels parameter is deprecated, please "
+            "remove from your site-config file",
+            DeprecationWarning,
+        )
+
+    if args.mode is not None:
+        warnings.warn(
+            "WARNING: the --init-mode parameter is deprecated, please "
+            "remove from your site-config file",
+            DeprecationWarning,
+        )

    device_port = None
    if args.port is not None:
        device_port = args.port
    else:  # Tries to find correct USB port automatically
-
        # This exists if udev rules are setup properly for the 240s
-        if os.path.exists('/dev/{}'.format(args.serial_number)):
+        if os.path.exists("/dev/{}".format(args.serial_number)):
            device_port = "/dev/{}".format(args.serial_number)

-        elif os.path.exists('/dev/serial/by-id'):
-            ports = os.listdir('/dev/serial/by-id')
+        elif os.path.exists("/dev/serial/by-id"):
+            ports = os.listdir("/dev/serial/by-id")


All of this reformatting, the replacement of ' with " and the changes in line breaks make me think there was some sort of auto-formatting run. Should we consider additional tooling in our pre-commit config?

Yes... sorry I did this in the PR, but I did run this through Ruff auto-formatting. Its very nice and I think would be a good addition to the pre-commit. Since this is a big rewrite I thought it would be ok to run on this file, but in general it can distract from actual changes, so I think we'd want to run on the whole repo before implementing the pre-commit.

BrianJKoopman · 2024-09-19T17:11:53Z

socs/agents/lakeshore240/agent.py

-        """set_values(channel, sensor=None, auto_range=None, range=None,\
-                current_reversal=None, units=None, enabled=None, name=None)
-
-        **Task** - Set sensor parameters for a Lakeshore240 Channel.
-
-        Args:
-            channel (int):
-                Channel number to set. Valid choices are 1-8.
-            sensor (int, optional):
-                Specifies sensor type.  See
-                :func:`socs.Lakeshore.Lakeshore240.Channel.set_values` for
-                possible types.
-            auto_range (int, optional):
-                Specifies if channel should use autorange. Must be 0 or 1.
-            range (int, optional):
-                Specifies range if auto_range is false. Only settable for NTC
-                RTD.  See
-                :func:`socs.Lakeshore.Lakeshore240.Channel.set_values` for
-                possible ranges.
-            current_reversal (int, optional):
-                Specifies if input current reversal is on or off.
-                Always 0 if input is a diode.
-            units (int, optional):
-                Specifies preferred units parameter, and sets the units for
-                alarm settings.  See
-                :func:`socs.Lakeshore.Lakeshore240.Channel.set_values` for
-                possible units.
-            enabled (int, optional):
-                Sets if channel is enabled.
-            name (str, optional):
-                Sets name of channel.
-


Since task docstrings moved the docs page needs some updates to the API section.

socs/agents/lakeshore240/agent.py

socs/util.py

BrianJKoopman · 2024-09-19T17:43:06Z

socs/util.py



 def get_md5sum(filename):
    m = hashlib.md5()

-    for line in open(filename, 'rb'):
+    for line in open(filename, "rb"):


Unrelated reformatting.

BrianJKoopman · 2024-09-19T18:00:32Z

socs/util.py

I'm not a fan of util.py, it'll just accrues a bunch of miscellaneous stuff. Can we put the new actions stuff in a module called action.py? I imagine this'll move upstream to ocs eventually. (get_md5sum only gets imported by one module, we should frankly move it and get rid of util.py, but not in this PR.)

BrianJKoopman · 2024-09-19T18:01:54Z

socs/util.py

+def is_instanceable(t: Type) -> bool:
+    """
+    Checks if its possible to run isinstance with a specified type. This is
+    needed because older version of python don't let you run this on subscripted
+    generics.
+    """
+    try:
+        isinstance(0, t)
+        return True
+    except Exception:
+        return False
+
+
+def get_param_type(t: Type) -> Optional[Type]:
+    """
+    OCS param type variables require you to be able to run isinstance,
+    which does not work for subscripted generics like Optional in python3.8.
+    This function attempts to convert types to values that will be accepted
+    by ocs_agent.param, unwrapping optional types if we are unable to run
+    isinstance off the bat.
+
+    Other subscripted generics such as List[...] or Dict[...] are not currently
+    supported.
+
+    This function will return the unwrapped type, or None if it fails.
+    """
+    origin_type = get_origin(t)
+
+    # Unwrap possible option type
+    if origin_type == Union:
+        sub_types = get_args(t)
+        if len(sub_types) != 2:
+            return None
+        if type(None) not in sub_types:
+            return None
+        for st in sub_types:
+            if st is not type(None):
+                if is_instanceable(st):
+                    return st
+
+    elif is_instanceable(t):
+        # If this works, then it should work with ocs_agent.param
+        return t
+
+    return None


Let's mark these two functions as private with a leading _. They can always be made public later if needed.

for more information, see https://pre-commit.ci

This reverts commit e703d6c.

jlashner · 2024-10-23T13:58:44Z

Hi Brian,

I went back through and addressed your comments, so if you can give it another look that would be great!

After thinking about the documentation issue, I think that the awkwardness stems from the fact that we are using docs of agent class members and supporting elements as the documentation of the client interface, even when the two don't match perfectly. I think this documentation method enforces pretty strict structural requirements for the agent class, which isn't always necessary or beneficial.

While using the agent method docstrings as the official "agent API" is nice for a lot of agents, I think we should also support agents that are structured differently, and figure out another way to document the API for these cases, even if its just writing it up manually in the docs page.

For now, replaced the LS240Agent autoclass in the docs page with the docs for the Action classes, and in the action class docstrings I put examples for how to run the action through a client. Hopefully this strikes a balance.

Thanks!

jlashner and others added 9 commits December 6, 2023 18:36

LS240 Locking restructure

5ac4f34

[pre-commit.ci] auto fixes from pre-commit.com hooks

0414f92

for more information, see https://pre-commit.ci

"deferred" and action cancellation on main start

a3fc5ca

Merge branch 'ls240-restructure' of github.com:simonsobs/socs into ls…

adf12ff

…240-restructure

Merge in main

4721764

Mypy and type fixes (WIP)

90c1078

[pre-commit.ci] auto fixes from pre-commit.com hooks

1ad091e

for more information, see https://pre-commit.ci

mypy type fixes and reconnect behavior

2e15b9f

Marks main op as degraded when connection fails

cd7fa36

jlashner marked this pull request as ready for review August 14, 2024 01:08

jlashner requested review from mhasself and BrianJKoopman August 14, 2024 01:09

jlashner and others added 7 commits August 13, 2024 21:12

Fix Dict type so it works with oder python

9debd80

Fix 240 integration tests

5e83ca6

[pre-commit.ci] auto fixes from pre-commit.com hooks

3f81892

for more information, see https://pre-commit.ci

flake8 and readthedocs

a523869

Merge branch 'ls240-restructure' of github.com:simonsobs/socs into ls…

bda92d8

…240-restructure

Readthedocs

35fc19c

[pre-commit.ci] auto fixes from pre-commit.com hooks

da7ccb0

for more information, see https://pre-commit.ci

BrianJKoopman mentioned this pull request Aug 15, 2024

Make Agents robust to connection dropouts #721

Open

48 tasks

jlashner and others added 3 commits September 10, 2024 16:27

Move tasks into blocking threads

54854c9

[pre-commit.ci] auto fixes from pre-commit.com hooks

17dfcfe

for more information, see https://pre-commit.ci

Flake 8

74f6f98

BrianJKoopman reviewed Sep 11, 2024

View reviewed changes

jlashner and others added 2 commits September 11, 2024 15:29

Automatic task registration from action

d91f4d0

[pre-commit.ci] auto fixes from pre-commit.com hooks

98c590b

for more information, see https://pre-commit.ci

jlashner added 2 commits September 11, 2024 15:41

Forward declare queue type for older versions of python

5dd70fd

Merge branch 'ls240-restructure' of github.com:simonsobs/socs into ls…

edaf758

…240-restructure

jlashner added 2 commits September 11, 2024 17:17

Unwraps option type in param setter

0fd6841

Moves generic action to task code to socs.util

cc1d1f6

jlashner force-pushed the ls240-restructure branch from bbe219b to cc1d1f6 Compare September 12, 2024 19:39

pre-commit-ci bot and others added 2 commits September 12, 2024 19:39

[pre-commit.ci] auto fixes from pre-commit.com hooks

623dcc7

for more information, see https://pre-commit.ci

removed forced exception used for testing

dae50e9

BrianJKoopman self-requested a review September 19, 2024 18:23

BrianJKoopman requested changes Sep 19, 2024

View reviewed changes

BrianJKoopman added the agent: ls240 label Oct 16, 2024

jlashner added 3 commits October 22, 2024 16:21

Move actions into "actions.py" module + small changes

ec84008

Document agent API in action class docstrings

9b6052a

Remove readthedocs guard for make_logger

e703d6c

jlashner force-pushed the ls240-restructure branch from 6628a6c to e703d6c Compare October 23, 2024 13:37

pre-commit-ci bot and others added 2 commits October 23, 2024 13:37

[pre-commit.ci] auto fixes from pre-commit.com hooks

cbcc7a3

for more information, see https://pre-commit.ci

Revert "Remove readthedocs guard for make_logger"

99af79f

This reverts commit e703d6c.

BrianJKoopman self-requested a review October 31, 2024 15:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LS240 Locking restructure #589

LS240 Locking restructure #589

jlashner commented Dec 7, 2023

jlashner commented Aug 14, 2024

BrianJKoopman left a comment

BrianJKoopman Sep 4, 2024

jlashner Sep 11, 2024

BrianJKoopman Sep 4, 2024

jlashner Oct 23, 2024

BrianJKoopman Sep 4, 2024

BrianJKoopman Sep 4, 2024

BrianJKoopman Sep 6, 2024

BrianJKoopman Sep 6, 2024

BrianJKoopman Sep 10, 2024

BrianJKoopman Sep 11, 2024

jlashner commented Sep 11, 2024

BrianJKoopman left a comment

BrianJKoopman Sep 18, 2024

BrianJKoopman Sep 18, 2024

jlashner Sep 19, 2024

BrianJKoopman Sep 19, 2024

BrianJKoopman Sep 19, 2024

jlashner Sep 19, 2024

BrianJKoopman Sep 19, 2024

BrianJKoopman Sep 19, 2024

BrianJKoopman Sep 19, 2024

BrianJKoopman Sep 19, 2024

jlashner commented Oct 23, 2024


		return True, 'Lakeshore module initialized.'
		# Register Operaionts

		class Actions:
		"Namespace to hold action classes for the Lakeshore240 agent."

LS240 Locking restructure #589

Are you sure you want to change the base?

LS240 Locking restructure #589

Conversation

jlashner commented Dec 7, 2023

Description

Motivation and Context

How Has This Been Tested?

Types of changes

Checklist:

jlashner commented Aug 14, 2024

BrianJKoopman left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jlashner commented Sep 11, 2024

BrianJKoopman left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jlashner commented Oct 23, 2024