ENH Add option to specify CNF for LCLS2 hutches. Restart/stop AMI only for LCLS2 Hutches. #216

Merged
merged 5 commits into from
Oct 25, 2024


gadorlhiac
Contributor

@gadorlhiac gadorlhiac commented Oct 24, 2024

Description

Adds a --cnf option to run_daq_utils.py which is passed to daq_utils.DaqManager. This allows us to indicate which cnf file to use for restarting/stopping the DAQ in LCLS2 hutches.

Currently, this is used for:

  • Restarting only the AMI processes in LCLS2 hutches, instead of the entire DAQ. The startami and stopami scripts have been updated accordingly.
  • Allowing RIX to switch between the QRIX and ChemRIX cnfs for restartdaq. restartdaq -C <CNF> will now work for LCLS2 hutches. stopdaq will still work as before, assuming both cnfs import from the same "master/main" cnf.
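
As a rough illustration of the plumbing described above, here is a minimal sketch of how a --cnf option could be parsed and handed to a manager object. DaqManagerSketch and parse_args are hypothetical stand-ins and are not the actual daq_utils.DaqManager or run_daq_utils.py code. The fallback to "<hutch>.py" when no cnf is given is an assumption based on the restartdaq output shown in the testing section.

```python
import argparse
from typing import List, Optional


class DaqManagerSketch:
    """Hypothetical stand-in for daq_utils.DaqManager (not the real class)."""

    def __init__(self, hutch: str, cnf: Optional[str] = None) -> None:
        self.hutch = hutch
        # Assumption: without --cnf, fall back to the hutch's main cnf.
        self.cnf = cnf if cnf is not None else f"{hutch}.py"


def parse_args(argv: List[str]) -> argparse.Namespace:
    """Parse a command line that carries an optional --cnf argument."""
    parser = argparse.ArgumentParser(prog="run_daq_utils_sketch")
    parser.add_argument(
        "--cnf",
        default=None,
        help="cnf file to use when restarting/stopping the DAQ",
    )
    parser.add_argument("command", help="e.g. restartdaq or stopdaq")
    return parser.parse_args(argv)


# Example: the equivalent of `--cnf rix_ami.py restartdaq` in RIX.
args = parse_args(["--cnf", "rix_ami.py", "restartdaq"])
manager = DaqManagerSketch("rix", cnf=args.cnf)
print(manager.cnf, args.command)
```

Keeping the default at None in the parser leaves the choice of fallback cnf to the manager, so hutches that never pass --cnf keep their existing behaviour.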

NOTE: After this PR the startami and stopami scripts will perform the following depending on hutch:

  • startami: Restart AMI1 in LCLS1 hutches, and AMI2 in LCLS2 hutches
  • stopami: Stop AMI1 in LCLS1 hutches, and AMI2 in LCLS2 hutches
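
The hutch-dependent behaviour listed above can be summarised in a small lookup. This is only a sketch: the hutch sets contain just the hutches exercised in this PR's tests, and ami_version is a hypothetical helper. The actual scripts do not use a hard-coded list (the startami diff further down checks `daqutils isdaqmgr` instead).

```python
# Hutches exercised in this PR's tests; not an exhaustive list.
LCLS2_HUTCHES = frozenset({"rix", "tmo", "txi"})
LCLS1_HUTCHES = frozenset({"mfx", "xcs"})


def ami_version(hutch: str) -> str:
    """Return which AMI startami/stopami act on for a given hutch."""
    name = hutch.lower()
    if name in LCLS2_HUTCHES:
        return "AMI2"
    if name in LCLS1_HUTCHES:
        return "AMI1"
    # Unknown hutches are rejected rather than guessed at.
    raise ValueError(f"unknown hutch: {hutch!r}")


for hutch in ("rix", "mfx"):
    print(hutch, ami_version(hutch))
```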

For AMI2 running in LCLS1 hutches, a separate script is needed, e.g. the one being introduced in #212.

NOTE: stopami was previously not working for LCLS1 hutches (startami was). This PR includes a minor fix for that.

For starting and stopping AMI in LCLS2 hutches, a new standard has been introduced where each hutch will have a $HUTCH_ami.py cnf file. This cnf inherits only the AMI processes from the main CNF. This allows it to be used to restart AMI independently of the rest of the DAQ processes.

Examples of this cnf are:
RIX:

rix-daq:scripts> ll rix_ami.py
-rw-rw-r-- 1 rixopr xs 103 oct 23 13:43 rix_ami.py
rix-daq:scripts> cat rix_ami.py
from rix import *
from psdaq.slurm.config import Config
config = Config({})
config.extend(procmgr_ami)

TMO:

tmo-daq:scripts> cat tmo_ami.py
from tmo import *
from psdaq.slurm.config import Config
config = Config({})
config.extend(procmgr_ami)

TXI:

txi-daq:scripts> cat txi_ami.py
from txi import *
from psdaq.slurm.config import Config
config = Config({})
config.extend(procmgr_ami)
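
Since the three files above differ only in the hutch name, the convention can be written down as a template. The sketch below simply renders that text for a given hutch. render_ami_cnf and ami_cnf_name are hypothetical helpers for illustration and are not part of this PR or of psdaq.

```python
# Template matching the rix/tmo/txi examples above; only the hutch name varies.
AMI_CNF_TEMPLATE = (
    "from {hutch} import *\n"
    "from psdaq.slurm.config import Config\n"
    "config = Config({{}})\n"
    "config.extend(procmgr_ami)\n"
)


def ami_cnf_name(hutch: str) -> str:
    """File name following the $HUTCH_ami.py convention."""
    return f"{hutch}_ami.py"


def render_ami_cnf(hutch: str) -> str:
    """Return the text of the AMI-only cnf for the given hutch."""
    return AMI_CNF_TEMPLATE.format(hutch=hutch)


print(ami_cnf_name("rix"))
print(render_ami_cnf("rix"), end="")
```

The rendered text assumes the main cnf (e.g. rix.py) defines procmgr_ami, as the examples above rely on.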

Motivation and Context

Initially motivated by the need for a standard way of restarting AMI (and only AMI) across the LCLS2 hutches. This PR's cnf feature also provides a mechanism for restarting the QRIX and ChemRIX DAQs independently from the same hutch, provided they are differentiated by separate configuration files.

How Has This Been Tested?

Testing startami and stopami in LCLS2 hutches

Tested by starting the DAQ with the relevant paths changed to point to the modified daqutils, to ensure that the modified version is called. The specific changes (not included in the PR) are shown below, using startami as an example:

--- a/scripts/startami
+++ b/scripts/startami
@@ -39,8 +39,11 @@ if [[ `whoami` != *'opr'* ]]; then
     echo "Please run ami from the operator account!"
     exit
 fi
-
 HUTCH=`get_hutch_name`
+if [ "$(/cds/home/opr/txiopr/scripts/engineering_tools/scripts/daqutils isdaqmgr)" = "true" ]; then
+    /cds/home/opr/txiopr/scripts/engineering_tools/scripts/daqutils --cnf ${HUTCH}_ami.py restartdaq $@
+    exit 0
+fi
 EXPNAME=`get_curr_exp`
 CNFEXT=.cnf

@@ -104,3 +107,4 @@ fi

 echo $ami_path$amicmd
 exec $ami_path$amicmd&

This was needed to prevent picking up the standard/central installation of engineering_tools. There were other alternatives (modifying paths, startup shell scripts for the opr accounts, etc.).

The checkout of this branch was located at ~txiopr/scripts/engineering_tools for all testing.

This was tested in the LCLS2 hutches running from the $HUTCH-daq machines using the operator account.

Note that the startami scripts pick up different cnf files.

TXI:

txi-daq:scripts> ./startami
DAQ is not running in txi
+ /cds/home/opr/txiopr/git/lcls2_100824/install/bin/daqmgr restart /reg/g/pcds/dist/pds/txi/scripts/txi_ami.py

took 6.5332s. for starting the DAQ
txi-daq:scripts> ./stopami
+ /cds/home/opr/txiopr/git/lcls2_100824/install/bin/daqmgr stop /reg/g/pcds/dist/pds/txi/scripts/txi_ami.py

txi-daq:scripts> ./restartdaq
DAQ is not running in txi
+ /cds/home/opr/txiopr/git/lcls2_100824/install/bin/daqmgr restart /reg/g/pcds/dist/pds/txi/scripts/txi.py

took 2.6325s. for starting the DAQ
txi-daq:scripts> stopdaq
+ /cds/home/opr/txiopr/git/lcls2_100824/install/bin/daqmgr stop /reg/g/pcds/dist/pds/txi/scripts/txi.py

RIX:

rix-daq:scripts> ~txiopr/scripts/engineering_tools/scripts/startami
DAQ is not running in rix
+ /cds/home/opr/rixopr/git/lcls2_101824/install/bin/daqmgr restart /reg/g/pcds/dist/pds/rix/scripts/rix_ami.py

took 3.8405s. for starting the DAQ
rix-daq:scripts> ~txiopr/scripts/engineering_tools/scripts/stopami
+ /cds/home/opr/rixopr/git/lcls2_101824/install/bin/daqmgr stop /reg/g/pcds/dist/pds/rix/scripts/rix_ami.py

rix-daq:scripts> ~txiopr/scripts/engineering_tools/scripts/restartdaq
DAQ is not running in rix
+ /cds/home/opr/rixopr/git/lcls2_101824/install/bin/daqmgr restart /reg/g/pcds/dist/pds/rix/scripts/rix.py

took 2.8847s. for starting the DAQ

TMO:

tmo-daq:scripts> ~txiopr/scripts/engineering_tools/scripts/startami
DAQ is not running in tmo
+ /cds/home/opr/tmoopr/git/lcls2_102224/install/bin/daqmgr restart /reg/g/pcds/dist/pds/tmo/scripts/tmo_ami.py

took 3.4237s. for starting the DAQ
tmo-daq:scripts> ~txiopr/scripts/engineering_tools/scripts/stopami
+ /cds/home/opr/tmoopr/git/lcls2_102224/install/bin/daqmgr stop /reg/g/pcds/dist/pds/tmo/scripts/tmo_ami.py

tmo-daq:scripts> ~txiopr/scripts/engineering_tools/scripts/restartdaq
DAQ is not running in tmo
+ /cds/home/opr/tmoopr/git/lcls2_102224/install/bin/daqmgr restart /reg/g/pcds/dist/pds/tmo/scripts/tmo.py

took 2.9897s. for starting the DAQ
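
The daqmgr invocations in the logs above follow one pattern: an action (restart or stop) followed by a cnf under /reg/g/pcds/dist/pds/<hutch>/scripts. The sketch below reconstructs that pattern from the logs only. cnf_path and daqmgr_command are hypothetical helpers, the daqmgr executable path is left to the caller because it differs per hutch, and the default of "<hutch>.py" is an assumption taken from the plain restartdaq output.

```python
from pathlib import PurePosixPath
from typing import List, Optional

# Root directory seen in the logs above.
SCRIPTS_ROOT = PurePosixPath("/reg/g/pcds/dist/pds")


def cnf_path(hutch: str, cnf: Optional[str] = None) -> PurePosixPath:
    """Resolve a cnf name to its location in the hutch scripts directory."""
    # Assumption: with no cnf given, the main <hutch>.py cnf is used.
    name = cnf if cnf is not None else f"{hutch}.py"
    return SCRIPTS_ROOT / hutch / "scripts" / name


def daqmgr_command(
    daqmgr: str, action: str, hutch: str, cnf: Optional[str] = None
) -> List[str]:
    """Build the argument list for a daqmgr restart/stop call."""
    if action not in ("restart", "stop"):
        raise ValueError(f"unsupported action: {action!r}")
    return [daqmgr, action, str(cnf_path(hutch, cnf))]


# AMI-only restart in TXI versus a full restart in RIX.
print(" ".join(daqmgr_command("daqmgr", "restart", "txi", "txi_ami.py")))
print(" ".join(daqmgr_command("daqmgr", "restart", "rix")))
```

Returning a list rather than a single string means the command can be passed to subprocess.run without going through a shell.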

Verifying startami and stopami in LCLS1 hutches

The code affecting LCLS1 hutches was modified to bring it in line with spellcheck requirements. The stopami script was previously not working and now is.

MFX:

mfx-daq:~> ~txiopr/scripts/engineering_tools/scripts/stopami
killing  332
mfx-daq:~> ~txiopr/scripts/engineering_tools/scripts/startami
Do you really intend to restart the ami_client on DAQ is running on mfx-daq? (y/n)y
Restarting the ami_client...
/reg/g/pcds/dist/pds/mfx/current/tools/procmgr/procmgr: using config file '/reg/g/pcds/dist/pds/mfx/scripts/p0.cnf.running' to stop
Current experiment is mfxl1039823
/reg/g/pcds/dist/pds/mfx/current/tools/procmgr/procmgr: using config file '/reg/g/pcds/dist/pds/mfx/scripts/p0.cnf.running' to start
Current experiment is mfxl1039823

XCS:

xcs-daq:~> ~txiopr/scripts/engineering_tools/scripts/stopami
killing  20465
xcs-daq:~> ~txiopr/scripts/engineering_tools/scripts/startami
ldpathmunge: /reg/neh/operator/xcsopr/online/ami_plugins is not a directory
Do you really intend to restart the ami_client on DAQ is running on xcs-daq? (y/n)y
Restarting the ami_client...
/reg/g/pcds/dist/pds/xcs/current/tools/procmgr/procmgr: using config file '/reg/g/pcds/dist/pds/xcs/scripts/p0.cnf.running' to stop
Current experiment is xcsx1015123
/reg/g/pcds/dist/pds/xcs/current/tools/procmgr/procmgr: using config file '/reg/g/pcds/dist/pds/xcs/scripts/p0.cnf.running' to start
Current experiment is xcsx1015123

Verifying restartdaq -C <cnf> and stopdaq

TXI:

txi-daq:scripts> ./restartdaq -C txi_ami.py
DAQ is not running in txi
+ /cds/home/opr/txiopr/git/lcls2_100824/install/bin/daqmgr restart /reg/g/pcds/dist/pds/txi/scripts/txi_ami.py

took 6.6249s. for starting the DAQ

RIX:

NOTE: stopdaq will try to stop rix.py, but this is fine since both qrix.py and crix.py are derived from it.

rix-daq:scripts> ~txiopr/scripts/engineering_tools/scripts/restartdaq -C qrix.py
DAQ is not running in rix
+ /cds/home/opr/rixopr/git/lcls2_101824/install/bin/daqmgr restart /reg/g/pcds/dist/pds/rix/scripts/qrix.py
Warning: no qrix_w8_0 found in main_config

took 5.0917s. for starting the DAQ
rix-daq:scripts> ~txiopr/scripts/engineering_tools/scripts/restartdaq -C crix.py
 DAQ is not running in rix
+ /cds/home/opr/rixopr/git/lcls2_101824/install/bin/daqmgr restart /reg/g/pcds/dist/pds/rix/scripts/crix.py

took 3.1730s. for starting the DAQ

(ps-4.6.3) rix-daq:scripts> stopdaq
+ /cds/home/opr/rixopr/git/lcls2_101824/install/bin/daqmgr stop /reg/g/pcds/dist/pds/rix/scripts/rix.py

Verifying restartdaq spellcheck changes haven't affected LCLS1 hutches

MFX:

mfx-daq:~> ~txiopr/scripts/engineering_tools/scripts/restartdaq
DAQ is currently not running
start DAQ on mfx-daq
/reg/g/pcds/dist/pds/mfx/current/tools/procmgr/procmgr: using config file '/reg/g/pcds/dist/pds/mfx/scripts/mfx.cnf' to start
Current experiment is mfxl1039823
and 8.8318 for starting the DAQ
mfx-daq:~> ~txiopr/scripts/engineering_tools/scripts/stopdaq
stop the DAQ from mfx-daq
/reg/g/pcds/dist/pds/mfx/current/tools/procmgr/procmgr: using config file '/reg/g/pcds/dist/pds/mfx/scripts/p0.cnf.running' to stop
Current experiment is mfxl1039823

XCS:

xcs-daq:~> ~txiopr/scripts/engineering_tools/scripts/restartdaq
DAQ is currently not running
start DAQ on xcs-daq
/reg/g/pcds/dist/pds/xcs/current/tools/procmgr/procmgr: using config file '/reg/g/pcds/dist/pds/xcs/scripts/xcs.cnf' to start
Current experiment is xcsx1015123
ERR: no restart message...
and 7.7061 for starting the DAQ
xcs-daq:~> ~txiopr/scripts/engineering_tools/scripts/restartdaq
stop the DAQ on DAQ is running on xcs-daq from xcs-daq
/reg/g/pcds/dist/pds/xcs/current/tools/procmgr/procmgr: using config file '/reg/g/pcds/dist/pds/xcs/scripts/p0.cnf.running' to stop
Current experiment is xcsx1015123

Where Has This Been Documented?

Updated the README to describe the behaviour of startami and stopami.

Screenshots (if appropriate):

After stopami in MFX
image

After startami again, in MFX
image

After stopami in XCS
image

After startami again, in XCS
image

@silkenelson silkenelson self-assigned this Oct 25, 2024
@gadorlhiac gadorlhiac marked this pull request as ready for review October 25, 2024 16:22
Collaborator

@silkenelson silkenelson left a comment


Looks great!

@gadorlhiac gadorlhiac merged commit 085f7f3 into pcdshub:master Oct 25, 2024
2 checks passed
@gadorlhiac gadorlhiac deleted the ENH/specify_cnf branch October 25, 2024 17:17
2 participants