Chicoma debug #859

Merged: xylar merged 2 commits into MPAS-Dev:main from the chicoma_debug branch on Sep 13, 2024

andrewdnolan (Contributor) commented Sep 12, 2024

This PR adds a reservation option to the job section of the config files, which is needed to run jobs in the debug queue on chicoma.

Setting qos=debug used to work on chicoma, but that changed within the last couple of months: both reservation and partition now need to be set to debug for jobs to run. See the chicoma docs for reference:
https://hpc.lanl.gov/policies/job-scheduling-policies.html#JobSchedulingPolicies-Chicoma
(Note: the entry in the table is for an interactive debug job, but the need to set both reservation and partition applies to batch jobs as well.)

Checklist

  • User's Guide has been updated
  • Developer's Guide has been updated
  • Documentation has been built locally and changes look as expected
  • Document (in a comment titled Testing in this PR) any testing that was used to verify the changes

andrewdnolan (Contributor, Author) commented Sep 12, 2024

Testing

Tested in: /lustre/scratch5/anolan/debug_reservation

compass suite -c landice -t full_integration -s -w regular_queue
compass suite -c landice -t full_integration -s -w debug_queue -f config.cfg 

pushd regular_queue && sbatch job_script.full_integration.sh && popd
pushd debug_queue  && sbatch job_script.full_integration.sh && popd

where config.cfg contains:

[job]

qos =
partition = debug
reservation = debug
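
As a rough illustration of how these options relate to the job script shown below, here is a minimal sketch in which an empty option produces no `#SBATCH` line. The function `emit_sbatch_directives` is hypothetical and is not how compass itself generates job scripts; it only mirrors what the two job scripts in this comment show (no `--qos` line when `qos` is empty, no `--reservation` line when `reservation` is unset).

```bash
#!/bin/bash
# Hypothetical helper (not part of compass): print one #SBATCH line for each
# non-empty job option, in the order qos, reservation, partition.
emit_sbatch_directives() {
    local qos="$1" reservation="$2" partition="$3"
    if [ -n "$qos" ]; then
        echo "#SBATCH  --qos=${qos}"
    fi
    if [ -n "$reservation" ]; then
        echo "#SBATCH  --reservation=${reservation}"
    fi
    if [ -n "$partition" ]; then
        echo "#SBATCH  --partition=${partition}"
    fi
}

# Debug queue on chicoma: qos left empty, reservation and partition set.
emit_sbatch_directives "" "debug" "debug"
```

Calling it with `"standard" "" "standard"` instead would print the `--qos` and `--partition` lines used by the regular queue.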

Debug Queue Jobscript:

#!/bin/bash
#SBATCH  --job-name=compass_full_integration
#SBATCH  --nodes=1
#SBATCH  --output=compass_full_integration.o%j
#SBATCH  --exclusive
#SBATCH  --time=1:00:00
#SBATCH  --reservation=debug
#SBATCH  --partition=debug

source load_compass_env.sh

compass run full_integration

Regular Queue Jobscript:

#!/bin/bash
#SBATCH  --job-name=compass_full_integration
#SBATCH  --nodes=1
#SBATCH  --output=compass_full_integration.o%j
#SBATCH  --exclusive
#SBATCH  --time=1:00:00
#SBATCH  --qos=standard
#SBATCH  --partition=standard

source load_compass_env.sh

compass run full_integration
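
The only difference between the two job scripts above is the pair of scheduler directives. A quick way to confirm that a generated script targets the debug queue before submitting it could look like the sketch below. The function name `uses_debug_queue` is made up for illustration, and the check assumes the directives appear one per line as in the scripts above.

```bash
#!/bin/bash
# Hypothetical check (not part of compass): succeed only if the job script
# requests both the debug reservation and the debug partition.
uses_debug_queue() {
    local script="$1"
    grep -Eq '^#SBATCH[[:space:]]+--reservation=debug[[:space:]]*$' "$script" &&
        grep -Eq '^#SBATCH[[:space:]]+--partition=debug[[:space:]]*$' "$script"
}

# Example use, from the directory given under "Tested in":
#   uses_debug_queue debug_queue/job_script.full_integration.sh && echo "debug queue"
```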

Debug Queue Results:

Test Runtimes:
00:08 PASS landice_dome_2000m_sia_restart_test
00:03 PASS landice_dome_2000m_sia_decomposition_test
00:04 PASS landice_dome_variable_resolution_sia_restart_test
00:02 PASS landice_dome_variable_resolution_sia_decomposition_test
00:21 PASS landice_enthalpy_benchmark_A
00:20 PASS landice_eismint2_decomposition_test
00:16 PASS landice_eismint2_enthalpy_decomposition_test
00:17 PASS landice_eismint2_restart_test
00:18 PASS landice_eismint2_enthalpy_restart_test
00:09 PASS landice_greenland_sia_restart_test
00:08 PASS landice_greenland_sia_decomposition_test
00:18 PASS landice_hydro_radial_restart_test
00:11 PASS landice_hydro_radial_decomposition_test
00:32 PASS landice_humboldt_mesh-3km_decomposition_test_velo-none_calving-none_subglacialhydro
00:29 PASS landice_humboldt_mesh-3km_restart_test_velo-none_calving-none_subglacialhydro
00:19 PASS landice_dome_2000m_fo_decomposition_test
00:14 PASS landice_dome_2000m_fo_restart_test
00:10 PASS landice_dome_variable_resolution_fo_decomposition_test
00:10 PASS landice_dome_variable_resolution_fo_restart_test
00:23 PASS landice_circular_shelf_decomposition_test
00:53 PASS landice_greenland_fo_decomposition_test
00:46 PASS landice_greenland_fo_restart_test
00:24 PASS landice_thwaites_fo_decomposition_test
00:29 PASS landice_thwaites_fo_restart_test
00:18 PASS landice_thwaites_fo-depthInt_decomposition_test
00:16 PASS landice_thwaites_fo-depthInt_restart_test
00:39 PASS landice_humboldt_mesh-3km_restart_test_velo-fo_calving-von_mises_stress_damage-threshold_faceMelting
00:23 PASS landice_humboldt_mesh-3km_restart_test_velo-fo-depthInt_calving-von_mises_stress_damage-threshold_faceMelting
Total runtime 09:02
PASS: All passed successfully!

Regular Queue Results:

Test Runtimes:
00:08 PASS landice_dome_2000m_sia_restart_test
00:03 PASS landice_dome_2000m_sia_decomposition_test
00:04 PASS landice_dome_variable_resolution_sia_restart_test
00:02 PASS landice_dome_variable_resolution_sia_decomposition_test
00:21 PASS landice_enthalpy_benchmark_A
00:15 PASS landice_eismint2_decomposition_test
00:16 PASS landice_eismint2_enthalpy_decomposition_test
00:16 PASS landice_eismint2_restart_test
00:15 PASS landice_eismint2_enthalpy_restart_test
00:08 PASS landice_greenland_sia_restart_test
00:07 PASS landice_greenland_sia_decomposition_test
00:18 PASS landice_hydro_radial_restart_test
00:11 PASS landice_hydro_radial_decomposition_test
00:32 PASS landice_humboldt_mesh-3km_decomposition_test_velo-none_calving-none_subglacialhydro
00:28 PASS landice_humboldt_mesh-3km_restart_test_velo-none_calving-none_subglacialhydro
00:18 PASS landice_dome_2000m_fo_decomposition_test
00:14 PASS landice_dome_2000m_fo_restart_test
00:10 PASS landice_dome_variable_resolution_fo_decomposition_test
00:10 PASS landice_dome_variable_resolution_fo_restart_test
00:14 PASS landice_circular_shelf_decomposition_test
00:50 PASS landice_greenland_fo_decomposition_test
00:44 PASS landice_greenland_fo_restart_test
00:29 PASS landice_thwaites_fo_decomposition_test
00:33 PASS landice_thwaites_fo_restart_test
00:14 PASS landice_thwaites_fo-depthInt_decomposition_test
00:19 PASS landice_thwaites_fo-depthInt_restart_test
00:40 PASS landice_humboldt_mesh-3km_restart_test_velo-fo_calving-von_mises_stress_damage-threshold_faceMelting
00:21 PASS landice_humboldt_mesh-3km_restart_test_velo-fo-depthInt_calving-von_mises_stress_damage-threshold_faceMelting
Total runtime 08:45
PASS: All passed successfully
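
To compare the two listings without reading them line by line, a small summary helper along these lines may be useful. It is only a sketch: `summarize_runtimes` is a hypothetical name, and it assumes each result line has the form `MM:SS PASS test_name` as above. The per-test sum need not match the reported "Total runtime" exactly, since the listed times appear to be rounded to whole seconds.

```bash
#!/bin/bash
# Hypothetical helper: read a "Test Runtimes" listing on stdin, count the
# passing tests, and sum their MM:SS runtimes in seconds.
summarize_runtimes() {
    awk '
        $1 ~ /^[0-9]+:[0-9]+$/ && $2 == "PASS" {
            split($1, t, ":")
            total += t[1] * 60 + t[2]
            passed++
        }
        END { printf "%d passed, %d s\n", passed, total }
    '
}

# Example use, with a listing saved to a file of your choosing:
#   summarize_runtimes < debug_queue_results.txt
```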

andrewdnolan (Contributor, Author) commented

@xylar I think very similar changes would apply to polaris. I'd be happy to open a PR there too, so this is addressed in both places.

xylar (Collaborator) left a comment

@andrewdnolan, thanks for adding this! It looks good to me. I hadn't figured out a reasonable way to handle the need to change two job config options in tandem and this seems like a good one.

@xylar xylar merged commit 869dafe into MPAS-Dev:main Sep 13, 2024
5 checks passed
@andrewdnolan andrewdnolan deleted the chicoma_debug branch September 13, 2024 20:00