Reduce number of processors needed for large grid regression tests #708

Closed
cacraigucar opened this issue Nov 30, 2022 · 13 comments

@cacraigucar
Collaborator

Issue Type

Other (please describe below)

Issue Description

From Adam Harrington:

Five successful runs in a row for FCnudged at 720 PEs, and five successful runs in a row for F2000climo at 360 PEs. I'd suggest these PE counts for the var-res CAM tests.

Will this change answers?

No

Will you be implementing this yourself?

Yes

@cacraigucar
Collaborator Author

@adamrher - We don't currently have any F2000climo tests using CONUS. The CONUS tests are either FCnudged or FCHIST, with FHIST coming in with the upcoming tag #668. Do you think the FHIST and FCHIST tests will use the same processor counts as F2000climo?

@adamrher

Ugh, I should've done the tests with FHIST, as this was intended to inform #668. From the CAM side, I'm pretty sure F2000climo and FHIST read in the same number of files. But the land model takes in a massive flanduse_timeseries that might have a larger memory footprint.

Let me run some FHIST conus tests w/ 360pes and I'll report back here.

@cacraigucar
Collaborator Author

From @fvitt

I think we should try removing the outfrq9s_refined_camchem testmods dir and using "_P720x1" instead. I would try this:

<test compset="FCnudged" grid="ne0CONUSne30x8_ne0CONUSne30x8_mt12" name="SMS_D_Ln9_Vnuopc_P720x1" testmods="cam/outfrq9s">
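
For readers less familiar with the test name above, the `_P720x1` modifier is read here as 720 MPI tasks with 1 thread per task. The snippet below is a minimal sketch of that reading, not CIME code: `parse_pe_layout` is a hypothetical helper, and it assumes the explicit `_P<tasks>x<threads>` form only.

```python
import re


def parse_pe_layout(test_name):
    """Hypothetical helper: return (tasks, threads) from a _P<tasks>x<threads>
    modifier in a test name, or None if no such modifier is present."""
    # Match "_P", digits, "x", digits, followed by another "_" or end of name.
    match = re.search(r"_P(\d+)x(\d+)(?:_|$)", test_name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Test name taken from the suggestion above.
layout = parse_pe_layout("SMS_D_Ln9_Vnuopc_P720x1")
print(layout)
```

Under that assumption, the suggested test would request 720 tasks with a single thread each, which matches the 720-PE count reported for FCnudged.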

@adamrher

I need to check out Francis' fork to run FHIST. I won't get to this until after 3 PM today.

@adamrher

adamrher commented Dec 1, 2022

@fvitt can I convince you to merge your #668 branch to the head of the trunk? It is currently at cam6_3_077, and I'm getting the var-res errors that were resolved in cam6_3_079 #666.

@fvitt

fvitt commented Dec 1, 2022

> @fvitt can I convince you to merge your #668 branch to the head of the trunk? It is currently at cam6_3_077, and I'm getting the var-res errors that were resolved in cam6_3_079 #666.

#668 is now merged up to the head of cam_development

@cacraigucar
Collaborator Author

@fvitt and/or @adamrher - Do you think that the number of processors Adam determines for FHIST will also apply to the FCHIST that we are currently running as a regression test for CONUS?

@fvitt

fvitt commented Dec 1, 2022

> @fvitt and/or @adamrher - Do you think that the number of processors Adam determines for FHIST will also apply to the FCHIST that we are currently running as a regression test for CONUS?

The memory usage of FCHIST is roughly twice the memory used by FHIST. I don't think we should go below 20 Cheyenne nodes for FCHIST on the CONUS grid.
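
To relate the node floor above to the PE counts discussed in this thread, here is a rough sketch of the arithmetic. It assumes 36 cores per Cheyenne node and one MPI task per core on fully packed nodes; `nodes_needed` is a hypothetical helper, not part of any CAM or CIME tooling.

```python
import math

# Assumption: Cheyenne compute nodes have 36 cores each.
CORES_PER_NODE = 36


def nodes_needed(tasks, threads=1, cores_per_node=CORES_PER_NODE):
    """Hypothetical helper: number of nodes required when each task uses
    `threads` cores and nodes are fully packed."""
    return math.ceil(tasks * threads / cores_per_node)


# PE counts mentioned in this thread.
for pes in (360, 720):
    print(f"{pes} PEs -> {nodes_needed(pes)} nodes")
```

Under these assumptions, 720 PEs corresponds to the 20-node floor suggested for FCHIST, and 360 PEs corresponds to half of that.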

@cacraigucar
Collaborator Author

@fvitt - thanks for the info. I will set both FCHIST regression tests to use P720x1.

@adamrher

adamrher commented Dec 1, 2022

Reporting here that I got 5 completed runs in a row running FHIST w/ 360 processors. So I think we've got a game plan?

@cacraigucar
Collaborator Author

Thanks @adamrher for all the runs!

Also, I've changed my mind about changing both of the FCHIST tests to use 720 processors. I will leave the camchem 1-day CONUS run as it is, with the higher PE count, since the wall clock limit for it was 2 hours.

@adamrher

adamrher commented Dec 1, 2022

I can test to see how long a walltime is needed for that test at 720? Ideally we don't want any tests with 50 nodes, right?

@cacraigucar
Collaborator Author

@adamrher - I would suggest that you see what @fvitt thinks. The camchem tests are not run routinely, but rather by him on an as-needed basis. The aux_cam and prealpha tests are run much more frequently and are the most important ones to streamline.
