Enable cmeps to use PIO+PNETCDF for IO in UFS #2347
Comments
@DeniseWorthen I think you mean that the PIO options need to be added to the UFS template files, right? I just want to clarify. The capability to use different PIO options is already implemented in CMEPS and CDEPS.
Yes, exactly. I will clarify the issue description.
I set up an ATM-OCN-ICE case (C384, 1/4 deg) on Gaea-C5. I turned off all history and restart writing except for CMEPS; for OCN and ICE, I did this by manually overriding the write-restart logicals in the code and setting them to false before compiling. I removed the WGC for the ATM and used a layout of 16x24 without threading for the ATM, which gave me a maximum of 2304 PEs for CMEPS. I made a series of 24-hour runs with mediator restarts at 3-hour intervals, for a total of 8 mediator restart writes per run. I recorded the min/max and mean times for the … Using the config variables in ufs.configure, I did 3 sets of runs using 300, 600, 1200 or 2300 PEs for CMEPS. I set the pio_type to pnetcdf for all runs. One set of runs allowed CMEPS to set all the PIO-associated parameters, one set I manually set the … For the existing configuration, serial netcdf is used by default, which gives a mean write time of ~2.4 s for each CMEPS restart. Using pnetcdf+PIO, the best results were found using the subset rearranger at stride=4. Depending on the number of tasks, this brings each CMEPS restart write down to between ~0.8 s and ~0.5 s. See full results here
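A quick back-of-envelope check of the numbers above, as a Python sketch. It assumes CMEPS runs on the ATM compute tasks (so its maximum PE count is 6 cubed-sphere tiles times the 16x24 layout) and uses only the approximate per-restart timings quoted in the comment; the constant and function names are illustrative.

```python
# Back-of-envelope check of the C384 Gaea-C5 test numbers.
# Assumption: CMEPS runs on the ATM compute tasks, so its max PE count
# equals the ATM task count. Timings are the approximate values quoted above.

TILES = 6                      # cubed-sphere tiles in the FV3 ATM grid
LAYOUT_X, LAYOUT_Y = 16, 24    # ATM layout (no threading, no WGC)
RUN_HOURS = 24
RESTART_INTERVAL_HOURS = 3

SERIAL_S = 2.4                 # ~mean serial-netcdf time per CMEPS restart
PNETCDF_S = (0.8, 0.5)         # ~range with pnetcdf, subset rearranger, stride=4


def max_cmeps_pes() -> int:
    """ATM compute tasks: tiles * layout_x * layout_y."""
    return TILES * LAYOUT_X * LAYOUT_Y


def restart_count() -> int:
    """Number of mediator restart writes in one run."""
    return RUN_HOURS // RESTART_INTERVAL_HOURS


def reduction(parallel_s: float, serial_s: float = SERIAL_S) -> float:
    """Fractional reduction in write time relative to serial netcdf."""
    return 1.0 - parallel_s / serial_s


if __name__ == "__main__":
    print(max_cmeps_pes())     # 2304
    print(restart_count())     # 8
    for t in PNETCDF_S:
        saved = restart_count() * (SERIAL_S - t)
        print(f"{t:.1f} s/restart: {reduction(t):.0%} less time, {saved:.1f} s saved per run")
```

At ~0.8 s and ~0.5 s per restart this works out to roughly 67% and 79% less write time than serial netcdf, consistent with the >60% speedup noted in the next comment.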
Denise, thanks for testing the new parallel writing in CMEPS; the speedup is great (>60%). It would be good to test the feature in higher-resolution runs (C768 and C1152). I recall we have had problems using a large number of tasks for CMEPS.
@junwang-noaa I could test the higher-resolution ATM cases; all I need is the ATM input and the layouts to try.
@DusanJovic-NOAA do you have C768/C1152 ATM-only test cases (run directories) generated from G-W?
I have them on WCOSS2 here: /lfs/h2/emc/eib/noscrub/dusan.jovic/ufs/c1152_gw_case/
I've grabbed these and will set up some more testing of the CMEPS PIO options. It looks like these were used to test blocksize changes. I'm assuming I should stick with the blocksize=32 settings, right?
Yes. |
Nothing is moving on Gaea today, but I've been testing adding the config variables to the RT templates. On Hercules, it appears that for small PE counts, like in the cpld_control test (CMEPS=144 PEs), serial netcdf is actually faster than pnetcdf. So I plan to do some more tests on Gaea at C384 resolution using progressively fewer CMEPS PEs, to see if I can identify the point at which pnetcdf starts to pay off.
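Once the reduced-PE timings are in hand, locating the crossover can be automated. The helper below is hypothetical (it is not part of CMEPS or the RT scripts) and takes whatever mean per-restart timings are measured; it assumes no particular values.

```python
from typing import Mapping, Optional, Tuple


def pnetcdf_crossover(timings: Mapping[int, Tuple[float, float]]) -> Optional[int]:
    """Smallest CMEPS PE count from which pnetcdf is faster at that and every larger measured count.

    `timings` maps PE count -> (serial_netcdf_s, pnetcdf_s), e.g. mean
    per-restart write times. Returns None if timings is empty or pnetcdf
    is not faster at the largest measured PE count.
    """
    counts = sorted(timings)
    # True where pnetcdf beats serial netcdf at that PE count
    wins = [timings[n][1] < timings[n][0] for n in counts]
    for i, n in enumerate(counts):
        if all(wins[i:]):
            return n
    return None
```

Requiring pnetcdf to win at every larger count, rather than just at one point, guards against a single noisy run being mistaken for the crossover.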
I've been able to get the C768 ATM-only case running on Gaea, but it is failing at about hour 21. See … I'm not sure why it's failing. I compiled on Gaea and used the job card from the low-res RT case, modifying it for the task count. All the fix files point to the G-W fix file locations on Gaea. I'm seeing …
EDIT: Now I see that it was a time-out.
@DeniseWorthen Can you confirm whether the C768 ATM test still fails on Gaea? Can you list the changes needed to turn on PIO_Pnetcdf in CMEPS so that it can be tested on WCOSS2?
@junwang-noaa I haven't tried the C768 case recently. What I really need is a canned case for the coupled model that runs on Gaea; I was trying to modify the standalone case. To turn on PnetCDF for CMEPS, add to the …
This will create as many I/O tasks as possible, laid out at a stride of 4 across the available processors.
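For intuition on how many I/O tasks a stride of 4 yields, here is a sketch of the usual PIO convention (the `num_iotasks`/`stride`/`base` arguments of PIO's intracomm initialization, e.g. `PIOc_Init_Intracomm` in the C API): I/O tasks sit at ranks `base`, `base+stride`, and so on. The function names are hypothetical, and the default `base=0` is an assumption; the root rank CMEPS actually uses may differ.

```python
from typing import List


def max_io_tasks(ntasks: int, stride: int, base: int = 0) -> int:
    """Most I/O tasks that fit at ranks base, base+stride, ... below ntasks."""
    if ntasks <= 0 or stride <= 0 or not 0 <= base < ntasks:
        raise ValueError("need ntasks > 0, stride > 0 and 0 <= base < ntasks")
    return (ntasks - 1 - base) // stride + 1


def io_task_ranks(ntasks: int, stride: int, base: int = 0) -> List[int]:
    """Ranks within the CMEPS communicator that would act as I/O tasks."""
    return list(range(base, ntasks, stride))


if __name__ == "__main__":
    for n in (144, 300, 600, 1200, 2304):
        print(n, max_io_tasks(n, stride=4))
    # 144 36, 300 75, 600 150, 1200 300, 2304 576
```

With stride=4, roughly a quarter of the CMEPS tasks end up doing I/O, so the I/O task count grows in step with the mediator PE count.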
Description
Currently, CMEPS in UFS does not make use of PIO options; restart (and history) writing is done through serial netcdf. CMEPS already has the capability to write using PIO+pnetcdf, with the various PIO options (e.g. stride, numiotasks) controlled through configuration.
Solution
Parallel writes for CMEPS should be enabled in UFS by setting the appropriate PIO config options. Scalability testing should be done to determine appropriate values for the PIO settings.
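The scalability testing could be organized as a simple run matrix. This is a hypothetical sketch: the dictionary keys are illustrative and are not the actual ufs.configure attribute names; `box` and `subset` are the two PIO rearrangers, and each PE count also gets a serial-netcdf baseline.

```python
from itertools import product
from typing import Dict, Iterable, List, Optional, Union

Case = Dict[str, Optional[Union[int, str]]]


def scaling_cases(pe_counts: Iterable[int],
                  rearrangers: Iterable[str] = ("box", "subset"),
                  strides: Iterable[int] = (4,)) -> List[Case]:
    """One serial-netcdf baseline per PE count, plus every pnetcdf combination."""
    pe_counts = list(pe_counts)  # iterated twice below
    cases: List[Case] = [
        {"pes": n, "pio_type": "netcdf", "rearranger": None, "stride": None}
        for n in pe_counts
    ]
    for n, rearr, stride in product(pe_counts, rearrangers, strides):
        cases.append({"pes": n, "pio_type": "pnetcdf",
                      "rearranger": rearr, "stride": stride})
    return cases
```

For example, `scaling_cases([144, 300, 600, 1200, 2300], strides=(2, 4, 8))` yields 5 serial baselines plus 5 × 2 × 3 = 30 pnetcdf runs, 35 in total; strides 2 and 8 are included only to illustrate the sweep.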
Alternatives
Related to
See oceanmodeling/CMEPS#1 for an example of this issue arising in the coastal modeling effort.