
Update docker container, add Docker auto-build and test, update documentation for building with spack-stack #466

Merged: 34 commits into NCAR:main on May 22, 2024

Conversation

@mkavulich (Collaborator):

This PR includes several updates that improve SCM documentation, overhaul the Docker container, update new and existing tests, and add modulefiles for building on MacOS and generic Linux.

Documentation

Documentation for this branch is built here: https://ccpp-scm-mkavulich.readthedocs.io/en/feature-macos_and_docker_build/

This update adds some information about testing to the Repository chapter (though more details should probably be included), and completely overhauls the Quick Start chapter.

Among the changes to the Quick Start chapter are:

  • Remove some less relevant text, and rearrange and update wording to improve the flow of the chapter (it is supposed to be a "quick" start, after all)
  • Replace non-standard section delimiters per the Sphinx documentation
  • Include more information about spack-stack, including more links
  • Include instructions for loading the environment on pre-configured platforms, as well as procedures for building spack-stack or setting environment variables manually on other platforms such as MacOS and Linux
  • Update the Python section with the appropriate prerequisites
  • Use Sphinx's built-in Note markup to highlight important points
  • Update run commands and add more examples
  • Update the Docker section with relevant changes (see next section)

Docker

  • The Dockerfile is updated to work with the latest CCPP code. The old "common community container" is no longer used, as it is no longer supported.
  • A GitHub Action is introduced that runs on each commit to main. It builds the container and pushes it to Dockerhub under the tag "dtcenter/ccpp-scm:latest", ensuring there is always an up-to-date, working container for the head of the main branch.
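
Such an auto-build workflow might look roughly like the following sketch. The action versions and file locations are assumptions for illustration, not taken from the PR; only the secret names and the image tag come from this conversation:

```yaml
# Sketch of an auto-build-and-push workflow; versions and paths are assumptions.
name: Build and push SCM container
on:
  push:
    branches: [main]
jobs:
  docker:
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v4
      - uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      - uses: docker/build-push-action@v5
        with:
          context: .
          file: docker/Dockerfile
          push: true
          tags: dtcenter/ccpp-scm:latest
```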

Testing

  • A new CI test builds and runs the regression tests with the Docker container when a new pull request is created
  • Test ci_build_scm_ubuntu_22.04_nvidia.yml is modified to run only on pull requests and when manually requested
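
A PR-triggered regression-test workflow along these lines might be sketched as follows. The step names, Dockerfile path, image tag, and in-container command are assumptions for illustration, not taken from the PR:

```yaml
# Sketch of a PR-triggered RT workflow; paths and commands are assumptions.
name: Docker regression tests
on: [pull_request, workflow_dispatch]
jobs:
  rt:
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v4
        with:
          submodules: recursive
      - name: Build the container
        run: docker build -t ccpp-scm-rt -f docker/Dockerfile .
      - name: Run the regression tests inside the container
        run: docker run --rm ccpp-scm-rt ./run_scm.py -f ../../test/rt_test_cases.py
```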

Other

  • Add new "generic platform" modulefiles macos_clang and linux_gnu for use with spack-stack installations.

This PR resolves #409 and partially resolves #455 (a release container will still be needed when the time comes).

@mkavulich (Author):

Note: the new job that auto-pushes the latest container to Dockerhub makes use of repository secrets: I have provided my Dockerhub credentials as DOCKERHUB_USERNAME and DOCKERHUB_TOKEN, since I can write to the dtcenter organization on Dockerhub. If in the future I am not around, or if I lose access to that Dockerhub org, those secrets will have to be updated.

@dustinswales (Collaborator) left a comment:

Thanks for making these changes.
I just wonder if we should rip the NCEPlibs stuff out of the documentation now, or wait until that PR gets into the physics?

(Resolved review comments on scm/doc/TechGuide/chap_quick.rst)

.. code:: bash

./run_scm.py -f ../../test/rt_test_cases.py
Collaborator:

Something we need to discuss is the SCM RTs.
rt_test_cases.py needs to be extended to:

  • include the v7 tests, which we need to decide on. Same SDF/case combos as in v6, plus some new cases?
    (Note: we have a separate CI test that uses the DEPHY repository, but there aren't any comparisons in that test; it just ensures the SCM works with the DEPHY repo. When I set this up I didn't do a deep evaluation of the cases from the DEPHY repo; it was more of a technical test that the SCM could use the latest and greatest DEPHY cases. @bluefinweiwei has used the SCM with cases in the DEPHY repo for T&E, so I think it makes more sense to use a case that she is confident in, and then add a comparison step to the DEPHY CI test.)
    (Note: we have a separate CI test for the UFS case generation. This test just creates the UFS case files from staged UFS RT data; it doesn't run the SCM with that case. We need to either expand this test to run the SCM and compare to baselines, OR leave this CI test as is and add a staged UFS case to the SCM, to be included in rt_test_cases.py.)
  • move the "supported v6" cases into the "unsupported" section. As long as these still work, we should include them in our testing suite.
  • beef up the comparison step in the CI RT script to create plots of fields that differ, and upload these as a GitHub artifact.
  • We only run the CI tests with GNU. Intel compilers aren't available on the GitHub servers where we do the CI, but we could create an Intel Docker container. TBH, for the RT CI test we don't need to build the software stack as part of the test, since ci_build_scm_ubuntu_22.04.yaml already does this; we just need to run the SCM and compare to baselines.
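
The comparison step sketched in the bullets above could start from a helper like this. It is a hypothetical plain-Python stand-in; a real RT script would read the fields from netCDF output files and hand the differing names to a plotting routine:

```python
def compare_fields(baseline, candidate, tol=1e-6):
    """Return the names of fields whose values differ beyond tol.

    baseline/candidate map field name -> sequence of floats; in a real
    regression-test script these would come from netCDF output files.
    """
    differing = []
    for name, base_vals in baseline.items():
        cand_vals = candidate.get(name)
        # A field differs if it is missing, has a different length,
        # or any value departs from the baseline by more than tol.
        if cand_vals is None or len(cand_vals) != len(base_vals) or any(
            abs(b - c) > tol for b, c in zip(base_vals, cand_vals)
        ):
            differing.append(name)
    return differing
```

The CI job would then generate a plot for each returned field name and upload the images as a workflow artifact.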

@@ -103,7 +103,7 @@ This release bundle has some known limitations:

- As of this release, using the SCM over a land point with an LSM is
possible through the use of UFS initial conditions (see
:numref:`Section %s <UFS ICs>`). However, advective forcing terms
:numref:`Section %s <UFSreplay>`). However, advective forcing terms
Collaborator:

FWIW. We are going to need to change all instances of "replay" in the documentation to "case generation".
There will be a subsequent PR with those code changes, so no need to address this at the moment.

Collaborator (Author):

This link is currently broken (not caught in the last PR) so this just updates the link to point to the correct place.

Collaborator:

Should we name this file Dockerfile_GNU or Dockerfile_GNUv12? Just in case we get an Intel one soon.

Collaborator (Author):

Good point; do you think there is a possibility that Intel might be doable? The last time I worked with Docker (a few years ago), Intel was very difficult to get working in containers.

Collaborator:

JCSDA uses Intel in Docker all the time. The only issue is that you can't distribute the containers to the public.

Collaborator (Author):

Ahh, okay, that's good to know. If we do implement containers for testing, we'll need a place other than Dockerhub to push them anyway, so it sounds like this may work.

@mkavulich (Author):

@grantfirl Did you want a chance to review this or can I go ahead and merge it?

@dustinswales (Collaborator) left a comment:

Looks good to me. A minor suggestion about the OS version in the CI scripts. And we will need to revisit the documentation later anyway.


jobs:
  docker:
    runs-on: ubuntu-latest
Collaborator:

Maybe change to "ubuntu-22.04"; the latest OS can change and cause hiccups.
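
The suggested pin is a one-line change to the quoted snippet, e.g.:

```yaml
jobs:
  docker:
    runs-on: ubuntu-22.04  # pinned so runner-image updates can't break CI
```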

Collaborator (Author):

Good point, updated

Collaborator:

We will need to revisit the documentation later anyways, at which point we can remove any deprecated dependencies.

pre-configured computing environment with a pre-compiled model. This is
also an avenue for running this software with a Windows PC. See section
:numref:`Section %s <docker>` for more information.
SCM. We provide instructions on building the code from scratch (:numref:`Section %s <obtaining_code>`), as well as
Collaborator:

Shortened because this is a "quick" start guide? Otherwise, this might be new information for someone unfamiliar with CCPP, etc.

Collaborator (Author):

Yes, that's the idea: I wanted to trim the less relevant bits of this chapter to make it a "quicker" start. The "Intro" chapter already covers this at a high level; I could expand that intro with this info about submodules specifically if you think that's needed.

For working with the development branches (stability not guaranteed),
check out the ``main`` branch of the repository:
Developers seeking to contribute code to the SCM or CCPP will need to use the most up-to-date
version of the code, which can be found on the ``main`` branch of the repository:
Collaborator:

At one point, I think that the link contained basic information on git usage. Perhaps it isn't needed anymore. It also probably had links to our other docs, e.g. sci-doc and tech-doc. Those resources are still relevant for some SCM users, I'd imagine.

Collaborator:

NVM. I see that it was just moved. Scrolling is hard. :)

needed for specifying the location of the various prerequisites. Users will need to set variables for the
compilers (``CC``, ``CXX``, ``FC``), as well as the root directories for the library installs of NetCDF (``NetCDF_ROOT``),
``bacio`` (``bacio_ROOT``), ``sp`` (``sp_ROOT``), and ``w3emc`` (``w3emc_ROOT``). This is the procedure used in the
provided Dockerfile in ``ccpp-scm/docker/``, so users can reference that file for guidance on how to install this software
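
Spelled out as shell commands, the manual setup described in the quoted passage might look like this. The compiler names and every install path below are placeholders for illustration, not values from the PR or the Dockerfile:

```shell
# Hypothetical manual environment for building the SCM without spack-stack.
# All paths are placeholders; point them at your actual library installs.
export CC=gcc CXX=g++ FC=gfortran
export NetCDF_ROOT=/opt/netcdf
export bacio_ROOT=/opt/nceplibs/bacio
export sp_ROOT=/opt/nceplibs/sp
export w3emc_ROOT=/opt/nceplibs/w3emc
```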
Collaborator:

Ah, I see that you reference the "manual" approach used in the Dockerfile. That's fine. Disregard the comment above re: spack-stack.

``thompson_tables.tar.gz`` and ``MG_INCCN_data.tar.gz`` and extract
to ``scm/data/physics_input_data/``.

New with the SCM v7 release, static data is available for running cases with GOCART climatological aerosols (where the value of ``iaer`` in the ``&gfs_physics_nml`` namelist starts with 1; see the `CCPP Scientific Documentation <https://dtcenter.ucar.edu/GMTB/v6.0.0/sci_doc/_c_c_p_psuite_nml_desp.html>`__ for more information); one example of this is with the default namelist settings for the GFS_v17_HR3 scheme. This dataset is very large (~12 GB), so it is recommended only to download it if you will be using it.
Collaborator:

Yep.

Windows users may need to omit the curly braces around environment variables: use ``$OUT_DIR``
instead of ``${OUT_DIR}``.

For running through all supported cases and suites, use
Collaborator:

rt_test_cases.py contains more than just the permutations of supported cases/suites, but that is probably not a big deal.

@grantfirl (Collaborator):

@mkavulich Can you update to the latest main for good measure before I press the merge button?

mkavulich added 20 commits May 22, 2024 12:04
 - remove a lot of fluff to get to the details quicker
 - Add example commands for installing spack-stack on MacOS
 - Update "multi-run" command for Docker

Now updated through "4.3. Compiling SCM with CCPP"
 - Move MacOS example out to an external link (Github Discussion)
 - Move downloading of static data after build
 - Convert "NOTE" asides into proper Sphinx Note blocks
 - More details about debug mode
 - Remove yet another forum mention, how are these still out there!?
 - Update SCM run commands, including some examples
 - Remove the deprecated "multirun" argument
 - Update argument descriptions
 - Update description of Docker software stack
… This job shouldn't need those credentials anyway now that I separated out the push step, so removing the Login step should fix this
@mkavulich force-pushed the feature/macos_and_docker_build branch from 97ee4b3 to 10c8f05 on May 22, 2024 at 18:06
@mkavulich (Author):

@grantfirl Rebased my branch on the latest main

@grantfirl grantfirl merged commit 046559b into NCAR:main May 22, 2024
18 of 22 checks passed