Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[develop]: User's Guide Updates #880

Merged
merged 129 commits into from
Aug 30, 2023

Conversation

gspetro-NOAA
Copy link
Collaborator

@gspetro-NOAA gspetro-NOAA commented Aug 8, 2023

DESCRIPTION OF CHANGES:

This PR updates the SRW App documentation to align better with the head of develop. It also reorganizes the documentation to make it more user-friendly. The documentation has been split into four sections:

  1. Background Information (Intro, Technical Overview, Components)
  2. Building, Running, and Testing the SRW App (Quickstart guides, Build/Run chapters, WE2E, Tutorial, VX, AQM chapters)
  3. Customizing the Workflow (Workflow Parameters, I/O, LAM grids, Defining a Wflow, Template Vars)
  4. Reference (FAQ, Glossary, Rocoto Info)

This PR also switches many tables to list-tables, which are easier to maintain because they do not rely on exact spacing of | symbols and other symbols used to construct the table. When workflows or other major details change, reformatting the tables is much easier. The same is true when adjusting table contents for a release; less time can be spent reformatting the tables.

EPIC has separate tickets to edit the ConfigWorkflow/Workflow Parameters chapter and the LAM Grid chapter, so these are not yet up to date (but coming soon!). The Container Quickstart chapter requires an update to the develop branch container, so the instructions do not work as written because the container is currently broken. This chapter will be reexamined ahead of the next release. All other chapters have been updated.

All PNG files were removed from the docs in the reorganization. They are now hosted on the SRW App wiki and linked from there instead.

Type of change

  • This change requires a documentation update

TESTS CONDUCTED:

None required (text-only changes). The documentation builds on my fork's RTD account and can be viewed here: https://srw-ug.readthedocs.io/en/text-ug-updates/.
In general, I have run through instructions for various chapters on Hera or Jet.

DEPENDENCIES:

None.

DOCUMENTATION:

N/A. All docs!

ISSUE:

Will partially resolve Issue #879 . Further PRs to update ConfigWorkflow and LAM grid chapters will also be required.

CHECKLIST

  • My code follows the style guidelines in the Contributor's Guide
  • I have performed a self-review of my own code using the Code Reviewer's Guide
  • I have commented my code, particularly in hard-to-understand areas
  • My changes need updates to the documentation. I have made corresponding changes to the documentation
  • My changes do not require updates to the documentation (explain). N/A
  • My changes generate no new warnings
  • New and existing tests pass with my changes N/A
  • Any dependent changes have been merged and published

CONTRIBUTORS (optional):

@mkavulich contributed updates for VX tasks.

@christinaholtNOAA
Copy link
Collaborator

I think only item #2 needed a response. All the others were totally fine as is. I was just curious.

  1. I didn't realize that MET/METPlus are available in spack stack. That's spiffy, and leaves you without anything else to really document if we aren't leaving users to their own devices to get the standalone installation.

Copy link
Collaborator

@christinaholtNOAA christinaholtNOAA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great @gspetro-NOAA. I appreciate you taking the time to respond so thoroughly.

Copy link
Collaborator

@mkavulich mkavulich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delayed review, I didn't have time to go through all of the changes, and most of these are very minor suggestions that are optional.

* Bug fixes since the v2.1.0 release
* Pre-implementation Rapid Refresh Forecast System (RRFS) forecast configurations
* Air Quality Modeling (AQM) capabilities
* Updates to :term:`CCPP` that target the top of the ``main`` branch (which is ahead of CCPP v6.0.0). See :ref:`this page <CCPPUpdates>` for a detailed summary of updates that came in ahead of the v2.1.0 release.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need a similar summary of CCPP updates for this release? If so I can work on it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mkavulich Yes! That would be very helpful! :) If you can provide a Google Doc, I can edit/format it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I'll plan to have that on my plate. If it's getting close to release time and I still don't have it to you feel free to bug me.

docs/UsersGuide/source/BackgroundInfo/Introduction.rst Outdated Show resolved Hide resolved
docs/UsersGuide/source/BackgroundInfo/Introduction.rst Outdated Show resolved Hide resolved
docs/UsersGuide/source/BackgroundInfo/Introduction.rst Outdated Show resolved Hide resolved
docs/UsersGuide/source/BackgroundInfo/Introduction.rst Outdated Show resolved Hide resolved
Comment on lines +382 to +384
.. note::

On ``JET``, users should add ``PARTITION_DEFAULT: xjet`` and ``PARTITION_FCST: xjet`` to the ``platform:`` section of the ``config.yaml`` file.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The defaults from the jet.yaml machine file lists all partitions, not just xjet. Unless there's something I'm missing this note can be removed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At some point, Jet was running into errors when "xjet" wasn't specified. I am trying to figure out if this was only when running containers or in all experiments. Unfortunately, we have a few teammates out on vacation, so I think we might have to make the change (if necessary) in a separate PR when I'm able to get a clearer answer.

Copy link
Collaborator

@mkavulich mkavulich Aug 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, I am like 95% sure it's not necessary outside of containers, but it won't cause any serious problems to have this included, it just limits the number of nodes available so will increase queue wait times. Hopefully you can get an answer and just remove this in a future PR once you do.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mkavulich I checked with @natalie-perlin who is updating the SRW App modulefiles, and even outside of containers, experiments often fail on Jet unless xjet is specified.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, I've never run into that myself (or maybe I have and didn't realize what was causing the failure!). @natalie-perlin can you open an issue for this? This seems like something we should get to the root cause of since only using xjet would significantly reduce the number of nodes available for testing, and currently the workflow does not contain this partition restriction.

Since xjet does not have the most memory or the most cores per node (https://jetdocs.rdhpcs.noaa.gov/wiki/index.php/Jet_System_Overview), and the fact that it only occasionally fails makes me think that maybe only one or two older partitions is the problem, and maybe it's a memory issue (that could be solved by specifying the necessary memory rather than restricting by partition) or something similar.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mkavulich - please note this is not a bug, but the differences in architecture. xjet is the newer system, and as far as I remember, the only one that has AVX2 vector instructions that are needed for the UFS. The software stack had to be built with no special instructions and AVX2, so some of the SRW tasks could run on other Jet partitions, too (sjet, vjet, kjet). The weather model needs to go to xjet only.

docs/UsersGuide/source/BuildingRunningTesting/RunSRW.rst Outdated Show resolved Hide resolved
Copy link
Collaborator

@mkavulich mkavulich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing all my comments 👍

@gspetro-NOAA gspetro-NOAA merged commit d80f7db into ufs-community:develop Aug 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

6 participants