HPC stack management
Hang-Lei-NOAA edited this page Oct 29, 2020 · 9 revisions
The NCEPLIBS and associated library software will be managed through the HPC-stack package. The repository is
located at https://github.com/NOAA-EMC/hpc-stack
HPC-Stack is a hierarchical installation of EMC infrastructure for running EMC workflows and models. The stack
is maintained by the NCEP libraries group. It provides a hierarchical list of modules and options
that you can follow to load the specific libraries needed for your work.
The HPC-stack will be the only official library installation tool on the platforms that the NCEP libraries
team supports. Development work should follow the sequence of stages described below.
## Development
The development work will be conducted in a feature branch or a personal fork. Once the development is finished,
the feature branch must be fully tested by the developer before starting a pull request to merge back to the
develop branch.
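The branch workflow above can be sketched as follows. This is a minimal illustration that runs in a throwaway repository rather than in a real clone of hpc-stack. The branch name `feature/update-library`, the file `notes.txt`, and the developer identity are hypothetical; only the `develop` branch name comes from this page.

```shell
#!/bin/sh
set -e

# Throwaway repository standing in for a clone of hpc-stack (illustration only).
demo_repo=$(mktemp -d)
cd "$demo_repo"
git init -q
git config user.name "Example Developer"
git config user.email "dev@example.com"
git checkout -q -b develop
git commit -q --allow-empty -m "Initial commit on develop"

# Start the feature branch from develop. The branch name is hypothetical.
feature_branch="feature/update-library"
git checkout -q -b "$feature_branch"
echo "example change" > notes.txt
git add notes.txt
git commit -q -m "Describe the change"

# Count the commits a pull request would add on top of develop.
pending_commits=$(git rev-list --count "develop..${feature_branch}")
current_branch=$(git rev-parse --abbrev-ref HEAD)
echo "${current_branch} is ${pending_commits} commit(s) ahead of develop"
```

In a real clone the feature branch would be pushed to GitHub and tested before the pull request against develop is opened.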
## Code review and merge
The pull request must go through the code review process before it is merged back to the develop branch. The code
review process has two criteria. First, the pull request must be approved by at least one code reviewer. Second,
the pull request must pass the CI test before merging into the develop branch.
## Bug fix
A bug fix to the develop branch must also go through the code review process. Bug fixes
addressing small issues (typos, broken links, documentation changes, etc.) and major code changes
must be treated in the same way as major development work: each requires at least one approval from
a code reviewer and must pass the CI test before merging back to the develop branch.
## Version control
The hpc-stack in the develop branch needs to be tagged and given a new version number when the accumulated
development work is ready for release. The tagged release version must be frozen and reproducible in
installations. Before the tag is made, full testing is required to reduce the risk in the release. The tag,
also called the release version, is for prod use.
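As a rough sketch of the tagging step, an annotated git tag pins the release to one commit of develop. The example runs in a throwaway repository. The tag naming (the bare version number) and the developer identity are assumptions for illustration; this page only states that the installed version should match the repository tag.

```shell
#!/bin/sh
set -e

# Throwaway repository standing in for hpc-stack (illustration only).
demo_repo=$(mktemp -d)
cd "$demo_repo"
git init -q
git config user.name "Example Developer"
git config user.email "dev@example.com"
git checkout -q -b develop
git commit -q --allow-empty -m "State of develop that passed testing"

# Example version number, as used in the installation steps on this page.
HPC_STACK_VERSION=1.3.0
# Assumption: the tag name is the bare version number.
release_tag="${HPC_STACK_VERSION}"
git tag -a "$release_tag" -m "hpc-stack release ${HPC_STACK_VERSION}"

# The tag resolves to a single commit, which is what makes the release reproducible.
tagged_commit=$(git rev-list -n 1 "$release_tag")
head_commit=$(git rev-parse HEAD)
echo "tag ${release_tag} points at ${tagged_commit}"
```

Later commits on develop do not move the tag, so an installation made from the tag can be repeated on every platform.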
The management of installations must consider the main situations that arise in supporting operations and
research. Arun suggested three scenarios: prod, dev and test. prod is what is left unchanged. dev is where we
introduce updates, add new libraries or new builds, or make other changes. test is the intermediate place
where the build is placed and tested across all our software.
## Development
In the EMC working environment, work starts from distributed development in each group. Feedback
on the libraries usually focuses on changes and development in one or a few libraries within a single development
group. For example, changes from the prep group are mainly to bufr, or sometimes w3emc, while the land group may
require changes in landsfcutil or surface I/O. Development in the libraries is therefore distributed. The hpc-stack
can install the full stack plus additional libraries that need to be looked at carefully.
This feature supports the development work. The developed libraries will be used in the test stage to check
the effect of the changes on other groups.
The development stack should not be installed in the official area. It can be installed in a sandbox in a personal
area where the libraries and module files are properly tested. Once these tests are complete, any
updates/changes to the hpc-stack repository should be brought back into develop and tagged with a beta tag.
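One way to enforce the rule that a development stack stays out of the official area is a small guard that runs before any install. The sketch below is only an illustration: both paths and the function name `is_in_official_area` are hypothetical, and the check compares path strings without resolving symlinks or `..` components.

```shell
#!/bin/sh

# Hypothetical location of the official (prod) installations.
OFFICIAL_STACK_AREA="/path/to/official/hpc-stack"

# Succeeds when the given install prefix is the official area or lies inside it.
is_in_official_area() {
    case "$1/" in
        "${OFFICIAL_STACK_AREA}"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical install prefixes for a development stack.
sandbox_root="/home/developer/sandbox/hpc-stack"
official_root="${OFFICIAL_STACK_AREA}/1.3.0"

if is_in_official_area "$sandbox_root"; then sandbox_check="refused"; else sandbox_check="allowed"; fi
if is_in_official_area "$official_root"; then official_check="refused"; else official_check="allowed"; fi

echo "${sandbox_root}: ${sandbox_check}"
echo "${official_root}: ${official_check}"
```

A wrapper around the install commands could call this guard first and stop when the prefix is refused.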
## Test
The new version of hpc-stack can then be put into the test stage. The testing version of hpc-stack is installed in a
sandbox for general testing and announced to the developers. Based on user feedback, necessary
fixes can be made to the develop branch. The testing stage cannot pass until general agreement is reached
among all major EMC applications.
## Official/Prod
Official library stacks should not be changed without general announcement. The stack once created should be read
only, and should be changed only under certain circumstances outlined below, once the testing stage passes. The
applications and modulefiles are under the libs directory. The hpc-stack package and log files are kept in the
src directory.
Scenario 1 - The hpc-stack repository has been updated on GitHub. This could be because the build or module file structure of existing libraries has been changed, and/or new libraries have been introduced to the stack. In all these cases the hpc-stack goes through the development and test stages. Once those stages are passed, the develop branch is tagged for a new release. A new release with a version corresponding to the hpc-stack repository release tag is installed on all the platforms. Once the new release is installed on all platforms, a general announcement message is sent to all developers. The old versions in the official area will be removed in due time with adequate warnings. The official installation needs to be set to read-only mode to prevent accidental changes to it.
The steps for creating a new hpc-stack are as follows:

1. Decide on the hpc-stack version (this should match the repository tag), for example:

       HPC_STACK_VERSION=1.3.0

2. Create a top-level directory that will hold the source code and install tree:

       HPC_STACK_ROOT=<path/to/hpc_stack/>/${HPC_STACK_VERSION}
       mkdir -p ${HPC_STACK_ROOT}
       cd ${HPC_STACK_ROOT}

3. Check out the appropriate hpc-stack repository tag.
4. Manually change string to ${HPC_STACK_VERSION} in setup_modules.sh. [Note: This step can be skipped if the version string is changed with the repository tag]
5. Install the stack:

       ./setup_modules.sh -p ${HPC_STACK_ROOT} -c config/<machine config file>
       ./build_stack.sh -p ${HPC_STACK_ROOT} -c config/<machine config file> -y config/<yaml file> -m

6. (Optional) Manually change the yaml file to install additional library version(s) [this may involve multiple steps], then run:

       ./build_stack.sh -p ${HPC_STACK_ROOT} -c config/<machine config file> -y config/<yaml file> -m

7. Make the stack read only to avoid accidental changes:

       chmod -R -w ${HPC_STACK_ROOT}
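After the final step it can be useful to confirm that nothing in the tree is still writable. The sketch below demonstrates such a check on a throwaway directory that stands in for `${HPC_STACK_ROOT}`; the placeholder file is hypothetical. It uses `chmod -R a-w` rather than `chmod -R -w` because, when no user class is given, `chmod` leaves alone the bits that are set in the umask, so naming all classes explicitly makes the result independent of the umask.

```shell
#!/bin/sh
set -e

# Throwaway directory standing in for ${HPC_STACK_ROOT} (illustration only).
HPC_STACK_ROOT=$(mktemp -d)
mkdir -p "${HPC_STACK_ROOT}/libs" "${HPC_STACK_ROOT}/src"
echo "placeholder" > "${HPC_STACK_ROOT}/libs/example.txt"

# Remove write permission for owner, group and others.
chmod -R a-w "${HPC_STACK_ROOT}"

# Count entries that still carry any write bit; zero means the tree is read only.
writable_count=$(find "${HPC_STACK_ROOT}" \( -perm -200 -o -perm -020 -o -perm -002 \) | wc -l)
writable_count=$((writable_count))
echo "entries still writable: ${writable_count}"

# Clean up the throwaway tree (write permission is needed to delete it).
chmod -R u+w "${HPC_STACK_ROOT}"
rm -rf "${HPC_STACK_ROOT}"
```

On a real installation only the `chmod` and `find` lines would apply, run against the actual install tree.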
Scenario 2 - The hpc-stack repository has not been updated on GitHub, but an existing hpc-stack installation needs to be updated. In many cases there are no changes to the hpc-stack repository, but an existing installation needs to be updated to add new library versions that have either been released or made available to EMC as beta snapshots. Conversely, when a formal release of a library is provided, we need to clean out old beta snapshots. In all of these situations, changes need to be made to an existing hpc-stack installation, and they need to be made carefully after prolonged testing.
The steps for updating an existing hpc-stack installation are as follows:

1. Decide which hpc-stack version will be updated, for example:

       HPC_STACK_VERSION=1.3.0

2. Set the top-level directory and make it temporarily read/write:

       HPC_STACK_ROOT=<path/to/hpc_stack/>/${HPC_STACK_VERSION}
       chmod -R +w ${HPC_STACK_ROOT}
       cd ${HPC_STACK_ROOT}

3. Manually change the yaml file to install the requisite libraries (make sure all existing library installations are set to NO, to avoid accidentally reinstalling these libraries), then run:

       ./build_stack.sh -p ${HPC_STACK_ROOT} -c config/<machine config file> -y config/<yaml file> -m

4. (Optional) Repeat for all library updates.
5. Make the stack read only to avoid accidental changes:

       chmod -R -w ${HPC_STACK_ROOT}
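Because the installation is writable while it is being updated, a failed build could leave it unlocked. One possible safeguard, sketched below, is to restore read-only mode from an `EXIT` trap so the tree is locked again whether or not the build succeeds. The helper `update_stack` and the stand-in `failing_build` are hypothetical; in practice the command passed to the helper would be the `build_stack.sh` invocation from the steps above. The example runs on a throwaway directory.

```shell
#!/bin/sh
set -e

# Throwaway directory standing in for an existing, read-only installation.
HPC_STACK_ROOT=$(mktemp -d)
mkdir -p "${HPC_STACK_ROOT}/libs"
chmod -R a-w "${HPC_STACK_ROOT}"

# Hypothetical helper: unlock the tree, run the given command, always lock it again.
update_stack() {
    (
        # The subshell keeps the trap local; it fires on success and on failure.
        trap 'chmod -R a-w "${HPC_STACK_ROOT}"' EXIT
        chmod -R u+w "${HPC_STACK_ROOT}"
        "$@"
    )
}

# Hypothetical stand-in for a build that writes a file and then fails.
failing_build() {
    echo "new library" > "${HPC_STACK_ROOT}/libs/new_library.txt"
    return 1
}

if update_stack failing_build; then build_status="ok"; else build_status="failed"; fi

# The tree should be read only again even though the build failed.
writable_count=$(find "${HPC_STACK_ROOT}" \( -perm -200 -o -perm -020 -o -perm -002 \) | wc -l)
writable_count=$((writable_count))
echo "build ${build_status}, entries still writable: ${writable_count}"

# Clean up the throwaway tree.
chmod -R u+w "${HPC_STACK_ROOT}"
rm -rf "${HPC_STACK_ROOT}"
```

The design choice here is that locking is tied to leaving the update step rather than to reaching the last command, so an interrupted update does not depend on someone remembering the final `chmod`.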