HPC stack management

Hang-Lei-NOAA edited this page Oct 29, 2020 · 9 revisions

HPC-stack management

NCEPLIBS and the associated library software will be managed through the HPC-stack package. The repository will be 
located at https://github.com/NOAA-EMC/hpc-stack

HPC-stack is a hierarchical installation of the EMC infrastructure needed to run EMC workflows and models. The stack 
is maintained by the NCEP libraries group. It provides a hierarchical list of modules and options 
that you can follow to load the specific libraries needed for your work.

Section 1: Repository management

HPC-stack will be the only official library installation tool on the platforms that the NCEP libraries 
team supports. Development work should follow the steps below, in order.

## Development

The development work will be conducted in a feature branch or a personal fork. Once the development is finished, 
the developer must fully test the feature branch before opening a pull request to merge it back into the 
develop branch.
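
As a rough illustration of this branch flow, the sketch below runs in a throwaway repository created with `mktemp`, so it does not touch a real hpc-stack clone. The branch name `feature/add-new-library`, the `README` file and the commit identity are hypothetical. Opening and reviewing the pull request happens on GitHub and is not shown.

```shell
#!/bin/sh
# Sketch of the feature-branch flow in a scratch repository.
# All names below are illustrative, not hpc-stack conventions.
set -eu

scratch=$(mktemp -d)
cd "$scratch"
git init -q .
git symbolic-ref HEAD refs/heads/develop   # name the initial branch "develop"

# Helper so the example does not depend on a global git identity.
commit() {
    git -c user.name="Example Dev" -c user.email="dev@example.invalid" commit -q "$@"
}

echo "baseline" > README
git add README
commit -m "Initial state of develop"

# 1. Start the work on a feature branch cut from develop.
git checkout -q -b feature/add-new-library

# 2. Make and commit the change; test it here before opening a pull request.
echo "new library entry" >> README
commit -am "Add new library entry"

# 3. Count the commits that a pull request against develop would contain.
pr_commits=$(git log --oneline develop..feature/add-new-library | wc -l | tr -d ' ')
echo "commits ahead of develop: $pr_commits"
```

Listing `develop..feature/add-new-library` before opening the pull request is one simple way to confirm that the branch contains only the intended changes.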

## Code review and merge

The pull request must go through the code review process before it is merged back into the develop branch. The code review process 
has two criteria. First, the pull request must be approved by at least one code reviewer. Second, it must 
pass the CI test before it is merged into develop.

## Bug fix

A bug fix to the develop branch must also pass the code review process. Bug fixes 
for small issues (typos, broken links, documentation changes, etc.) and major code changes 
must both be treated in the same way as major development work: each requires approval from at least one 
code reviewer and must pass the CI test before it is merged back into the develop branch.

## Version control

When the accumulated development work is ready for release, the hpc-stack in the develop branch must be tagged 
and given a new version number. The tagged release version must be frozen and reproducible in 
installations. Full testing is required before the tag is made, to limit the risk in a release. The tag, 
also called the release version, is for prod use.
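
A minimal sketch of tagging a release and confirming that a checkout sits exactly on that tag is given below. It uses a scratch repository and assumes a `v<version>` tag naming scheme, which this page does not specify. The version number and commit identity are examples only.

```shell
#!/bin/sh
# Sketch: tag a tested state of develop and verify a checkout against the tag.
# The "v" prefix on the tag name is an assumption made for illustration.
set -eu

scratch=$(mktemp -d)
cd "$scratch"
git init -q .
git symbolic-ref HEAD refs/heads/develop
git -c user.name="Example Dev" -c user.email="dev@example.invalid" \
    commit -q --allow-empty -m "State of develop that passed testing"

# Annotated tag marking the frozen release.
HPC_STACK_VERSION=1.3.0
git -c user.name="Example Dev" -c user.email="dev@example.invalid" \
    tag -a "v${HPC_STACK_VERSION}" -m "hpc-stack release ${HPC_STACK_VERSION}"

# An installation is reproducible only if it is built from exactly the tag.
# --exact-match makes git describe fail when HEAD is not on a tag.
built_from=$(git describe --exact-match --tags HEAD)
echo "HEAD is at release tag: $built_from"
```

Running the `git describe --exact-match` check before an installation is a cheap guard against building prod from an untagged commit.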

Section 2: Installation management

The management of installations must cover the main situations that arise in supporting operations and research. As 
Arun suggested, there are three scenarios: prod, dev and test. prod is what is left unchanged. dev is where we 
introduce updates, add new libraries or new builds, or otherwise change things. test is the intermediate place 
where the build is placed and tested across all our software.

## Development

In the EMC working environment, work starts with distributed development in each group. Feedback 
on the libraries usually concerns changes to one or a few libraries within a single development 
group. For example, the prep group mainly changes bufr, or sometimes w3emc, and the land group may require 
changes in landsfcutil or surface I/O. Development in the libraries is therefore distributed. The hpc-stack 
can install the full stack plus additional libraries that need to be examined carefully, 
which supports this development work. The developed libraries are then used in the test stage to check 
the effect of the changes on other groups.

The development stack should not be installed in the official area. It can be installed in a sandbox in a personal area, 
where the libraries and module files are properly tested. Once these tests are complete, any 
updates or changes to the hpc-stack repository should be brought back into develop and tagged with a beta tag.

## Test

The new version of hpc-stack can then be put into the test stage. The testing version of hpc-stack is installed in a 
sandbox for general testing and announced to the developers so that they can test it. Necessary fixes based on user 
feedback can be made to the develop branch. The testing stage cannot pass until general agreement is reached 
among all major EMC applications.

## Official/Prod

Official library stacks should not be changed without a general announcement. Once created, the stack should be 
read-only, and should be changed only under the circumstances outlined below, once the testing stage passes. The 
applications and modulefiles are under the libs directory. The hpc-stack package and log files are kept in the 
src directory.

Scenario 1 - The hpc-stack repository has been updated on GitHub.

This could be because the build or module file structure of existing libraries has changed, and/or new libraries have been introduced to the stack. In all these cases the hpc-stack goes through the development and test stages, and once those stages are passed, the develop branch is tagged for a new release. A new release with a version corresponding to the hpc-stack repository release tag is installed on all the platforms. Once the new release is installed on all platforms, a general announcement is sent to all developers. The old versions in the official area will be removed in due time, with adequate warning. The official installation must be set to read-only to prevent accidental changes.

The steps for creating a new hpc-stack are as follows:

1. Decide on the hpc-stack version (this should match the repository tag), for example:

        HPC_STACK_VERSION=1.3.0

2. Create a top-level directory that will hold the source code and install tree:

        HPC_STACK_ROOT=<path/to/hpc_stack/>/${HPC_STACK_VERSION}
        mkdir -p ${HPC_STACK_ROOT}
        cd ${HPC_STACK_ROOT}

3. Check out the appropriate hpc-stack repository tag.
4. Manually change the version string to ${HPC_STACK_VERSION} in setup_modules.sh. [Note: This step can be skipped if the version string was changed with the repository tag.]
5. Install the stack:

        ./setup_modules.sh -p ${HPC_STACK_ROOT} -c config/<machine config file>
        ./build_stack.sh -p ${HPC_STACK_ROOT} -c config/<machine config file> -y config/<yaml file> -m

6. (Optional) Manually change the yaml file to install additional library version(s) [this may involve multiple steps], then rerun the build:

        ./build_stack.sh -p ${HPC_STACK_ROOT} -c config/<machine config file> -y config/<yaml file> -m

7. Make the stack read-only to avoid accidental changes:

        chmod -R -w ${HPC_STACK_ROOT}
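
The final locking step can be checked after the fact. The sketch below uses a temporary directory as a stand-in for a real `${HPC_STACK_ROOT}` and counts the entries that anyone can still write to. It uses `chmod -R a-w` rather than `-w`: when no "who" is given, chmod leaves the bits that are set in the umask untouched, so spelling out `a` is the more defensive form. The file name under `libs` is a placeholder.

```shell
#!/bin/sh
# Sketch: lock an install tree and verify that nothing in it is writable.
# The tree below is a stand-in; a real run would use the actual HPC_STACK_ROOT.
set -eu

HPC_STACK_ROOT=$(mktemp -d)/1.3.0
mkdir -p "$HPC_STACK_ROOT/libs" "$HPC_STACK_ROOT/src"
echo "placeholder" > "$HPC_STACK_ROOT/libs/example.txt"

# Remove write permission for user, group and other.
chmod -R a-w "$HPC_STACK_ROOT"

# Count entries whose mode still has any write bit set; 0 means fully locked.
still_writable=$(find "$HPC_STACK_ROOT" \
    \( -perm -200 -o -perm -020 -o -perm -002 \) | wc -l | tr -d ' ')
echo "writable entries remaining: $still_writable"
```

The check inspects mode bits rather than testing write access, so it gives the same answer when run by a privileged account.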

Scenario 2 - The hpc-stack repository has not been updated on GitHub, but an existing hpc-stack installation needs to be updated.

In many cases there are no changes to the hpc-stack repository, but an existing installation must be updated to add new library versions that have either been released or been made available to EMC as beta snapshots. Conversely, when a formal release of a library is provided, the old beta snapshots need to be cleaned out. In all of these situations the changes must be made to an existing hpc-stack installation carefully, after prolonged testing.

The steps for updating an existing hpc-stack installation are as follows:

1. Decide which hpc-stack version will be updated, for example:

        HPC_STACK_VERSION=1.3.0

2. Set the top-level directory and make it temporarily read/write:

        HPC_STACK_ROOT=<path/to/hpc_stack/>/${HPC_STACK_VERSION}
        chmod -R +w ${HPC_STACK_ROOT}
        cd ${HPC_STACK_ROOT}

3. Manually change the yaml file to install the requisite libraries (MAKE SURE ALL EXISTING LIBRARY INSTALLATIONS ARE SET TO NO, TO AVOID ACCIDENTALLY REINSTALLING THESE LIBRARIES), then run the build:

        ./build_stack.sh -p ${HPC_STACK_ROOT} -c config/<machine config file> -y config/<yaml file> -m

4. (Optional) Repeat for all library updates.
5. Make the stack read-only to avoid accidental changes:

        chmod -R -w ${HPC_STACK_ROOT}
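
One risk in the procedure above is that a failed build leaves the stack writable. A possible safeguard, sketched below, is a small wrapper that unlocks the tree, runs one command inside it, and re-locks the tree whether or not the command succeeded. The function name `with_unlocked_stack` is hypothetical and not part of hpc-stack, and `touch` stands in for the real `./build_stack.sh` invocation.

```shell
#!/bin/sh
# Sketch: run an update with the stack temporarily writable, then re-lock it.
# with_unlocked_stack is an illustrative helper, not an hpc-stack command.
set -eu

with_unlocked_stack() {
    stack_root=$1
    shift
    chmod -R u+w "$stack_root"
    status=0
    # Run the given command from inside the stack and remember its exit code.
    ( cd "$stack_root" && "$@" ) || status=$?
    # Re-lock regardless of the outcome.
    chmod -R a-w "$stack_root"
    return "$status"
}

# Stand-in for an existing, locked installation.
HPC_STACK_ROOT=$(mktemp -d)/1.3.0
mkdir -p "$HPC_STACK_ROOT/libs"
chmod -R a-w "$HPC_STACK_ROOT"

# Stand-in for the real build step: create a marker file inside the stack.
with_unlocked_stack "$HPC_STACK_ROOT" touch libs/new-library-marker

# A failing step still leaves the stack locked.
with_unlocked_stack "$HPC_STACK_ROOT" false || echo "update failed, stack re-locked"

still_writable=$(find "$HPC_STACK_ROOT" \
    \( -perm -200 -o -perm -020 -o -perm -002 \) | wc -l | tr -d ' ')
echo "writable entries remaining: $still_writable"
```

In a real update, the command passed to the wrapper would be the `./build_stack.sh` line from step 3, with the machine config and yaml file filled in.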