start doing quarterly(-ish) versioned releases #890
Comments
fv3atm currently has versioned releases based on which production system it is used for (for example GFS) or which public release (ufs-wm or ufs-srw), etc. Who is going to use these quarterly releases? And how? Did anyone request these quarterly releases (NCO or ufs-community)? If so, they should tell us what version numbering scheme they want to use. Are the submodules used by fv3atm (atmos_cubed_sphere, ccpp-physics, ccpp-framework and upp) going to make releases with exactly the same version, with the correct commit hashes? We must update the corresponding submodules before making a release/tag, which means we must coordinate that release with the code managers of those projects.
This is part of adding documentation and unit testing to fv3atm. Regular and frequent releases are an agile practice. Is there some reason that fv3atm cannot have a release, with version numbers, like all other software packages do? We do not have to update anything before this release; the point is to get a release before we start doing any testing.

Each component should and must have its own versioning, just like netcdf-c, HDF5, and, in fact, every software package you've ever heard of. We have this right now, but we use a hash instead of a version number, and we don't document the release. Instead, we will document a release each time we need to move the ufs_weather_model hash of fv3atm.

Yes, releases must be coordinated by code managers. That is the job of the code manager. If doing a release takes more than 5 minutes, the code manager is doing it wrong. I will give a presentation on the agile release process.

Perhaps this confusion stems from the fact that you are used to a lot of manual or system-level testing before a release. Instead, we will do a release without any such testing. However, the next release after this will start to include unit tests. Eventually we will be releasing fully-tested releases, and then we will run those tests on spack install, so spack and the EMC/NCO install team will be using these releases to work out that solution. If we convince the UFS steering committee not to use submodules, but instead to use libraries, we will be ready to transition fv3atm to a library instead of a subcomponent.
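The "five-minute release" described above amounts to little more than tagging a commit. A minimal sketch, demonstrated on a throwaway local repository (the repository name, version number v1.0.0, and tag message are assumptions; a real release would tag the upstream fv3atm repository and publish release notes on GitHub):

```shell
# Demonstrate the tagging step of a versioned release on a local
# throwaway repository (v1.0.0 is a hypothetical starting version).
set -e
repo="$(mktemp -d)/fv3atm-demo"
git init -q "$repo"
cd "$repo"
git -c user.name=cm -c user.email=cm@example.com \
    commit -q --allow-empty -m "release candidate"
# Tag the commit that ufs_weather_model currently points at:
git tag -a v1.0.0 -m "fv3atm 1.0.0: first versioned release"
# Show the annotated tag and its message:
git tag -n1
```

The tag records exactly which commit the release corresponds to, so the existing hash-based workflow in ufs-weather-model is unaffected; the release simply gives that hash a documented, human-readable name.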
We 'move the ufs_weather_model hash of fv3atm' every time we make a commit to fv3atm, which is sometimes multiple times a week. Are you suggesting that we make a release with a new version number multiple times a week? That could be hundreds of releases a year. What's the purpose of all these releases when ufs-weather-model must still know the exact hash of its fv3atm submodule?
Why would we do that? That would mean that any change in any of the subcomponents requires a new library build. Who is going to do all those library builds? How? Where are we going to store all these library builds? How is ufs-weather-model going to 'find' all those libraries?
UPP was previously configured as a library for inline post in the UFS Weather Model. The library team required quarterly releases for the UPP library installation, which significantly slowed down UPP development to support the GFS, RRFS, GEFS, HAFS, and AQM implementations, until Dusan updated the configuration to make UPP a submodule of fv3atm.
I will put my support behind the move towards agile development. It's what everyone else does, so we should be there as well. To answer the question on the builds, I think that process is (or can be) mostly if not completely automated through CI/CD, which GitHub has capabilities for. For example, once the button is clicked to release, a set of actions starts working on producing artifacts from the build for the public to use and saves them in GitHub (please correct me if I'm wrong). @edwardhartnett After a "first release", what pieces/methodologies would need to be in place? How would the CM workflow need to change towards agile? Maybe painting the picture a bit might help?
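The automation described above can be sketched as a GitHub Actions workflow: pushing a version tag triggers a build and attaches the resulting artifacts to a GitHub release. This is a hedged sketch only; the workflow path, build command, and artifact name are assumptions, not fv3atm's actual CI configuration:

```yaml
# .github/workflows/release.yml (hypothetical)
name: release
on:
  push:
    tags: ['v*']          # fires when a version tag like v1.0.0 is pushed
jobs:
  release:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          submodules: recursive
      - run: ./build.sh                      # assumed top-level build script
      - uses: softprops/action-gh-release@v2 # publishes artifacts to the release
        with:
          files: build/fv3atm.tar.gz         # hypothetical artifact name
```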
Wow, great discussion.

@BrianCurtis-NOAA you're correct, I have not fully explained this issue. We have a meeting to discuss this topic the week after next; I will add you to the invitation list. I have a slide which explains the release process and why we do it. (And we can meet in person, since I will be in College Park for the meeting.)

@WenMeng-NOAA I would not expect improvement from making untested code into a library. Without the ability to test, the library brings little benefit. This is why all modern libraries use unit testing. However, it's not clear to me how doing releases can slow other development; a release takes only a few minutes to do. How did this slow other development? Manual testing? Also, there was no time at which the libraries team required quarterly releases. UPP has done only two versioned releases, 10.0.0 and 11.0.0, in May and June of 2022.

@DusanJovic-NOAA if we move the hash every week, then indeed we should not do new versions every time. I presented on spack and how the UFS will be installed on WCOSS2, no longer with a set of manually executed git commands, but with spack. So spack will install all libraries, as well as the model and all submodules.
I still struggle to understand what specific problems removing git submodules and replacing them with externally built libraries for all components is supposed to solve. In my opinion, taking that approach would likely just increase the overall complexity of the project, slow down the development process, and make the program more difficult to build than is necessary. I'm not convinced that the potential benefits, if there are any, would outweigh the drawbacks of such a significant architectural change. |
Specific problems with the submodules:
NOAA has invented its own way of distributing software with submodules. It's what everyone is used to, so it is hard to change, but it is less productive than what everyone else does. Submodules are not well supported by cmake and spack, which provide few tools and functions to deal with them, but copious free functionality for libraries. NOAA should not be originating new methods of software distribution. We should be using commercial off-the-shelf tools for free, instead of rolling our own. This is not science, it is mere engineering.
Suppose developers need to make a change in 3 repositories at the same time, ccpp-physics, dycore and fv3atm, which is not uncommon at all. Instead of just cloning a single repo (recursively) and running build.sh with the desired build options in the top-level directory, they would first need to clone the ccpp-physics repo and build a ccpp-physics library, clone dycore and build a dycore library, then clone fv3atm and build an fv3atm library, and finally compile the model. Using submodules, developers do not need to do anything: all required components' code is already available in the cloned working tree, and building the model executable is a single command, build.sh in the top-level directory.

What about the upp library, or mom6, or cmeps, or any other of the 25 or so components that do not require changes? Are they going to be built by somebody else? Who? How will a developer's clone of ufs-weather-model 'know' that it needs all the libraries provided by somebody else, except the ccpp-physics, dycore and fv3atm libraries that they just built? What changes will be needed in the ufs-weather-model and fv3atm code to link the correct user-built libraries from arbitrary locations? Using submodules, developers do not need to do anything special for the components they will be modifying vs. the other (non-modified) components; everything just works.

How many versions of these three libraries (ccpp-physics, dycore and fv3atm) do they need to build? Just one, or maybe more: for example, 32- and 64-bit versions, release and debug versions, gnu and intel. That's already up to 2 × 2 × 2 = 8 combinations. What if a developer wants to confirm that physics changes work on all supported platforms? Repeat the whole process? Build 8 variants of three libraries on 8 platforms? That's 8 × 3 × 8 = 192 builds.
By using submodules and having the source code of all components under the same source tree, all of these build variations are automatically compiled 'correctly' based on a single set of cmake flags passed to the top-level build.sh script; developers do not need to worry about using the correct library variation. Many developers work on more than one feature/bug-fix at a time, using different versions of ufs-weather-model, which means that, potentially, they would need to somehow keep one set of ccpp-physics, dycore and fv3atm libraries (all 192 versions) separated from another set (another 192 versions), on all platforms, and, after a change in one source file, have a mechanism for running a minimal rebuild of all of these libraries. By using submodules, they do not have to deal with or keep track of where and how all these hundreds of library versions are built.

The suggestion is that all libraries (the current submodules), and even the model itself, will be built by spack? Does this mean that all developers now need to learn how to use spack to compile the model and, more importantly, how to modify spack packages in order to incorporate their changes? What if, after making a code change in physics and rebuilding the physics and fv3 libraries, the build of ufs-weather-model fails at linking? Does spack support partial rebuilds of all dependent libraries after a single-file change? Again, using submodules and not having to think about how components' libraries have to be built will allow scientists to focus on their code, not on learning spack. Let's not expect scientists to become system administrators.

We should ask all developers who are contributing actual code changes, and the code managers, which method they prefer: the one described here, or the one where all components are submodules cloned under the same source tree, where a single build script controls the build (and rebuild) of the entire program automatically.
I'd like to get answers to all these questions, and more importantly to see a working proof of concept where all submodules are replaced with externally built libraries, to see how developers would use it in everyday work, particularly during the rapid edit, build, run cycle, and especially while debugging code changes, and to hear which method they prefer. And then, only if the majority of developers think the 'external libraries' method is much better, only then should we think about convincing the UFS steering committee not to use submodules.
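The single-clone workflow described above can be demonstrated in miniature: a "model" repository with a "physics" submodule, where one recursive clone delivers the complete source tree, ready for a top-level build.sh. Repository and file names here are illustrative, not the real UFS layout:

```shell
# Miniature demo of the submodule workflow: one recursive clone
# yields all component sources in a single working tree.
set -e
work="$(mktemp -d)"
cd "$work"

# A tiny "physics" component repository (illustrative):
git init -q physics
echo 'subroutine phys' > physics/phys.F90
git -C physics add phys.F90
git -C physics -c user.name=d -c user.email=d@example.com \
    commit -q -m "physics code"

# A "model" repository that pulls physics in as a submodule:
git init -q model
git -C model -c protocol.file.allow=always \
    submodule add "$work/physics" physics
git -C model -c user.name=d -c user.email=d@example.com \
    commit -q -m "add physics submodule"

# One recursive clone gives a developer the full source tree:
git -c protocol.file.allow=always \
    clone -q --recursive "$work/model" checkout
ls checkout/physics/phys.F90
```

After the clone, the physics source sits inside the model's working tree, so a single build script can compile everything, including local edits, without any externally installed component libraries.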
Thanks for the detailed response. I think we need to work out all of these issues. However, you might ponder the fact that other software developers manage very complex systems without submodules. What others have done, surely NOAA programmers can do.

With submodules, NOAA has created a highly coupled system. Then the highly-coupled nature of the system is used as the reason that submodules are needed. Instead, start decoupling the system.

I like your suggestion that this be prototyped. Prototyping and demonstrating this is exactly the point of this issue. However, you object to this issue. If you object to every step needed to demonstrate decoupling of this software, you will never see it demonstrated. Which brings us back to the point of this issue: to start doing releases so we can demonstrate how fv3atm may be decoupled and tested. This will have no impact on your work or the work of any other developer of the UFS. So perhaps you can withdraw your objections and let us prototype a better solution?
Description
Like other software packages, fv3atm should have versioned releases.
Solution
We need to pick a version number to start with, and do a release. Then we need to do quarterly releases thereafter.
Alternatives
The alternative is what we are now doing: unlabeled, undocumented releases via submodule hash.
Testing
No testing needed to do the first release. In future releases, unit testing will be added.
Dependent PRs
Required to support documentation and testing efforts.