Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split bathymetry subroutines into seperate file #978

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

anton-seaice
Copy link
Contributor

As a precursor to #807, we should split ice_grid into several files, as its >4000 lines.

This PR is for comment on the approach to do this with least interruptions, as several groups have their own copies or changes in their CICE forks. If this change looks ok, I will do the same for other sections of the ice_grid file (e.g. make an ice_gridbox.F90, ice_grid_average.F90 etc)

The changes won't change the interface to the ice_grid module, nor change the functionality of any routines.

PR checklist

  • Short (1 sentence) summary of your PR:
    Move bathymetry routines into ice_grid_bathy.F90 file
  • Developer(s):
    @anton-seaice
  • Suggest PR reviewers from list in the column to the right.
    @apcraig @phil-blain @daveh150
  • Please copy the PR test results link or provide a summary of testing completed below.
    None yet
  • How much do the PR code changes differ from the unmodified code?
    • bit for bit
    • different at roundoff level
    • more substantial
  • Does this PR create or have dependencies on Icepack or any other models?
    • Yes
    • No
  • Does this PR update the Icepack submodule? If so, the Icepack submodule must point to a hash on Icepack's main branch.
    • Yes
    • No
  • Does this PR add any new test cases?
    • Yes
    • No
  • Is the documentation being updated? ("Documentation" includes information on the wiki or in the .rst files from doc/source/, which are used to create the online technical docs at https://readthedocs.org/projects/cice-consortium-cice/. A test build of the technical docs will be performed as part of the PR testing.)
    • Yes
    • No, does the documentation need to be updated at a later time?
      • Yes
      • No
  • Please document the changes in detail, including why the changes are made. This will become part of the PR commit log.
    Move bathymetry routines into seperate ice_grid_bathy.F90 file to improve readability and maintainability

@anton-seaice
Copy link
Contributor Author

If you have any general thoughts on this @apcraig - that would be great. Its just marked as draft because we could add more (related) changes in this PR.

@apcraig
Copy link
Contributor

apcraig commented Oct 7, 2024

I like this idea in general, but there are potential challenges.

  • As you point out, if we break this up, it may take some effort for users with local changes to bring in their mods. That's not a huge concern for me, I suspect there aren't a lot of users with those changes.
  • The creation of a new directory could create problems for coupled model build systems. For instance, if we just created new files in the existing infrastructure directory, then the model will probably build divrectly. But by adding the new grid directory, coupled model build systems may need to be updated. Is that how we want to proceed? Should we keep all the new grid files in the existing infrastructure directory?

Before we do anything, I think we need to scope out how this is going to look and have a plan.

  • Getting the bathy out in a separate file seems fine.
  • Getting the average routines out in a separate file seems fine including maybe something like neighbor_min, neighbor_max. Maybe there are other generic routines too.
  • Where will the public module grid data live? For the various gridbox, popgrid, rectgrid, etc, those assume the public data is available in the module (or could be via use statement). If we split things up such that we need a separate module to hold the grid data, that seems somewhat inconvenient. What will be the hierarchy of the data and methods? Could we make the grid information less friendly by splitting it up? I think we need to be a little careful. Pulling out grid infrastructure makes sense to me. I'm less clear about pulling out methods that operate on the grid data itself.

@anton-seaice
Copy link
Contributor Author

I like this idea in general, but there are potential challenges.

As you point out, if we break this up, it may take some effort for users with local changes to bring in their mods. That's not a huge concern for me, I suspect there aren't a lot of users with those changes.

The change becomes notable simpler (and possible neater) if we change the interface to ice_grid, so maybe this is ok.

The creation of a new directory could create problems for coupled model build systems. For instance, if we just created new files in the existing infrastructure directory, then the model will probably build divrectly. But by adding the new grid directory, coupled model build systems may need to be updated. Is that how we want to proceed? Should we keep all the new grid files in the existing infrastructure directory?

I think its fairly minor for build systems to change the included files (we do this all the time :))

Where will the public module grid data live? For the various gridbox, popgrid, rectgrid, etc, those assume the public data is available in the module (or could be via use statement). If we split things up such that we need a separate module to hold the grid data, that seems somewhat inconvenient. What will be the hierarchy of the data and methods? Could we make the grid information less friendly by splitting it up? I think we need to be a little careful. Pulling out grid infrastructure makes sense to me. I'm less clear about pulling out methods that operate on the grid data itself.

Yeah - I messed around with this for a while and this seems like the tricky part.

The limitation of the current set-up is that its not possible to move the subroutines in the current ice_grid module into a new module because the new module cannot both use ice_grid and be used in ice_grid (this is a circular dependency).

The options i've come up with so-far are:

  1. split functions and data into seperate modules :
    This might look like seperate ice_grid_data and ice_grid modules. ice_grid_data would contain public arrays of all the grid variables, and ice_grid would have or use methods which manipulate the data. This option touches more code outside of the ice_grid module but might be the neatest.

    (There is a permutation on this where all the variable in ice_grid_data get imported through a module use and then are made public again in ice_grid, which doesn't change the interface).

  2. leave the public part of ice_grid unchanged and use many arguments:
    This would mean leaving the interfaces to ice_grid unchanged, and moving the methods into new modules (as suggested in this PR for ice_grid_bathy). This gets messy, as many of the subroutines use many of the arrays in ice_grid. To move popgrid_nc to a new module, would mean passing many arguments, i.e.:

call popgrid_nc(save_ghte_ghtn, grid_file, kmt_file, &
               dxT, dyT, dxU, dyU, dxN, dyN, dxE, dyE, HTE, HTN, &
               ULON, ULAT, TLON, TLAT, ANGLE, ANGLET, &
               hm, kmt, lont_bounds, latt_bounds, G_HTE, G_HTN)  

I don't think there is a performance problem with this? But it is messy!

  1. leave the public part of ice_grid unchanged and create new grid types:
    I think we could create a new type to bundle related data together, e.g.:
type :: grid_lengths
          real (kind=dbl_kind), dimension (:,:,:), pointer :: &
                   dxT    , & ! width of T-cell through the middle (m)
                   dyT    , & ! height of T-cell through the middle (m)
                   dxU    , & ! width of U-cell through the middle (m)
                   dyU    , & ! height of U-cell through the middle (m)
                   dxN    , & ! width of N-cell through the middle (m)
                   dyN    , & ! height of N-cell through the middle (m)
                   dxE    , & ! width of E-cell through the middle (m)
                   dyE    , & ! height of E-cell through the middle (m)
                   HTE    , & ! length of eastern edge of T-cell (m)
                   HTN         ! length of northern edge of T-cell (m)

And then point an instance of this type to right variables. The subroutines calls become somewhat neater, e.g.

call popgrid_nc(save_ghte_ghtn, grid_file, kmt_file, &
               grid_lengths, grid_latlons, grid_angles, grid_mask
               , G_HTE, G_HTN)  

This option is probably quite confusing.

@apcraig
Copy link
Contributor

apcraig commented Oct 8, 2024

The circular logic is the problem. I think the question is whether any of the three proposals above (and others we might think about) is better than what we have now. If we move data out of ice_grid and all the use statements elsewhere in the code have to change, that's quite a change too.

What if make the requirements that we have to keep the public data in ice_grid and we don't want to add a datatype or a bunch of arguments. That would limit what we could do. We'd have to keep the data and higher level methods in ice_grid. How much of the implementation could we extract into a grid_infrastructure layer and could we create new methods that could be used by multiple high/mid level methods that would improve code reuse? Would that be useful?

Is the motivation that we would like each type of grid to have it's own module? But the "drivers" are still calling init_grid1, init_grid2, etc. This seems like a lot of work and change to create this separation at the middle layer just to create individual files for different grid files/types. There is also a lot of reuse in init_grid1 and init_grid2. So we'd want to create a popgrid, latlongrid, cpomgrid and rectgrid file/module? Then add a momgrid file/module?

      if (trim(grid_type) == 'displaced_pole' .or. &
          trim(grid_type) == 'tripole' .or. &
          trim(grid_type) == 'regional'      ) then
         if (trim(grid_format) == 'nc') then
            call popgrid_nc     ! read POP grid lengths from nc file
         else
            call popgrid        ! read POP grid lengths directly
         endif
#ifdef CESMCOUPLED
      elseif (trim(grid_type) == 'latlon') then
         call latlongrid        ! lat lon grid for sequential CESM (CAM mode)
         return
#endif
      elseif (trim(grid_type) == 'cpom_grid') then
         call cpomgrid          ! cpom model orca1 type grid
      else
         call rectgrid          ! regular rectangular grid
      endif

We have often run across exactly these types of issues when trying to clean up some code. We end up having a peeling onion effect and the code cleanup ends up being a much bigger deal than it should be. And unless we are refactoring a bunch of code to make it perform better or add new features that require a refactor, we end up deciding to leave things more-or-less as they are. I'm open, but want to make sure we consider the cost vs benefit.

@anton-seaice
Copy link
Contributor Author

If we move data out of ice_grid and all the use statements elsewhere in the code have to change, that's quite a change too.

I think this is the best approach.

Yes - there are 47 files impacted but the changes are small to each individual file. (And fairly low risk - its just changing the name of the module).

What if make the requirements that we have to keep the public data in ice_grid and we don't want to add a datatype or a bunch of arguments. That would limit what we could do. We'd have to keep the data and higher level methods in ice_grid. How much of the implementation could we extract into a grid_infrastructure layer and could we create new methods that could be used by multiple high/mid level methods that would improve code reuse? Would that be useful?

This would be very limited in what we can do. Basically all the subroutines are in this file because they use the data in ice_grid, and are called in another ice_grid function.

Possibly we could put the grid_average_X2Y subroutines/interface in a different file (it would need new arguments for the masks sometimes). (And the change in this PR would be achievable too)

Is the motivation that we would like each type of grid to have it's own module?

This would be good but is a significant re-design and doesn't really fit into the current structure. My main motivation is to make the ~4000 lines in the current file easier to handle / read / work with.

There is also a lot of reuse in init_grid1 and init_grid2. So we'd want to create a popgrid, latlongrid, cpomgrid and rectgrid file/module? Then add a momgrid file/module?

I think seperating into seperate modules by grid_format makes sense - e.g. netcdf, binary, idealised/rectangular, mom_netcdf - that's basically the same as your suggestion. This is only ~1500 lines of the total file though.

We have often run across exactly these types of issues when trying to clean up some code...

On this note - can we deprecate some of the grid_types ?

e.g. binary, CPOM, latlongrid ?

@apcraig
Copy link
Contributor

apcraig commented Oct 8, 2024

I don't think we can deprecate grid_types. Maybe CPOM, but I think the other grid types are in use. I think we run into issues if we separate by grid types as you suggest. For instance, the pop grid can be binary or netcdf. We don't want to duplicate a bunch of the grid initialization in two files where the only difference is how the file is read. I could see having a pop, mom, latlon, etc module but I'm not sure we should separate by file format.

Another idea would be to keep the data in module ice_grid and have that be the "data" module. Then we could move the current public methods into a separate module called something like ice_grid_methods. Then the use statement only has to change for CICE subroutines that use the methods, not the data. We could then add some middle layers called something like ice_grid_popgrid, ice_grid_momgrid, ice_grid_othergrids and also some infrastructure files like ice_grid_bathy and ice_grid_average if that would work. Personally, I'm not convinced this is the way to go overall. I'm not that troubled by the ice_grid file as it stands and I worry that if things get split apart, we add some risk that a change in one file for one grid breaks another grid implementation. If it's all in one file, we sort of force it to be "one".

The other problem is that how we define the grid isn't even as simple as pop/mom/latlon OR binary/netcdf. We use terms like 'displaced_pole', 'tripole', 'latlon'. What if MOM has a tripole grid or a latlon grid? It's almost like we need to refactor how to specify the grid to 'pop', 'mom', 'latlon', 'internal' and then add information about whether it has special characteristics like tripole and then the format of the file (which we already have separated). I think we need to think carefully how we want to specify, define, and separate the grids. The refactor strategy should be driven by an effort to clean up and improve the implementation, not by the notion that we are trying to split up a big file. What are the shortcomings of the current implementation (beyond the file being big and difficult to deal with) that we should fix? Where does that lead us in redesign?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants