Ideas to improve performance #153

Open
billsacks opened this issue Oct 16, 2020 · 2 comments

@billsacks

As CESM has grown to include more externals, I've found that the performance of rerunning checkout_externals in an existing directory can sometimes feel slow. There are probably a number of optimizations that could be made, but here are some ideas that come to mind. Both of these ideas apply to externals that point to a hash or tag (not a branch). Furthermore, for a tag, these ideas rely on git's documented behavior, which says that the version of a tag in your local repository will not be changed if you fetch from a remote and the tag has been redefined on that remote. (A couple of years ago, I did some testing of this behavior, and I think I found that some (old??) versions of git overwrote your local tag with the tag on the remote. But I would personally be happy to design manage_externals around that documented git behavior.)

  1. If an external is already checked out at the desired hash / tag, do nothing with that external. Currently, I think that the full checkout procedure is run on every external even if the external is already on the desired hash / tag.

  2. If the external is not at the desired hash / tag, but that hash / tag does already exist locally, then avoid doing the git fetch. This comes up for me a fair amount in practice, because I find myself switching back and forth between newer and older versions of CESM in a given CESM checkout, though I don't know how often it would come up for typical users. Currently, we do a git fetch on every external before checking out the desired hash / tag. I found that, with this one-line change (which clearly is NOT sufficient to implement this change: this was just a hack):

diff --git a/manage_externals/manic/repository_git.py b/manage_externals/manic/repository_git.py
index f986051..bc7f40e 100644
--- a/manage_externals/manic/repository_git.py
+++ b/manage_externals/manic/repository_git.py
@@ -354,7 +354,6 @@ def _checkout_external_ref(self, verbosity, submodules):
         if not remote_name:
             remote_name = self._create_remote_name()
             self._git_remote_add(remote_name, self._url)
-        self._git_fetch(remote_name)
 
         # NOTE(bja, 2018-03) we need to send separate ref and remote
         # name to check_for_vaild_ref, but the combined name to

the time it took to run checkout_externals without making any changes to any of my CESM externals files (with the desired tags already checked out in all repos) dropped from about 40 sec to about 30 sec.
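
For illustration only, the two ideas above could be sketched roughly as follows. This is a minimal sketch, not the manage_externals implementation: `run_git`, `resolve_commit`, and `plan_checkout` are hypothetical names that do not exist in `repository_git.py`, and the sketch assumes the caller has already established that `ref` is a tag or hash rather than a branch. It also ignores whether the working tree is dirty.

```python
import subprocess


def run_git(repo_dir, args):
    """Run `git <args>` in repo_dir; return (exit status, stripped stdout)."""
    proc = subprocess.run(
        ["git"] + list(args),
        cwd=repo_dir,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        universal_newlines=True,
    )
    return proc.returncode, proc.stdout.strip()


def resolve_commit(repo_dir, ref, git=run_git):
    """Return the commit hash that ref points to locally, or None if unknown."""
    # `^{commit}` peels annotated tags down to the commit they point to.
    status, out = git(
        repo_dir, ["rev-parse", "--verify", "--quiet", ref + "^{commit}"]
    )
    return out if status == 0 and out else None


def plan_checkout(repo_dir, ref, git=run_git):
    """Decide what is needed to get repo_dir onto ref (a tag or hash).

    Returns "skip" if HEAD is already at ref (idea 1), "checkout" if ref
    exists locally so no fetch is needed (idea 2), and "fetch" otherwise.
    """
    wanted = resolve_commit(repo_dir, ref, git)
    if wanted is None:
        return "fetch"
    if wanted == resolve_commit(repo_dir, "HEAD", git):
        return "skip"
    return "checkout"
```

The `git` parameter is only there so the decision logic can be exercised without a real repository; in practice the default `run_git` would be used.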

I'm curious what others think about these proposed changes. I don't have time to implement them any time soon, but wanted to open this discussion.

@DavidHuber-NOAA
DavidHuber-NOAA commented Jun 8, 2023

To tack on to this, I would like to suggest the option to run git clone, git fetch, and git submodule with multiple cores via the -j # flag, noting that this is only an option for non-ancient versions of git (version 2.8+).

@ekluzek
ekluzek commented Jun 8, 2023

If you add something that's version dependent, should we add a version check? You are right that this feature has been around for so long that I'm not sure it matters. But it's also nice to have version checking, both to know what the minimum requirements are and to give helpful error messages: "you just need to update your git version to X...".
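
A version check along these lines might look like the sketch below. It is only an illustration under stated assumptions: `parse_git_version`, `installed_git_version`, and `submodule_update_command` are hypothetical helpers, not part of manage_externals, and the 2.8 minimum is taken from the comment above rather than verified separately for each git subcommand. Only `git submodule update --jobs` is shown.

```python
import re
import subprocess

# Minimum git version for --jobs, as stated earlier in this thread (assumption).
MIN_GIT_FOR_JOBS = (2, 8)


def parse_git_version(text):
    """Extract (major, minor, patch) from `git --version` output."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError("cannot parse git version from: %r" % text)
    return tuple(int(part) for part in match.groups(default="0"))


def installed_git_version():
    """Ask the git on PATH for its version."""
    out = subprocess.check_output(["git", "--version"], universal_newlines=True)
    return parse_git_version(out)


def submodule_update_command(version, jobs=1):
    """Build a `git submodule update` command, adding --jobs when requested."""
    cmd = ["git", "submodule", "update", "--init"]
    if jobs > 1:
        if version < MIN_GIT_FOR_JOBS:
            raise RuntimeError(
                "parallel jobs need git %d.%d or newer; found %s. "
                "You just need to update your git version."
                % (MIN_GIT_FOR_JOBS + (".".join(map(str, version)),))
            )
        cmd += ["--jobs", str(jobs)]
    return cmd
```

Raising an error is one possible choice; silently falling back to a serial command when git is too old would be another.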
