Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve startup speed #821

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

marckhouzam
Copy link
Contributor

@marckhouzam marckhouzam commented Oct 4, 2024

There is currently an issue with this change where the config.yaml file does not get cleaned up of "contexts" after running tanzu cluster version with cluster:v0.25.4. I will investigate when time allows

Unit tests are not added yet. I wanted to get opinions first.

What this PR does / why we need it

This PR avoids writing to the config file on startup if not necessary.

Writing to the config file to update the server list has a cost for each context. I ended up with 27 contexts configured, which triggered writing to file over 50 times which added 0.65 seconds on my M1 Mac, making the startup jump from around 0.35 seconds to a full second. Having the CLI take a full second to execute any command implies that each call to shell completion takes at least 1 second, which is slow for the user.

Updating the server list does not actually need to be done all the time. Instead, this commit checks if there is actually any missing servers by comparing the list of contexts with the list of servers, and only if there are missing servers will it trigger the update.

This reduces the execution time for commands that don't access the backend; this means that shell completion for commands and flags becomes noticeably more responsive when a user has multiple contexts configured..

Which issue(s) this PR fixes

Fixes #823

Describe testing done for PR

Before the PR:

# On commit e7127fcf5
$ time tz __complete ''>/dev/null
Completion ended with directive: ShellCompDirectiveNoFileComp
tanzu __complete '' > /dev/null  0.85s user 0.08s system 90% cpu 1.025 total

With this PR:

$ time tz __complete ''>/dev/null
Completion ended with directive: ShellCompDirectiveNoFileComp
tanzu __complete '' > /dev/null  0.14s user 0.03s system 43% cpu 0.378 total

I also ran some normal shell completion down the command tree, and the responsiveness is noticeably better.

Functional tests about contexts and servers to be added

Release note

Improve shell completion responsiveness

Additional information

Special notes for your reviewer

Copy link
Contributor

@vuil vuil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall optimization is quite clean and much appreciated, thanks for looking into this.

I have a very tiny concern if the skipping of the SetContext() has any unintended consequences, and if we want to cover our bases.

needsSync := doServersNeedUpdate(cfg)
if !hasChanged && !needsSync {
return nil
}

// Now write the context to the configuration file. This will also create any missing server for its corresponding context
for _, c := range cfg.KnownContexts {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This optimization is quite clean and much appreciated.

I am not sure if we had in the past rely on this to update existing (not create new) entry to correct for any inconsistencies. cc @prkalle
If we want to be really safe, we can consider some undocumented env var to restore back the old behavior for any user who needs it.

Copy link
Contributor Author

@marckhouzam marckhouzam Oct 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the #616 and what was being done before, I see that we were calling https://github.com/vmware-tanzu/tanzu-plugin-runtime/blob/7278b2554915a10ad3abc9e8befe0a3eb196e1a3/config/legacy_clientconfig_factory.go#L21. I based my check on that code to know if there was any servers needing to be updated. So I feel we are covered on that front.

However, there is another aspect to consider. When saving the configuration to file I notice that there is some cleanup of "legacyDiscardedContexts" done here: https://github.com/vmware-tanzu/tanzu-plugin-runtime/blob/7278b2554915a10ad3abc9e8befe0a3eb196e1a3/config/config_factory.go#L144-L152

To reproduce the situation I installed cluster:k8s:v0.25.4 and simply ran tanzu cluster version. What I see is that the config.yaml file now has a contexts node with all the servers included there. Before this PR, that contexts node would be removed on the next CLI run. With this PR it is only cleaned up whenever the configuration is changed later on.

I don't think keeping this contexts node is problematic, but let's discuss it

@marckhouzam marckhouzam changed the title Improve startup speed by 65% Improve startup speed Oct 8, 2024
Writing to the config file to update the server list has a small cost.
However, since this is done repeatedly for each configured context, this
cost can accumulate when users have a lot of contexts.  In my case, with
27 contexts (not realistic but I was testing) the slow down increased by
about 0.65 seconds on my M1 Mac, making the startup jump from around
0.35 seconds to 1 second. This may be due to repeatedly writing to the
configuration files.

The synchronization of the servers is done for every single CLI command
(since it is in an init()), which means that if there is a slow down, it
affects every single CLI command, even printing help and shell completion.

Although doing this synchronization in an init() is probably overkill,
it would be acceptable if actually writing to the config files was
limited to when it was necessary.  This is because updating the server
list does not actually need to be done all the time but only if there is
a discrepency.

Tis commit checks if there is actually any missing servers by comparing
the list of contexts with the list of servers, and only if there are
missing servers will it trigger the update.

Signed-off-by: Marc Khouzam <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Inefficient startup
2 participants