reduce cost of large variant matrix #5392

Merged: 10 commits merged into conda:24.9.x on Sep 18, 2024

Conversation

@minrk (Contributor) commented on Jun 28, 2024

Description

When the variant matrix is large and mostly unused (as in conda-forge), the length of input_variants may be several thousand (13,824 in the case of petsc4py) while only a few are actually used.

This causes get_loop_vars and metadata.copy() to become very expensive and dominate render time.
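
To illustrate the scale (a toy sketch with made-up keys and values, not petsc4py's actual configuration, and ignoring details like zip_keys), even a small variant config multiplies out into hundreds of variant dicts before any filtering:

```python
# Hypothetical illustration: a handful of keys with a few values each
# already multiplies out to hundreds of variant dicts; real conda-forge
# matrices reach the tens of thousands.
from itertools import product

variant_config = {
    "python": ["3.9", "3.10", "3.11", "3.12"],          # 4
    "numpy": ["1.22", "1.26"],                           # 2
    "mpi": ["mpich", "openmpi", "nompi"],                # 3
    "blas_impl": ["openblas", "mkl"],                     # 2
    "c_compiler_version": ["11", "12", "13"],             # 3
    "cuda_compiler_version": ["None", "11.8", "12.0"],    # 3
}

keys = list(variant_config)
input_variants = [
    dict(zip(keys, values)) for values in product(*variant_config.values())
]
print(len(input_variants))  # 4 * 2 * 3 * 2 * 3 * 3 = 432
```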

This reduction cuts time spent in render_recipe for petsc4py from over 2 minutes to 40 seconds to produce 72 actual variants:

before: [screenshot, 2024-06-28 at 13:43:28]

after: [screenshot, 2024-06-28 at 13:43:50]

(result is unchanged)

Checklist - did you ...

  • Add a file to the news directory (using the template) for the next release's release notes?
  • Add / update necessary tests?
  • Add / update outdated documentation?

@minrk requested a review from a team as a code owner on June 28, 2024 11:45
@conda-bot added the cla-signed label on Jun 28, 2024
Commit message: when the variant matrix is large and mostly unused (as in conda-forge), the length of `input_variants` may be several thousand when only a few are actually used. This causes `get_loop_vars` and `metadata.copy()` to become very expensive.

codspeed-hq bot commented Jun 28, 2024

CodSpeed Performance Report

Merging #5392 will improve performance by ×2.6

Comparing minrk:reduce_variants (311e48b) with main (433f048)

Summary

⚡ 1 improvement
✅ 3 untouched benchmarks

Benchmarks breakdown

Benchmark            main     minrk:reduce_variants   Change
test_render_recipe   64.7 s   25.1 s                  ×2.6

@minrk (Contributor, Author) commented on Jun 28, 2024

Need to investigate why the conda-build tests produce different results from actual runs of conda-build, all of which seem to produce the right variants.

@minrk (Contributor, Author) commented on Jun 28, 2024

It seems the exclusion based on get_used_vars doesn't actually cover all used variables. And some tests appear to expect matrix entries for explicitly unused variables (bzip2 in test_setting_condarc_vars_with_env_var_expansion), which makes me think that perhaps this should be a conda-smithy thing and not a conda-build thing.

@beeankha (Member) commented:

pre-commit.ci autofix

@minrk marked this pull request as a draft on June 28, 2024 18:55

Commit message: vastly reduces the number of copies computed for large variant matrices
@minrk changed the title from "discard unused variants before copying metadata" to "reduce cost of large unused variant matrix" on Jul 1, 2024
minrk added 2 commits on July 1, 2024 11:45:

  • rather than computing all loop vars and then intersecting, only consider relevant keys when computing loop vars; this reduces get_used_loop_vars from O(n_vars * n_variants) to O(n_used_vars * n_variants)
  • config.copy already copies this, no need to do it twice in metadata.copy
@@ -2394,7 +2394,6 @@ def validate_features(self):
     def copy(self: Self) -> MetaData:
         new = copy.copy(self)
         new.config = self.config.copy()
-        new.config.variant = copy.deepcopy(self.config.variant)
@minrk (Contributor, Author) commented on this diff:
config.copy on the line before already does exactly this, no need to do it twice
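
As an illustration only (a simplified sketch, not conda-build's actual Config or MetaData classes), the redundancy looks roughly like this, assuming Config.copy() deep-copies its variant mapping as the comment above states:

```python
import copy


class Config:
    """Simplified stand-in for conda-build's Config (hypothetical)."""

    def __init__(self, variant=None):
        self.variant = variant or {}

    def copy(self):
        new = copy.copy(self)
        # Assumption based on the review comment: the real Config.copy
        # already deep-copies the variant mapping.
        new.variant = copy.deepcopy(self.variant)
        return new


class MetaData:
    """Simplified stand-in for conda-build's MetaData (hypothetical)."""

    def __init__(self, config):
        self.config = config

    def copy(self):
        new = copy.copy(self)
        new.config = self.config.copy()  # variant is already deep-copied here
        # Removed by this PR: a second, redundant deep copy of the same data:
        # new.config.variant = copy.deepcopy(self.config.variant)
        return new
```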

         used_vars = self.get_used_vars(
             force_top_level=force_top_level, force_global=force_global
         )
-        return set(loop_vars).intersection(used_vars)
+        return self.get_loop_vars(subset=used_vars)
@minrk (Contributor, Author) commented on this diff:
get_loop_vars is far cheaper if we pass a subset to consider instead of computing the (usually quite small) intersection after looping over all variables across all variants.
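
A rough sketch of the idea (hypothetical names and signature, not the exact conda-build implementation): when a subset is given, only those keys are scanned across the variants, so the work scales with the number of used variables instead of with every key in the matrix.

```python
def get_loop_vars(variants, subset=None):
    """Return the keys whose values differ across the variant dicts.

    Hypothetical sketch: `variants` is a list of dicts; if `subset` is
    given, only those keys are examined, making the scan
    O(len(subset) * n_variants) instead of O(n_all_keys * n_variants).
    """
    if not variants:
        return set()
    keys = (
        set(subset)
        if subset is not None
        else set().union(*(v.keys() for v in variants))
    )
    first = variants[0]
    return {
        key
        for key in keys
        if any(v.get(key) != first.get(key) for v in variants[1:])
    }
```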

@minrk (Contributor, Author) commented on Jul 1, 2024

I've taken a different approach that doesn't modify the variants list at all, so it shouldn't have any consequences besides performance. Instead of reducing the actual variants list, I've reduced the cost of the two dominant operations on the large variant list:

  • defer copying metadata.config.variants for each top_level variant until after it's been filtered, so far fewer variant dicts are copied (each variant should be copied exactly once, I believe, rather than once per top_level loop)
  • in get_used_loop_vars, compute used_vars first and pass it to get_loop_vars so only the used subset is considered, rather than comparing all keys across all variants every time.

The first changes the variants copy from O(top_level_variants * input_variants) to O(top_level_variants * per_top_level_variants), which is the same as O(input_variants). So the reduction factor equals the number of top-level variants; in the case of petsc4py on linux-64, that's 72 * 13,824 = 995,328 copies reduced to 72 * 192 = 13,824, a reduction of 98.6%. In the case of conda-forge, the number of variants actually used is 72, so there's a further 99.5% reduction that the original proposal achieved, but I don't know how to do that safely while maintaining all the unspecified guarantees in conda-build.
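
A hedged sketch of the first change (hypothetical function and variable names, not the actual render.py code): copy each variant dict only after the top-level filtering has selected it, instead of deep-copying the entire input_variants list for every top-level variant.

```python
import copy


def select_and_copy_variants(top_level_variants, input_variants):
    """Hypothetical sketch of the deferred-copy idea.

    Before: the full input_variants list was copied once per top-level
    variant (O(top_level_variants * input_variants) dict copies).
    After: filter first, then copy only the surviving variants, so each
    copy is O(per_top_level_variants) rather than O(input_variants).
    """
    rendered = []
    for top in top_level_variants:
        # keep only the variants consistent with this top-level variant
        matching = [
            v
            for v in input_variants
            if all(v.get(key) == value for key, value in top.items())
        ]
        # deep-copy only the (much smaller) filtered list
        rendered.append((top, copy.deepcopy(matching)))
    return rendered
```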

The second changes the get_loop_vars call from O(variant_keys * input_variants) to O(used_vars * input_variants); in the case of petsc4py that's ~365 -> 14 keys, a reduction of 96%.

The savings aren't quite what they would be from actually reducing the variants list, because some operations still run over the full list, but it still cuts render time in half.

@minrk marked this pull request as ready for review on July 1, 2024 10:29
@minrk (Contributor, Author) commented on Jul 1, 2024

I'm afraid I don't understand the mac failures or how they could be related to this PR. The same tests pass just fine on my mac. Hopefully something transient?

@minrk changed the title from "reduce cost of large unused variant matrix" to "reduce cost of large variant matrix" on Jul 1, 2024
@beckermr (Contributor) commented on Jul 2, 2024

Those mac failures do look unrelated. Let's try to rerun before we dig into them.

@beckermr (Contributor) commented on Jul 2, 2024

@mbargull Can you take a look at this one? I don't see any obvious problems but this work gets a bit into the guts of the code.

@minrk (Contributor, Author) commented on Sep 13, 2024

Anyone have a chance to look at this?

[Resolved review thread on conda_build/render.py (outdated)]
Commit message: to avoid calling pickle in too many places
@beckermr previously approved these changes on Sep 14, 2024
@beckermr (Contributor) commented on Sep 17, 2024

@isuruf @beeankha @kenodegard Can we merge this one for the 24.9 release?

[Resolved review thread on news/5392-variant-copy (outdated)]
@beeankha changed the base branch from main to 24.9.x on September 17, 2024 17:45
@beeankha previously approved these changes on Sep 17, 2024
@beeankha merged commit 1ba5760 into conda:24.9.x on Sep 18, 2024 (28 checks passed)
@minrk deleted the reduce_variants branch on September 18, 2024 15:01
Labels: cla-signed ([bot] added once the contributor has signed the CLA)
Project status: 🏁 Done
6 participants