Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Benchmark on crates.io index #44

Open
mpizenberg opened this issue Oct 18, 2020 · 6 comments
Open

Benchmark on crates.io index #44

mpizenberg opened this issue Oct 18, 2020 · 6 comments

Comments

@mpizenberg
Copy link
Member

It is a bit far from now, but one thing that would be great to optimize "real-world" performances would be to use "real-world" packages registries. For example, we could build the index of crates.io and make a benchmark that would solve dependencies for the last version of every crate. It would take into account the performances of both the pubgrub implementation and a dependency retriever implementation.

Solving dependencies of crates.io is not possible for the time being though since we don't handle all the possibilities that cargo enables. But we could probably simplify the index in ways that enables benchmarking even if we don't support everything yet. Features could for example be removed from the index. We could consider only versions below 1.0 if we don't want to handle multiple major. Remove packages using pre-releases or replacements etc.

Alternatively, we could start with simpler indexes from other programming languages. I think for example that we could already solve elm packages. I don't know about others.

Ultimate test would probably be solving dependencies of all NPM packages ahah. That could be a blog post XD.

@Eh2406
Copy link
Member

Eh2406 commented Oct 27, 2020

I think python dependencies do not need any unsupported features pre-release and atypical ranges. But traditionally you need to run setup.py to get the dependencies. 🤦 What could go wrong with a turing complete way to specify dependencies?

Anyway conda is a better package manager for python, and it does have an index file.
conda index link 4.8M
conda-forge index link 56.6 MB
and the dependency syntax.
and a blog post about why there dependency resolution is so slow. Apparently there solver is a pluggable component so if things go really well...

@mpizenberg
Copy link
Member Author

Registry of all elm packages since elm 0.14, which represents 11079 distinct package versions + 11 elm versions:
elm-packages.zip (contains a 4.1Mb elm-packages.ron file)

I have done some rudimentary cleaning but not tried to solve dependencies yet, might need some additional cleaning.

@Eh2406
Copy link
Member

Eh2406 commented Nov 15, 2020

That is wonderful! I started by using dev and looping over all versions and running the SAT test against it. All passed!
Next I commented out the SAT part to see how long it takes just to solve all packages, on my laptop: 2.5 sec. (Benchmarking is hard, but don't clone dependency_provider for each version. And run in release!)

@mpizenberg
Copy link
Member Author

Benchmarking is hard

loved the edit history ahah

This was referenced Nov 16, 2020
@Eh2406
Copy link
Member

Eh2406 commented Nov 20, 2020

While working on solving all packages from crates.io, I noted that there are some that go slow. I thought it may be interesting to look into one of them on its own. For example "zuse" took a while, so I made a ron file from just its transitive dependencies
zuse_str_SemanticVersion.zip it is 689 KB unzipped and resolving all the "packages" in it takes over 3sec.

@Eh2406
Copy link
Member

Eh2406 commented Mar 18, 2024

#104 has made "zuse" less interesting. I have been using a "hobo" example for a while, but it was very slow to run with our benchmarking tools because of synthetic packages. Recent improvements to my importing code have reduced the need for the synthetic packages. So here is a new import of "hobo": hobo2_str_SemanticVersion.zip

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants