Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opam update that are noop should be instantaneous #5553

Open
Khady opened this issue May 19, 2023 · 9 comments · May be fixed by #6283
Open

opam update that are noop should be instantaneous #5553

Khady opened this issue May 19, 2023 · 9 comments · May be fixed by #6283

Comments

@Khady
Copy link
Contributor

Khady commented May 19, 2023

When an opam update sees that nothing changed in a repository the operation should be instantaneous instead of taking 20s.
Also opam should check headers like last-modified so that it doesn't actually download the index file if it wasn't changed since the last download.

louis@ahrefs-laptop1-popos1:~$ time opam update

<><> Updating package repositories ><><><><><><><><><><><><><><><><><><><><><><>
[default] synchronised from https://opam.ocaml.org
Now run 'opam upgrade' to apply any package updates.

real	0m29.922s
user	0m11.714s
sys	0m9.216s
louis@ahrefs-laptop1-popos1:~$ time opam update --debug
00:00.034  CLI                    Parsing CLI version 2.1
00:00.035  GSTATE                 LOAD-GLOBAL-STATE @ /home/louis/.opam
00:00.035  CLIENT                 UPDATE 
00:00.035  RSTATE                 LOAD-REPOSITORY-STATE @ /home/louis/.opam
00:00.300  CACHE(repository)      Loaded /home/louis/.opam/repo/state-3AF31D16.cache in 0.264s
00:00.300  RSTATE                 Cache found
00:00.300  STATE                  LOAD-SWITCH-STATE @ default
00:00.302  CACHE(installed)       Loaded /home/louis/.opam/default/.opam-switch/packages/cache in 0.001s
00:00.803  STATE                  Switch state loaded in 0.503s

<><> Updating package repositories ><><><><><><><><><><><><><><><><><><><><><><>
00:00.806  PARALLEL               Iterate over 1 task(s) with 3 process(es)
00:00.806  PARALLEL               Starting job 0 (worker 1/3): 0
00:00.806  SYSTEM                 mkdir /tmp/opam-2016727-26b9ac
00:02.395  REPOSITORY             update default from https://opam.ocaml.org
00:02.395  CURL                   pull-repo-update
00:02.395  SYSTEM                 mkdir /tmp/opam-2016727-26b9ac/default.new
00:02.395  SYSTEM                 mkdir /tmp/opam-2016727-f25dcf
00:02.395  PARALLEL               Next task in job 0: /usr/bin/curl --write-out %{http_code}\n --retry 3 --retry-delay 2 --user-agent opam/2.1.4 -L -o /tmp/opam-2016727-f25dcf/index.tar.gz.part -- https://opam.ocaml.org/index.tar.gz
Processing  1/1: [default: http]
00:05.665  PARALLEL               Collected task for job 0 (ret:0)
00:05.672  PARALLEL               Next task in job 0: /usr/bin/tar xfz /tmp/opam-2016727-f25dcf/index.tar.gz -C /tmp/opam-2016727-26b9ac/default.new
00:08.533  PARALLEL               Collected task for job 0 (ret:0)
00:08.533  SYSTEM                 rmdir /tmp/opam-2016727-f25dcf
00:08.790  REPO_BACKEND           diff: /tmp/opam-2016727-26b9ac/{default,default.new}
00:08.790  PARALLEL               Next task in job 0: /usr/bin/diff -ruaN default default.new
00:10.019  PARALLEL               Collected task for job 0 (ret:0)
00:10.019  SYSTEM                 rm /home/louis/.opam/log/patch-2016727-59c89b
00:10.019  SYSTEM                 rmdir /tmp/opam-2016727-26b9ac/default.new
00:15.905  REPOSITORY             update empty, no validation performed
[default] no changes from https://opam.ocaml.org
00:15.905  REPOSITORY             default: applying empty update
00:15.905  PARALLEL               Next task in job 0: /usr/bin/tar cfz /home/louis/.opam/repo/default.tar.gz.tmp -C /tmp/opam-2016727-26b9ac default
Processing  1/1:
00:16.901  PARALLEL               Collected task for job 0 (ret:0)
00:16.901  SYSTEM                 rm /home/louis/.opam/repo/default.tar.gz
00:25.290  PARALLEL               Job 0 finished
00:25.291  FILE(repos-config)     Wrote /home/louis/.opam/repo/repos-config in 0.000s
00:25.291  SYSTEM                 rm /home/louis/.opam/repo/state-3AF31D16.cache
00:25.299  CACHE(repository)      Writing the repository cache to ~/.opam/repo/state-3AF31D16.cache ...
00:26.289  CACHE(repository)      ~/.opam/repo/state-3AF31D16.cache written in 0.990s
00:26.441  SYSTEM                 rmdir /tmp/opam-2016727-26b9ac/default
00:28.076  SYSTEM                 rmdir /tmp/opam-2016727-26b9ac

real	0m28.113s
user	0m11.503s
sys	0m8.367s

In that example I expect the second opam update to take less than 1s

@Khady
Copy link
Contributor Author

Khady commented Apr 8, 2024

I was told that there's a speedup coming in opam 2.2, in which it won't be necessary to decompress and then delete all the files of index.tar.gz. In our case I think that it would already make things about twice faster (compared to opam 2.2~beta1).

Nonetheless, using the last-modified http header would still be a nice addition for us. For the official opam repository it would probably not make a huge difference, as there are release quite often. But it would help for all the repositories that seldom have updates.

$ curl -I https://opam.ocaml.org/index.tar.gz
HTTP/2 200 
server: nginx/1.23.3
date: Mon, 08 Apr 2024 03:23:20 GMT
content-type: application/gzip
content-length: 7125026
accept-ranges: bytes
etag: "sbjicd48ppe"
last-modified: Sat, 06 Apr 2024 21:29:01 GMT

@kit-ty-kate
Copy link
Member

This could be relatively easy to implement using the If-Modified-Since HTTP1.1 header. We could store last-modified in repos-config for each repositories then just repeat the same string to If-Modified-Since. We would only have to handle the special 304 response code.

This way we don't really have to deal with parsing and comparing the date which can become tricky very quickly

@kit-ty-kate kit-ty-kate added this to the 2.4.0~alpha1 milestone Oct 11, 2024
@dbuenzli
Copy link
Contributor

dbuenzli commented Oct 11, 2024

Why don't you simply use etags ?

@kit-ty-kate
Copy link
Member

Why don't you simply use etags ?

What would be the advantage of etags/If-None-Match over last-modified/If-Modified-Since?

So far i prefer the latter because it is meaningful when stored. In the future we can look at that string, parse it and let the user know how long ago they updated (could also be useful for debugging)

@kit-ty-kate
Copy link
Member

kit-ty-kate commented Oct 11, 2024

Replying to myself: it is more precise (a resource could in theory be updated several times in a second).
Even if the typical use of opam is fairly outside this use-case we could use both and let the default priority decide. It wouldn't cost us anything to do that

EDIT: different servers could also have different time if a server is changed or in the case of a cluster (although i'm not sure for the latter)

@dbuenzli
Copy link
Contributor

dbuenzli commented Oct 11, 2024

What would be the advantage of etags

In general they are the robust thing to use as far as caching goes. In a distributed system you can't guarantee that your clock are synchronized, in practice clocks go back in time etc. Additionally files sometimes get touched without really changing and modified-since only has second granularity (though it may not matter for opam). Also you don't have to parse these HTTP non-normalized time formats (though doing as you wanted to you wouldn't either).

@dbuenzli
Copy link
Contributor

Additionally files sometimes get touched without really changing

Note that this depends on how the etags are generated (which is out of scope of the HTTP specification). Out of curiosity I went back to what I do which I took from nginx's etagging scheme and it uses the mtime… so in my case a touch does change my etags.

@hannesm
Copy link
Member

hannesm commented Oct 11, 2024

Not sure how feasible this is with nginx and opam.ocaml.org -- but my favourite etag for the index.tar.gz would be the commit id that the tarball refers to.

But indeed, the last-modified should be sufficient from a practical point of view (taking into account that opam.ocaml.org recreation scripts take >2 hours anyways, there's no chance the index.tar.gz will be changed multiple times in the same second).

The only thing I have to add: since opam.ocaml.org is served by multiple computers, please ensure that the same header (last-modified / etag) is served.

@kit-ty-kate
Copy link
Member

i have a first rough draft of this feature request in #6283. It works pretty well to reduce the time taken by noop opam update to under half a second. In its current state however it will still need some work but i'm hoping to have it ready for the 2.4 release planned for April.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants