-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TTFX regression with Comonicon CLI on Julia 1.11 #55171
Comments
Oh, and if anyone is looking into this, note that the master branch of SnoopCompile works on Julia 1.11. See this issue for progress on the release: timholy/SnoopCompile.jl#382 |
I tried to replicate this but
If anyone familiar with Comonicon can help I'd be happy to investigate (@timholy mentioned on slack that he wasn't sure why this didn't build |
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
Hi you can try the example package here: https://github.com/comonicon/Comonicon.jl/tree/main/example/Hello and you can install the entry point by running help?> Hello.comonicon_install
comonicon_install(;kwargs...)
Install the CLI manually. This will use the default configuration in
Comonicon.toml, if it exists. For more detailed reference, please refer to
Comonicon documentation (https://comonicon.org). you don't have to test that with an entry point, you can test directly with its Julia entry point help?> Hello.command_main
No documentation found.
Hello.command_main is a Function.
# 2 methods for generic function "command_main" from Hello:
[1] command_main(ARGS::Vector{String})
@ ~/Code/Julia/Comonicon/src/codegen/julia.jl:90
[2] command_main()
@ ~/Code/Julia/Comonicon/src/codegen/julia.jl:90 |
I get:
And considering both Pkg and REPL is loaded, this does not seem very surprising. |
Hmm, it was much worse for me, I wonder why. Agreed this seems more reasonable. Is it all TTL? There seems to be a path forward that Jameson and I sketched out for significantly reducing TTL, but it will take some time to explore. Therefore, let me ask a heretical question: if Pkg & REPL are the two culprits and they come up a lot, how important for 1.11 is their exclusion from the sysimg? Clearly we want this eventually, but until If "putting them back" is not just a couple of lines, then you can ignore this. |
@timholy whats your platform? WSL2? |
Regarding this specific case, it seems "self-inflicted". The Hello app here depends on the full Comonicon.jl which pulls in a lot of dependencies (Pkg, REPL) to both do argument parsing and to add a |
As of right now, just adding Pkg to the sysimage doesn't work:
which I guess is from https://github.com/JuliaLang/Pkg.jl/blob/da9f89ffc7df973d273a3952051ad656987bc64b/src/PlatformEngines.jl#L20 and that jlls don't work in the process creating a sysimage since |
Including stdlibs shouldn't be expensive, and in my testing it's not on MacOS nor native linux. And from what I hear native windows neither. I suspect this is another edge case where WSL2 (which I believe Tim is using) is so slow (100x slower than native linux) for file system operations that our loading code fs stuff is getting in the way. We can probably replace some |
A profile would probably help. |
In trying to install |
I'm slowly refactoring things in Comonicon to have features like app installation done as a plugin. The package was done in pre-package-image time, so everything was generated as a zero dependency code and injected into the user's code space via a macro. As @KristofferC mentioned, this is no longer necessary and can be done with a separate runtime package to support fancy CLI features (which is what I'm planning). If excluding stdlib is the only reason for slowing it, I'm OK with not fixing it. |
I'm not sure that fully explains the slowdown here - WSL2 has very slow filesystem operations, but only when accessing files across the bridge to the native Windows FS. When accessing files in the Linux filesystem, it should be pretty competitive. Is @timholy keeping his depot on the Windows side? |
Tim provided some redone timing info to me for #55331 and saw faster baseline times than those which were reported here, for unknown reasons. (Note my understanding is that Tim's julia installation and depot resides within the linux filesystem on WSL2) |
I'm able to reproduce this on my machine, although I have to have assertions on to get numbers almost as bad as in the OP:
In case it's helpful, here's an I haven't looked closely enough at the trace yet to know what's dominating here, but we are making ~8x as many syscalls as we used to. |
I'm not surprised the syscalls are (much) more now because we went from opening 1 package to a lot of them. But looking at the logs there are some cases like
which are two identical stat calls for the same file at almost the same time which is probably not needed? |
Yeah, agreed - Scrolling the log doesn't show any particular dominator, just a lot of package loading. This is without #55331 so there's a chance Ian has already cleaned up some of that 🙂 |
The summarized results are also interesting (
So we're dominated by reads, most of which are us opening all of the But in total we're only spending 15-20% of the time doing syscalls ( Taking a (data is in profile.linux-perf.zip), image is taken from https://www.speedscope.app/) |
It's also worth investigating why @KristofferC's results seem to be much better parallelized... For an apples-to-apples test, here's the
|
This is what I am running (on an M1):
|
Avoids immediately successive stat calls for the same path during startup & loading. According to MacOS Instruments this reduces `stat64` calls during `--start=no -e "using Pkg"` from 844 to 672. On MacOS I don't see a speed improvement, but on WSL2 @timholy reported the test from #55171 sees a modest improvement. This PR (10 iterations) ``` tim@diva:~/.julia/bin$ time for i in {1..10}; do ./cli --help &> /dev/null; done real 0m2.999s user 0m3.547s sys 0m6.552s ``` master ``` real 0m3.217s user 0m3.794s sys 0m6.700s ``` 1.11-rc2: ``` real 0m3.746s user 0m4.169s sys 0m6.755s ``` I left the `@debug`s in as they might be useful for similar investigations.
Avoids immediately successive stat calls for the same path during startup & loading. According to MacOS Instruments this reduces `stat64` calls during `--start=no -e "using Pkg"` from 844 to 672. On MacOS I don't see a speed improvement, but on WSL2 @timholy reported the test from JuliaLang#55171 sees a modest improvement. This PR (10 iterations) ``` tim@diva:~/.julia/bin$ time for i in {1..10}; do ./cli --help &> /dev/null; done real 0m2.999s user 0m3.547s sys 0m6.552s ``` master ``` real 0m3.217s user 0m3.794s sys 0m6.700s ``` 1.11-rc2: ``` real 0m3.746s user 0m4.169s sys 0m6.755s ``` I left the `@debug`s in as they might be useful for similar investigations.
Following up on a slack conversation, there is indeed a substantial regression in the responsivity of a CLI application written with Comonicon:
Julia 1.10:
Julia 1.11-rc1:
I haven't really participated in the Julia 1.11 release and don't have time to check myself, but since it's a pretty sizable regression I'm slapping a milestone on it. (I am not personally affected by CLI TTFX, though.)
Here are the specifics of what I tested:
Build with
(CLI) pkg> build
and then test with
The text was updated successfully, but these errors were encountered: