2GB limit to sysimage size? #1019
Comments
On my system, this will reproduce the issue with a package right at the 2GB boundary.
Okay, so here is what I think is going on in this case. tl;dr: the maximum displacement for a position-independent (RIP-relative) offset is 31 bits on x86_64/amd64. If the target function is farther away than that, the displacement is computed incorrectly. This appears to be a clang/Xcode/g++ bug. When this crashes, here's the line it crashes on:
Here's the function from objdump in sysimage1.so (the one that doesn't work)
Looking around at other jfptr_init functions in sysimage2.so (the one that does work!) there are a lot of functions that look very similar...
So the lines (which is where the segfault occurs)
should be loading up the address of _jl_pgcstack_func_slot/key_slot. Back to sysimage1.so.
So if we take 0x199b289 (the instruction pointer for the next instruction) and need to get to 0x8395fe40, we need a displacement of 0x81fc4bb7 (which is bigger than 31 bits...). Of course, since something is probably computing this with overflow, let's see what happens:
which gives 0x7e03b449 vs. the code, which has -0x7e03b449. So it seems like this is a bit of a stumbling block for using sysimages larger than 2GB. Okay, so where does this error come from? The object file shows this:
So it's g++ that neither computes the offset correctly nor throws an error saying it can't link everything into a shared library. I'm looking into whether there is an option that would flag this error.
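The displacement arithmetic in this comment can be re-checked with a short Python sketch. It uses only the two addresses quoted above; the helper names `fits_rel32` and `wrap_rel32` are made up for illustration, and the truncation step is an assumption about what the toolchain does, not something confirmed in the thread.

```python
# Addresses quoted in the comment above (from the objdump of sysimage1.so).
next_ip = 0x199B289   # address of the instruction following the faulting load
target = 0x8395FE40   # address the load is supposed to reach


def fits_rel32(disp: int) -> bool:
    """True if disp fits in a signed 32-bit RIP-relative displacement."""
    return -(1 << 31) <= disp < (1 << 31)


def wrap_rel32(disp: int) -> int:
    """Truncate disp to 32 bits and reinterpret it as a signed value.

    Assumption: this models a toolchain that silently drops the high bits
    instead of reporting an out-of-range relocation.
    """
    disp &= 0xFFFFFFFF
    return disp - (1 << 32) if disp >= (1 << 31) else disp


needed = target - next_ip          # displacement the load actually requires
encoded = wrap_rel32(needed)       # what a truncated rel32 field would hold

print("needed: ", hex(needed), "fits rel32:", fits_rel32(needed))
print("encoded:", hex(encoded))
# Distance between the intended target and where the truncated value points.
print("miss by:", hex(target - (next_ip + encoded)))
```

The wrapped value is the negative displacement seen in the disassembly, and the load ends up exactly 4 GiB away from the intended slot, which is consistent with a segfault on that instruction.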
Curiously, this does work with a 3.2GB sysimage under Linux, so it seems to be something macOS-specific.
I suspect this is because the small code model is used on macOS but not on Linux; see https://github.com/JuliaLang/julia/blob/c897a13c45c1222b4b16cf941348beef25f97ee0/src/aotcompile.cpp#L1889 and a previous issue with the same cause, #500 (comment).
I was looking at that :). Time to recompile Julia to see if that fixes it.
I tried editing aotcompile.cpp to use the medium code model on the Mac. This runs into a linker error when building the base Julia sysimage... (sys-o.a)
I'll often get errors on different functions (I think it's just erroring on the first one), e.g. I tried something else and got:
This is using a recent clang++
Here's the commit where this was changed for Linux...
Yeah, every platform supports a different set of relocations (ELF vs. COFF vs. Mach-O) and LLVM is not always good about complaining up front when you ask it to use a code model that is not fully implemented for a given platform / binary format. You might try using the Large code model instead of Medium to see if that succeeds on macOS.
Large also fails :(. All the entries in sys-o.a that Julia compiles are generated with pcrel=false; i.e., they aren't position-independent. I was doing these tests on a build of 1.11.2 to keep changes minimal. But it seems like LLVM 16 just doesn't support CodeModel medium on macOS with PIC, or there is some black magic I haven't figured out yet. On the other hand, moving to Julia master (1.12-dev, LLVM 18) shows that CodeModel medium does work on macOS now. I'm checking whether that'll compile the multi-GB sysimage.
cc @gbaraldi: sounds like we can potentially turn this on for more platforms, which would be great news. It would be nice to find out which upstream change fixed this, too.
Agreed on finding out what changed in LLVM to fix it :) I never like solutions where there's a link in the chain I don't understand. In my current test of a large sysimage on 1.12-dev, I ran into a new issue at the linking stage.
That function is definitely in libjulia-internal.dylib:
But it looks like it isn't exported (little t vs. big T)? There seem to be a number of checks for this function, so I'm guessing it shouldn't be called in the shared object?
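For readers unfamiliar with the "little t vs. big T" shorthand: in `nm` output, the letter before each symbol name is its type, and for most types an uppercase letter marks a global (external) symbol while lowercase marks a local one. A minimal Python sketch of that check follows. The sample lines are hypothetical (invented addresses, one invented symbol name), and it is an assumption that `_jl_fptr_sparam` is the function in question here.

```python
def classify_nm_line(line: str) -> tuple[str, str, bool]:
    """Split one nm-style line into (symbol, type letter, is_external)."""
    parts = line.split()
    type_letter, symbol = parts[-2], parts[-1]
    # For most symbol types nm prints uppercase for global/external symbols
    # and lowercase for local ones ('T'/'t' are symbols in the text section).
    return symbol, type_letter, type_letter.isupper()


# Hypothetical nm output; addresses and the second symbol name are invented.
sample = """\
0000000000012340 t _jl_fptr_sparam
0000000000056780 T _some_exported_function
"""

for line in sample.splitlines():
    symbol, letter, external = classify_nm_line(line)
    visibility = "external" if external else "local"
    print(f"{symbol}: type {letter!r}, {visibility}")
```

A local (`t`) symbol exists inside the library but cannot be resolved by another shared object at link time, which matches the linker error described above.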
I'm guessing JuliaLang/julia#56817 fixed the _jl_fptr_sparam issues as I don't see them anymore. Still checking on what happens with a 2GB+ sysimage (some of the packages seem to break at precompilation for 1.12-dev, so my initial set doesn't quite work there.)
Is there a 2GB limit on sysimage file size?
When I compile a number of packages that increases the sysimage size to over 2GB, then it fails when I try to use it with this error message.
System info:
I'm continuing to look for other sources of where the issue may be, but there seems to be a success/failure threshold at 2GB regardless of what packages I'm adding.
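As a rough aid for reproducing the threshold, the Python sketch below compares a file size against 2^31 bytes, the span a signed 32-bit displacement can cover. Treating the on-disk size as a proxy for the distance between code and data is an assumption made for illustration; the helper name `describe_size` is invented.

```python
import os

# Bytes reachable by a signed 32-bit displacement (2 GiB).
RIP_RELATIVE_REACH = 1 << 31


def describe_size(size_bytes: int) -> str:
    """Report a size in GiB and which side of the 2 GiB mark it falls on."""
    gib = size_bytes / (1 << 30)
    side = "at or above" if size_bytes >= RIP_RELATIVE_REACH else "below"
    return f"{gib:.2f} GiB, {side} the 2 GiB rel32 reach"


# File name taken from the create_sysimage call in this issue.
path = "FullJuliaSysimage.so"
if os.path.exists(path):
    print(describe_size(os.path.getsize(path)))
```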
These sysimages were constructed with
PackageCompiler.create_sysimage(pkgs; sysimage_path="FullJuliaSysimage.so")
(The list of packages is at the bottom, although I don't think it is that relevant, but it may be!)
Full list of packages that I had to include to trigger the error.