-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
build fails due to missing libpq #213
Comments
re: the static linking error about pqsignal, I asked on their IRC channel (#postgresql on LiberaChat). The suggestion I got was to try passing (*) not even sure this would be a good option as I believe even though both functions are named the same they have different semantics. |
Thinking about this more: I do think advising people to set RUSTFLAGS is less fraught than using LD_LIBRARY_PATH, but it's still pretty sucky because there's no point where we'd be able to remove this workaround until Rust provides a solution for |
This change to pq-sys appears to do the right thing:
If I run this with nightly, I get the right behavior:
If I run it with stable cargo (1.54), I get a warning related to trying to use an unstable feature, and it appears to ignore it, resulting in the broken test binary and failed test:
That's as expected, and no worse than what happens today without the change. I wanted to make sure this wouldn't totally break people using stable cargo that's so old that it doesn't even know about this flag. So I changed the key to something else:
On both stable and nightly, this produces no warnings, a broken binary, and a failed test:
I think this is the best we can hope for. I was a little surprised it worked this way, but the Cargo docs do say that any unrecognized key is treated as metadata. So this will create confusing, useless metadata, but I believe the only consumers are people trying to consume a particular metadata key, so I don't think this is a problem. @luqmana What do you think of this? If this seems reasonable, I think I'll create a PR upstream. In parallel I may create our own fork and update Omicron to use that with a "patch" section in Cargo.toml, just because upstream doesn't seem to be actively maintained. We can remove the "patch" and fork if/when upstream does land it. |
I think that's a pretty targeted and reasonable approach 👍🏾 One thing is, should we use |
This is a good question. I'm inclined to test this on Mac, illumos, and Linux, then add an appropriate |
Yea, that sounds good. Just wanted to make sure we don't introduce a "unknown linker flag" sorta error on major platform it already works on. |
I've filed an issue upstream on pq-sys: sgrif/pq-sys#36 Since upstream does not seem actively maintained, I've created our own fork here: I will put together a PR for Omicron that points us at that fork. |
Well, I think that approach is a bust. See #220 for details. I think at this point the remaining options are:
|
I tried the approach where we add a build.rs to omicron that consumes metadata from the pq-sys build and emits the linker instruction rather than putting the linker instruction into pq-sys. Here's the change to pq-sys (from its "master") to emit the metadata:
However, I found that it did not appear in the omicron build script's environment. And then I found this quote in the docs:
But omicron is not an immediate dependent of pq-sys; diesel is. |
While testing my latest approach, I found the issue fixed by #224. The good news is that the above works if I add a dependency from the omicron crates directly on pq-sys. But it does not address the issue found under #224 because the cargo:link_args instruction is not used for rustdoc tests. I have not determined if that's intentional or a bug. |
I've put up #225 -- but for reasons explained there, I'm putting this approach aside. Next steps:
|
Filed rust-lang/cargo#9895 about the surprising cargo:rustc-link-arg behavior. |
I spent a bit of time understanding the static linking problem. tl;dr: this looks like a PostgreSQL bug plus some undesirable behavior from Rust that doesn't appear to be configurable. The problem looks like this:
This is easily reproduced. I tried it on:
In alll cases, I'm building sgrif/pq-sys with sgrif/pq-sys#28 applied, which is necessary to statically link against PostgreSQL 12 or later. There was some discussion that this might relate to gcc 10's change to Assuming it's true that we need to link against libpgcommon and libpgport (which we understand from sgrif/pq-sys#28 and the related PostgreSQL mailing list posts), it seems like a bug in PostgreSQL that these libraries include overlapping symbols. And they really are different functions with the same name: the one coming from legacy-pqsignal.c is documented to exist solely for old libpq clients and is frozen to match the behavior for those old clients -- it's explicitly different than the one in pqsignal.c, which is used elsewhere. So why are we hitting this issue when other PostgreSQL users aren't reporting it? @jclulow guessed it: when statically linking a native library, Rust uses the --whole-archive linker flag, which causes all symbols in the .a file to be included in the resulting binary. The more common behavior (which presumably other consumers of libpq.a are using) only includes referenced symbols. At least one of these two copies is not exported, and the other is likely not widely used, so they likely wouldn't be included if --whole-archive weren't set. Why does Rust do this? This behavior was introduced in rust-lang/rust#16110. rust-lang/rust#85144 started to remove it, but is still a work in progress. I have not found a way around Rust's behavior here. Although there's recently support for requesting the --whole-archive behavior in other cases, I don't see any way to disable it. |
On the PostgreSQL front: here's what I've learned. First, about the three libraries:
Now, according to the Makefiles, libpgport and libpgcommon generate three different .a files: one for client applications (libpgport.a), one for client shared libraries (libpgport_shlib.a), and one for the backend (libpgport_srv.a). I infer that the first one is for inclusion in applications (like those shipped within PostgreSQL), the second is for inclusion in libpq (or other external libraries), and the last is used when building the PostgreSQL backend. The third one is irrelevant for us (it contains different contents). The only difference I see between the first two is that one is built with As far as I can tell:
This explains why libpq and libpgport both have Most consumers presumably don't use the undocumented @luqmana found something I think is a bit better, which is to link these libraries with Rust's "nobundle" option. The documentation is not very detailed, but it looks like this records the library's name rather than its contents and the final link happens without --whole-archive at the end of the build. (Presumably, this is also recording the path of the library, since otherwise I don't see how this would work.) This seems to work and I'll put up a PR for this shortly. |
tl;dr: On Helios, to build "main" after #192, you should:
pkg install library/postgresql-13
/opt/ooce/bin
is on your PATHRUSTFLAGS="-Clink-args=-R$(pg_config --libdir)"
in the environmentBelow are my notes from debugging this. Much thanks to @rmustacc @wesolows @jclulow @luqmana and @bnaecker for help with debugging this and finding possible workarounds. At the end there's a discussion of different approaches we could take, but my conclusion for now is we should just tell people to set
RUSTFLAGS
.Debugging the problem
On Helios, a full build-and-test from a clean clone currently fails.
Here's an example from @bnaecker:
Since this came in with the recent Diesel change, and libpq is a PostgreSQL-related library, I figured there's probably some new package that's pulling that in. So I looked at
cargo tree
:One of the crates there is
pq-sys
, depended-on by the recently-added "diesel", so that's promising:That's this crate, source here. It's a bindgen-generated binding for libpq. As expected, its Cargo.toml includes
"links = "pq"
:https://github.com/sgrif/pq-sys/blob/3e367d53019a2740054d5dc6946e07931f1fb70b/Cargo.toml#L8
And build.rs uses the rustc-link-lib instruction like you'd expect. Now, the README for pq-sys has notes:
Now on Helios we'd expect the library itself to be delivered by pkg:/ooce/library/postgresql-13. Sure enough, that delivers pg_config as well:
It puts it into /opt/ooce/pgsql-13/bin. @jclulow found with
pkg search link:basename:pg_config
that it also installs a mediated link into /opt/ooce/bin. One of these directories needs to be in your PATH at build time for the pq-sys build to find it. If we do set PATH accordingly, then the library is correctly found at build time and the build succeeds. But the binaries fail at runtime instead:Usually if ld.so can't find a library like this, the library is NEEDED (e.g.,
ldd
would show it) but not found in the standard locations or the RUNPATH. Sure enough, ldd shows that it's needed and not found:elfdump -d
shows details:We can see that the RUNPATH/RPATH does not include the directory containing libpq. What directory is this? The build script gets it with:
It seems like the cargo build did not update the RUNPATH. It seems like it should do that, right?
I've spent quite a while over the last two days trying to figure out how this is supposed to work in Cargo and my conclusion is that it doesn't. The clearest evidence is rust-lang/cargo#5077. It's pretty clear that a lot of people have wanted to be able to do this for some time, but it hasn't been implemented.
It should be noted that although this only affects Oxide employees running on illumos, the problem is not illumos-specific at all -- it only has to do with libraries installed in a directory that's not on ld.so.1's default search path.
Failed solutions / workarounds
Codegen option "rpath"
There's a Rust codgen option
-C rpath
, which is documented to "enable rpath". It's not clear what this means. I did rebuild withRUSTFLAGS='-C rpath=yes'
, and I got the same failure mode:though the runpath was different (arguably more broken):
It's not clear why these directories got added, but they're certainly not right for us here. We abandoned this approach.
Cargo's rustc-link-arg instruction
Keith pointed out that Cargo has grown a feature that could help us:
We could use this to pass
-R
to the linker. pq-sys could emitcargo:rustc-link-arg=-Wl,R,/the/library/path
or maybe we do this in Omicron. (The-Wl,
prefix is what you pass to the compiler driver to pass the rest of it to the linker.) This is somewhat platform-specific (it probably won't work on Windows, for example).The docs imply that this will be in 1.56. Current stable is 1.54. So this won't be available in stable Cargo for a few months (!). Now, Omicron is on "nightly", so we could use it there. But where would we put this directive? I think it would have to go into pq-sys, whose consumers are not necessarily on "nightly". I abandoned this path at this point, though we could create our own fork of pq-sys or else try to upstream a change that uses this flag conditional on it being available.
Statically linking libpq
pq-sys can statically link against libpq. That would obviate the need to get the RPATH of the binary right. To test this out, I switched over to a build of "pq-sys" directly, rather than trying to do this inside "omicron".
If we build with PQ_LIB_STATIC, we get a whole bunch of linker errors for undefined symbols:
These symbols appear to be in a combination of OpenSSL and libpqcommon. In the static build case, pq-sys only picks up libpq, not libpqcommon:
https://github.com/sgrif/pq-sys/blob/master/build.rs#L51-L54
It was around this time I decided to see if there were known issues here in pq-sys. Sure enough, there are two PRs that seem relevant:
openssl
somehow. sgrif/pq-sys#25, PR Add feature for including openssl-sys sgrif/pq-sys#29I have a local clone of pq-sys where I've pulled in both of these changes, but now it's failing with:
The symbol "pqsignal" appears to be in both static objects and the linker's unhappy. I have not dug further yet.
Solutions and workarounds that work
Regardless of anything else, you need to have libpq installed and
pg_config
must be on your PATH. On OmniOS and Helios, this is provided by "pkg:/ooce/library/postgresql-13" and you'll want/opt/ooce/bin
on your PATH.Then you can do any of these things:
RUSTFLAGS="-Clink-args=-R$(pg_config --libdir)"
in the environment. This assumes that pq-sys will usepg_config
to find PostgreSQL. If you've got pkg-config set up, or you've overridden PQ_LIB_DIR or something like that, this will definitely do the wrong thing.LD_LIBRARY_PATH=/opt/ooce/pgsql-13/lib/amd64
. It's generally not recommended to set LD_LIBRARY_PATH as a matter of course, partly because it affects everything. Needing to do this should be considered a bug.elfedit
to update the RPATH of the produced binaries. This breaks the usualcargo run
andcargo test
workflow -- you'd have to always build first, then use elfedit, thencargo run
orcargo test
.Explicitly setting RUSTFLAGS in the environment seems like the best option to me, so my plan will be to put that in the README.
The only other avenues would be:
The text was updated successfully, but these errors were encountered: