openmpi: fix build on darwin #332983
Thanks for noticing and fixing!
This is not working for me on aarch64-darwin:
openmpi> *** Configuring PMIx
openmpi> configure: WARNING: yes/no are invalid responses for --with-pmix-libdir. Please specify a path.
openmpi> configure: error: Cannot continue
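The configure error above is consistent with how a disabled lib.withFeatureAs flag renders. A minimal sketch, assuming the local helpers below mirror the behavior of nixpkgs' lib.withFeature and lib.withFeatureAs (they are re-implemented here so the expression evaluates without nixpkgs; pmixLibDir is a hypothetical placeholder path):

```nix
let
  # Local stand-ins, assumed to behave like lib.withFeature / lib.withFeatureAs.
  withFeature = flag: feature:
    "--${if flag then "with" else "without"}-${feature}";
  withFeatureAs = flag: feature: value:
    withFeature flag feature + (if flag then "=${value}" else "");

  # Hypothetical placeholder for "${lib.getLib pmix}/lib".
  pmixLibDir = "/path/to/pmix/lib";
in
{
  # With the condition true, the flag carries the path.
  linux = withFeatureAs true "pmix-libdir" pmixLibDir;
  # With the condition false, the flag degrades to a bare --without-* form,
  # which a configure option that only accepts a path will reject.
  darwin = withFeatureAs false "pmix-libdir" pmixLibDir;
}
```

Evaluating this with nix-instantiate --eval --strict should show a --without-pmix-libdir flag for the darwin attribute, which matches the "yes/no are invalid responses" warning.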
I was just about to push this commit instead of this one:
commit 56ad7bfea46efa13ad5429f8f2581c36e7819ef0
Author: Doron Behar <[email protected]>
Date: Wed Aug 7 17:09:27 2024 +0300
openmpi: also don't use --without-pmix-libdir, like (ofi-libdir)
Fixes the configurePhase on Darwin, see:
https://github.com/NixOS/nixpkgs/pull/332983#pullrequestreview-2225271302
diff --git a/pkgs/development/libraries/openmpi/default.nix b/pkgs/development/libraries/openmpi/default.nix
index 38080e151f58..86d4670ed102 100644
--- a/pkgs/development/libraries/openmpi/default.nix
+++ b/pkgs/development/libraries/openmpi/default.nix
@@ -117,7 +117,6 @@ stdenv.mkDerivation (finalAttrs: {
(lib.enableFeature fortranSupport "mpi-fortran")
(lib.withFeatureAs stdenv.isLinux "libnl" (lib.getDev libnl))
"--with-pmix=${if stdenv.isLinux then (lib.getDev pmix) else "internal"}"
- (lib.withFeatureAs stdenv.isLinux "pmix-libdir" "${lib.getLib pmix}/lib")
# Puts a "default OMPI_PRTERUN" value to mpirun / mpiexec executables
(lib.withFeatureAs stdenv.isLinux "prrte" (lib.getBin prrte))
(lib.withFeature enableSGE "sge")
@@ -129,9 +128,11 @@ stdenv.mkDerivation (finalAttrs: {
(lib.enableFeature cudaSupport "dlopen")
(lib.withFeatureAs fabricSupport "psm2" (lib.getDev libpsm2))
(lib.withFeatureAs fabricSupport "ofi" (lib.getDev libfabric))
- # The flag --without-ofi-libdir is not supported from some reason, so we
- # don't use lib.withFeatureAs
- ] ++ lib.optionals fabricSupport [ "--with-ofi-libdir=${lib.getLib libfabric}/lib" ];
+ # Any --without-*-libdir flag is not supported from some reason, so we
+ # don't use lib.withFeatureAs for these flags.
+ ]
+ ++ lib.optionals stdenv.isLinux [ "--with-pmix-libdir=${lib.getLib pmix}/lib" ]
+ ++ lib.optionals fabricSupport [ "--with-ofi-libdir=${lib.getLib libfabric}/lib" ];
enableParallelBuilding = true;
] ++ lib.optional fabricSupport "--with-ofi-libdir=${lib.getLib libfabric}/lib"
# The pmix flags should only be set when pmix is used
++ lib.optionals stdenv.isLinux [
"--with-pmix=${lib.getDev pmix}"
I disagree with this change: you still want to set --with-pmix=internal, no?
Is there a need to set this at all on darwin? "internal" should be the default if it can be built on a given platform. Otherwise it should not be built at all. Note that pmix is a feature that simplifies starting a large number of processes across multiple machines. Unless you have a large cluster of darwin machines (managed by slurm or some other workload manager), pmix is not of much use anyway.
> Is there a need to set this at all on darwin? "internal" should be the default if it can be built on a given platform. Otherwise it should not be built at all.
Wait, are you saying that there is a difference between using --with-pmix=internal and not using it at all?
This is the help text from configure:
--with-pmix(=DIR) Build pmix support. DIR can take one of three
values: "internal", "external", or a valid directory
name. "internal" forces Open MPI to use its internal
copy of pmix. "external" forces Open MPI to use an
external installation of pmix. Supplying a valid
directory name also forces Open MPI to use an
external installation of pmix, and adds DIR/include,
DIR/lib, and DIR/lib64 to the search path for
headers and libraries. Note that Open MPI no longer
supports --without-pmix. If no argument is
specified, Open MPI will search default locations
for pmix and fall back to an internal version if one
is not found.
--with-pmix-libdir=DIR Search for pmix libraries in DIR. Should only be
used if an external copy of pmix is being used.
I would not request to build pmix at all on non-Linux, but let openmpi's internal defaults decide if it wants to build it or not.
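A minimal sketch of that approach, assuming the pmix flags are grouped and emitted only on Linux so that other platforms fall through to configure's own default. The optionals helper is a local stand-in for lib.optionals, and the two paths are hypothetical placeholders for lib.getDev pmix and lib.getLib pmix:

```nix
let
  # Local stand-in, assumed to behave like lib.optionals.
  optionals = cond: list: if cond then list else [ ];

  # Hypothetical placeholders for the pmix dev and lib outputs.
  pmixDev = "/path/to/pmix-dev";
  pmixLib = "/path/to/pmix-lib";

  # Both pmix flags are set together, or not at all.
  pmixFlags = isLinux: optionals isLinux [
    "--with-pmix=${pmixDev}"
    "--with-pmix-libdir=${pmixLib}/lib"
  ];
in
{
  linux = pmixFlags true;   # external pmix, with an explicit libdir
  darwin = pmixFlags false; # no pmix flags; configure picks its default
}
```

Keeping the two flags in one list follows the help text's note that --with-pmix-libdir should only be used with an external pmix.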
> I would not request to build pmix at all on non-Linux, but let openmpi's internal defaults decide if it wants to build it or not.
Usually in packages with many such build options that I maintain, I prefer to track upstream's default behavior and mimic it in the Nix functional behavior ("a function always returns something" sort of). What's weird to me is that there shouldn't be a significant difference between the bundled pmix and the one we package. Perhaps we should just try to enable pmix for all platforms?
> Perhaps we should just try to enable pmix for all platforms?
What would be the benefit? pmix requires tight integration with a workload manager. That is already difficult enough to achieve on Linux. Why would we add the extra maintenance effort if it is a niche (if any at all) application on non-Linux?
The benefit would be that you'd handle all platforms consistently. If there is a bug in the future with pmix on only one platform, you'd notice it and be able to focus on it when building pmix, and not when building openmpi. Also, it'd be easier to apply patches to pmix when it is not a submodule. Note how they simply use a git submodule for pmix:
To be honest, I cannot see that benefit. For me this just means more maintenance effort, to make sure this also builds on darwin (where it is not even needed). Do we know if openmpi even builds with the internal pmix on Darwin?
> To be honest, I cannot see that benefit. For me this just means more maintenance effort, to make sure this also builds on darwin.
You are making sure it also builds fine on Darwin when you are building openmpi on Darwin, because you are building it from the submodule.
> ... (where it is not even needed). Do we know if openmpi even builds with the internal pmix on Darwin?
You know it builds with both, because they are the same, up to minor mismatches between the pmix release we build and the one that was used as the submodule when openmpi was released.
> You know it builds with both, because they are the same
What I actually meant is: does openmpi build the internal pmix if no --with-pmix is specified, or is it simply disabled on darwin by default?
> You know it builds with both, because they are the same
> What I actually meant is: does openmpi build the internal pmix if no --with-pmix is specified, or is it simply disabled on darwin by default?
I suspect that if --with-pmix is not specified, they simply use the bundled pmix. Perhaps @nim65s could share the build log with us so we can be more sure.
@nim65s thanks for checking. I fixed the pmix config parameters but the |
Also, very annoyingly, |
Now, in postInstall:
Thanks for reporting! Unfortunately, it will be hard to solve this remotely... Here's what I would have done if I had a Darwin machine: build with this diff applied:
diff --git i/pkgs/development/libraries/openmpi/default.nix w/pkgs/development/libraries/openmpi/default.nix
index 4af4c475ea62..93d8683fcca5 100644
--- i/pkgs/development/libraries/openmpi/default.nix
+++ w/pkgs/development/libraries/openmpi/default.nix
@@ -218,10 +218,13 @@ stdenv.mkDerivation (finalAttrs: {
(lib.mapCartesianProduct (
{ part1, part2 }:
''
- substituteInPlace "''${!outputDev}/share/openmpi/${part1}${part2}-wrapper-data.txt" \
- --replace-fail \
- compiler=${lib.elemAt wrapperDataSubstitutions.${part2} 0} \
- compiler=${lib.elemAt wrapperDataSubstitutions.${part2} 1}
+ if [[ -f "''${!outputDev}/share/openmpi/${part1}${part2}-wrapper-data.txt" ]]; then
+ echo @@@@ "''${!outputDev}/share/openmpi/${part1}${part2}-wrapper-data.txt"
+ cat "''${!outputDev}/share/openmpi/${part1}${part2}-wrapper-data.txt"
+ else
+ echo @@@@ file does not exist \
+ "''${!outputDev}/share/openmpi/${part1}${part2}-wrapper-data.txt"
+ fi
''
))
(lib.concatStringsSep "\n")
And inspect the output... We probably need to change this pattern for Darwin... Or perhaps disable this substitution for Darwin. I wonder what comes out in those files anyway.
As is, we can run the whole |
Yea, that might be what we'll do. After all, this is meant for cross compilation, which I can barely imagine people doing even from Linux. Not only that, this is not a regular |
in lib.optionalString stdenv.isLinux
''
in
lib.optionalString stdenv.isLinux ''
This last nixfmt commit could be squashed with its predecessor, because the diff noise is not too large.
I'll squash once we have sorted out all issues.
> I'll squash once we have sorted out all issues.
I wouldn't want to squash all commits, as the diff noise is large for the configureFlags change... Perhaps if we'd fix pmix on Darwin, the diff noise there will be negligible.
@@ -193,7 +193,7 @@ stdenv.mkDerivation (finalAttrs: {
];
part2 = builtins.attrNames wrapperDataSubstitutions;
};
in
in lib.optionalString stdenv.isLinux
I'd prefer to only condition the substitution and not all of the postInstall... We might enable multiple outputs in the future for other platforms, and you'd still want to delete all *.la files...
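The suggestion above can be sketched roughly as follows. This is only an illustration: optionalString is a local stand-in for lib.optionalString, and both shell snippets are hypothetical placeholders, not the actual contents of the package's postInstall.

```nix
let
  # Local stand-in, assumed to behave like lib.optionalString.
  optionalString = cond: str: if cond then str else "";

  mkPostInstall = isLinux:
    ''
      # Hypothetical cleanup step that should run on every platform.
      find "$out/lib" -name "*.la" -delete
    ''
    + optionalString isLinux ''
      # Hypothetical placeholder for the wrapper-data substitution,
      # which is the only part gated on Linux.
      echo "patch the wrapper-data files here"
    '';
in
{
  linux = mkPostInstall true;
  darwin = mkPostInstall false;
}
```

With this shape, the platform-independent steps stay unconditional and only the substitution is skipped on Darwin, so enabling more outputs on other platforms later would not require restructuring the phase.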
Then we need an extra condition for the split output moves too. If we want to enable multiple outputs in the future for other platforms, we can make it more granular later? I do not know what darwin does with .la files. I know that they are not needed on Linux.
> Then we need an extra condition for the split output moves too.
No you won't. That was the purpose of 2a4636b.
> If we want to enable multiple outputs in the future for other platforms, we can make it more granular later?
The diff will be much smaller thanks to 2a4636b.
> I do not know what darwin does with .la files. I know that they are not needed on Linux.
Yea, that line was peculiar to me so I didn't touch it. I suspect they might be generated on Darwin as well, so I wouldn't touch it even now. For sure it doesn't hurt if these files are not found at all.
c5a289b builds fine on aarch64-darwin
Description of changes
This pulled in pmix unconditionally as a dependency and effectively disabled openmpi on darwin.
Fix for #327438