
{2023.06}[foss/2021a] RStudio-Server V1.4.1717-Java-11-R V4.1.0 #299


TopRichard
Collaborator

No description provided.

@eessi-bot

eessi-bot bot commented Jul 6, 2023

Instance eessi-bot-citc-aws is configured to build:

  • arch x86_64/generic for repo eessi-2021.12
  • arch x86_64/generic for repo eessi-2023.06-compat
  • arch x86_64/generic for repo eessi-2023.06-software
  • arch x86_64/intel/haswell for repo eessi-2021.12
  • arch x86_64/intel/haswell for repo eessi-2023.06-compat
  • arch x86_64/intel/haswell for repo eessi-2023.06-software
  • arch x86_64/intel/skylake_avx512 for repo eessi-2021.12
  • arch x86_64/intel/skylake_avx512 for repo eessi-2023.06-compat
  • arch x86_64/intel/skylake_avx512 for repo eessi-2023.06-software
  • arch x86_64/amd/zen2 for repo eessi-2021.12
  • arch x86_64/amd/zen2 for repo eessi-2023.06-compat
  • arch x86_64/amd/zen2 for repo eessi-2023.06-software
  • arch x86_64/amd/zen3 for repo eessi-2021.12
  • arch x86_64/amd/zen3 for repo eessi-2023.06-compat
  • arch x86_64/amd/zen3 for repo eessi-2023.06-software
  • arch aarch64/generic for repo eessi-2021.12
  • arch aarch64/generic for repo eessi-2023.06-compat
  • arch aarch64/generic for repo eessi-2023.06-software
  • arch aarch64/neoverse_n1 for repo eessi-2021.12
  • arch aarch64/neoverse_n1 for repo eessi-2023.06-compat
  • arch aarch64/neoverse_n1 for repo eessi-2023.06-software
  • arch aarch64/neoverse_v1 for repo eessi-2021.12
  • arch aarch64/neoverse_v1 for repo eessi-2023.06-compat
  • arch aarch64/neoverse_v1 for repo eessi-2023.06-software

@TopRichard
Collaborator Author

bot: build repo:eessi-2023.06-software arch:x86_64/generic

@eessi-bot

eessi-bot bot commented Jul 6, 2023

Updates by the bot instance eessi-bot-citc-aws
  • received bot command build repo:eessi-2023.06-software arch:x86_64/generic from TopRichard

    • expanded format: build repository:eessi-2023.06-software architecture:x86_64/generic
  • handling command build repository:eessi-2023.06-software architecture:x86_64/generic resulted in:

@eessi-bot

eessi-bot bot commented Jul 6, 2023

New job on instance eessi-bot-citc-aws for architecture x86_64-generic for repository eessi-2023.06-software in job dir /mnt/shared/home/bot/eessi-bot-software-layer/jobs/2023.07/pr_299/5768

date | job status | comment
Jul 06 07:16:01 UTC 2023 | submitted | job id 5768 awaits release by job manager
Jul 06 07:16:19 UTC 2023 | released | job awaits launch by Slurm scheduler
Jul 06 07:20:28 UTC 2023 | running | job 5768 is running
Jul 06 07:55:49 UTC 2023 | finished | 😢 FAILURE
Details
✅ job output file slurm-5768.out
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
No artefacts were created or found.
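The bot's verdict appears to come from scanning the job output for a fixed set of messages. Below is a minimal sketch of that kind of check, assuming plain substring matching against the five messages listed in the summary; `check_job_output` and `job_succeeded` are hypothetical names, not the bot's actual code.

```python
# Hypothetical sketch of a job-output check. The message lists are copied from
# the bot summary; the matching logic (plain substring search) is an assumption.
FAILURE_PATTERNS = ["ERROR:", "FAILED:", "required modules missing:"]
SUCCESS_PATTERNS = ["No missing installations", ".tar.gz created!"]


def check_job_output(text):
    """Map every pattern to True if it occurs anywhere in the job output."""
    return {p: p in text for p in FAILURE_PATTERNS + SUCCESS_PATTERNS}


def job_succeeded(text):
    """Success requires no failure pattern and every success pattern."""
    found = check_job_output(text)
    no_failures = not any(found[p] for p in FAILURE_PATTERNS)
    all_successes = all(found[p] for p in SUCCESS_PATTERNS)
    return no_failures and all_successes


if __name__ == "__main__":
    output = (
        "== FAILED: Installation ended unsuccessfully\n"
        "No missing installations\n"
        "example.tar.gz created!\n"
    )
    print(check_job_output(output)["FAILED:"])  # True
    print(job_succeeded(output))  # False
```

This matches the pattern of the summaries in this thread: the success messages can be present (a tarball was still created) while a single failure message is enough to mark the job as failed.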

@ocaisa
Member

ocaisa commented Jul 6, 2023

Not sure this one is a good idea, easybuilders/easybuild-easyconfigs#13951
I had a similar PR open that had a lot more going on, easybuilders/easybuild-easyconfigs#15524

@boegel
Contributor

boegel commented Jul 6, 2023

@TopRichard For Java, you'll probably need to use easybuilders/easybuild-easyblocks#2557 via --include-easyblocks-from-pr

@TopRichard
Collaborator Author

bot: build repo:eessi-2023.06-software arch:x86_64/generic

@eessi-bot

eessi-bot bot commented Jul 6, 2023

Updates by the bot instance eessi-bot-citc-aws
  • received bot command build repo:eessi-2023.06-software arch:x86_64/generic from TopRichard

    • expanded format: build repository:eessi-2023.06-software architecture:x86_64/generic
  • handling command build repository:eessi-2023.06-software architecture:x86_64/generic resulted in:

@eessi-bot

eessi-bot bot commented Jul 6, 2023

New job on instance eessi-bot-citc-aws for architecture x86_64-generic for repository eessi-2023.06-software in job dir /mnt/shared/home/bot/eessi-bot-software-layer/jobs/2023.07/pr_299/5770

date | job status | comment
Jul 06 08:02:20 UTC 2023 | submitted | job id 5770 awaits release by job manager
Jul 06 08:02:59 UTC 2023 | released | job awaits launch by Slurm scheduler
Jul 06 08:04:04 UTC 2023 | running | job 5770 is running
Jul 06 08:32:02 UTC 2023 | finished | 😢 FAILURE
Details
✅ job output file slurm-5770.out
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
No artefacts were created or found.

…o eessi-2023.06-RStudio-Server/1.4.1717-Java/11-R-4.1.0.eb-foss/2021a
@bedroge
Collaborator

bedroge commented Jul 10, 2023

Just checked the log of this build, and it failed during the build of Xvfb/1.20.11-GCCcore-10.3.0:

== postprocessing...
  >> running command:
        [started at: 2023-07-06 08:29:25]
        [working dir: /tmp/bot/easybuild/build/Xvfb/1.20.11/GCCcore-10.3.0/xorg-server-1.20.11]
        [output logged in /tmp/eb-akzyf_n9/eb-1_93q_0q/eb-lzt2gn2z/eb-9xo5zmng/eb-u5dyuu1k/eb-0vncxrhv/eb-l_ls80gm/eb-i4m2bxtz/eb-oq4jykos/eb-4ni3tfkx/easybuild-run_cmd-aqpwi3rn.log]
        cp -a xvfb-run /cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/x86_64/generic/software/Xvfb/1.20.11-GCCcore-10.3.0/bin/ && chmod u+x  /cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/x86_64/generic/software/Xvfb/1.20.11-GCCcore-10.3.0/bin/xvfb-run
  >> command completed: exit 1, ran in < 1s
== ... (took < 1 sec)
== FAILED: Installation ended unsuccessfully (build directory: /tmp/bot/easybuild/build/Xvfb/1.20.11/GCCcore-10.3.0): build failed (first 300 chars): cmd "cp -a xvfb-run /cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/x86_64/generic/software/Xvfb/1.20.11-GCCcore-10.3.0/bin/ && chmod u+x  /cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/x86_64/generic/software/Xvfb/1.20.11-GCCcore-10.3.0/bin/xvfb-run" exited with exit code 1 a (took 5 mins 3 secs)
== Results of the build can be found in the log file(s) /tmp/eb-akzyf_n9/eb-1_93q_0q/eb-lzt2gn2z/eb-9xo5zmng/eb-u5dyuu1k/eb-0vncxrhv/eb-l_ls80gm/eb-i4m2bxtz/eb-oq4jykos/eb-4ni3tfkx/easybuild-Xvfb-1.20.11-20230706.082422.yCixV.log

It's not really clear to me why that cp or chmod command failed. Do note the weird nested structure in the path of that log file, which is caused by easybuilders/easybuild-framework#4291. Not sure if that's causing the issue here, though.

@bedroge
Collaborator

bedroge commented Jul 11, 2023

I debugged it a bit more by manually building this on an AWS node, and then I indeed ran into the same error:

Apptainer> cp -a xvfb-run /cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/aarch64/neoverse_n1/software/Xvfb/1.20.11-GCCcore-10.3.0/bin/
cp: preserving permissions for '/cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/aarch64/neoverse_n1/software/Xvfb/1.20.11-GCCcore-10.3.0/bin/xvfb-run': Permission denied

It turns out that it's a combination of -a and the file lacking write permission for the user:

Apptainer> ls -l xvfb-run
-r-xr-xr-x. 1 bedroge users 5834 Jun 20 19:39 xvfb-run

Removing the -a option or adding write permission both solve the issue:

Apptainer> cp xvfb-run /cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/aarch64/neoverse_n1/software/Xvfb/1.20.11-GCCcore-10.3.0/bin/
Apptainer> rm  /cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/aarch64/neoverse_n1/software/Xvfb/1.20.11-GCCcore-10.3.0/bin/xvfb-run
rm: remove write-protected regular file '/cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/aarch64/neoverse_n1/software/Xvfb/1.20.11-GCCcore-10.3.0/bin/xvfb-run'? y
Apptainer> 
Apptainer> chmod u+w xvfb-run 
Apptainer> cp -a xvfb-run /cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/aarch64/neoverse_n1/software/Xvfb/1.20.11-GCCcore-10.3.0/bin/
Apptainer> 

Just don't understand why this wasn't an issue before...
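The role of `-a` can be illustrated outside the overlay: it preserves the source's mode bits, so a read-only `xvfb-run` yields a read-only copy. The sketch below uses Python's `shutil.copy2` as a rough analogue of `cp -a` (it copies data, permission bits and timestamps, but not ownership). It does not reproduce the fuse-overlayfs `Permission denied` error itself, which only showed up in the build container; `copied_mode` is a hypothetical helper.

```python
import os
import shutil
import stat
import tempfile


def copied_mode(src_mode):
    """Create a file with src_mode, copy it with shutil.copy2 and return the
    copy's permission string as `ls -l` would show it."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "xvfb-run")
        with open(src, "w") as f:
            f.write("#!/bin/sh\n")
        os.chmod(src, src_mode)
        dst = shutil.copy2(src, os.path.join(tmp, "xvfb-run.copy"))
        return stat.filemode(os.stat(dst).st_mode)


print(copied_mode(0o555))  # -r-xr-xr-x: the read-only mode travels with the copy
print(copied_mode(0o555 | stat.S_IWUSR))  # -rwxr-xr-x: after `chmod u+w` on the source
```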

@bedroge
Collaborator

bedroge commented Jul 11, 2023

And digging a bit more... initially I suspected that it may be related to the version of fuse-overlayfs, which may be partly true: the old version (0.3) that we used before does seem to work fine for files without write permission.

However, the main cause is probably that the file xvfb-run is part of the EasyBuild installation (it's actually a patch for Xvfb, see https://github.com/easybuilders/easybuild-easyconfigs/blob/develop/easybuild/easyconfigs/x/Xvfb/xvfb-run and https://github.com/easybuilders/easybuild-easyconfigs/blob/develop/easybuild/easyconfigs/x/Xvfb/Xvfb-1.20.11-GCCcore-10.3.0.eb#L86), and somehow the file got these weird permissions in our installation. It doesn't have these permissions in the EasyBuild source tarball, and I don't see them for any other versions either. So I suspect something weird happened with our EasyBuild installation, either in the build or in the tarball creation (it's not the ingestion, as the permissions are really like this in the original tarball).

I'm going to try and rebuild EasyBuild 4.7.2 and see if the same thing happens again.
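To check whether other files in an installation are affected in the same way, one can list every regular file that lacks user write permission. A minimal sketch, assuming you point it at the EasyBuild installation directory (its location is not given here, so the function accepts any path); `files_without_user_write` is a hypothetical helper.

```python
import os
import stat
import tempfile


def files_without_user_write(root):
    """Return sorted paths of regular files under root that lack the u+w bit."""
    result = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mode = os.lstat(path).st_mode  # lstat: don't follow symlinks
            if stat.S_ISREG(mode) and not mode & stat.S_IWUSR:
                result.append(path)
    return sorted(result)


if __name__ == "__main__":
    # Demo on a throwaway tree: one read-only file, one writable file.
    with tempfile.TemporaryDirectory() as tmp:
        ro = os.path.join(tmp, "xvfb-run")
        rw = os.path.join(tmp, "other-file")
        for path in (ro, rw):
            with open(path, "w") as f:
                f.write("#!/bin/sh\n")
        os.chmod(ro, 0o555)  # -r-xr-xr-x, as in the `ls -l` output above
        print(files_without_user_write(tmp) == [ro])  # True
```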

@bedroge
Collaborator

bedroge commented Jul 11, 2023

Ah, found it: those "weird" permissions are actually enforced by the EB setting that makes all installation directories read-only, which was added by @ocaisa in #245.
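Conceptually, a read-only installation strips the write bits from everything under the installation directory once the installation has finished, roughly `chmod -R a-w`. The sketch below shows that idea; `make_read_only` is hypothetical, and EasyBuild's own implementation of this setting may differ in details (for example, in how it treats symlinks). Any file later copied out of such a tree with `cp -a` inherits the read-only mode.

```python
import os
import stat
import tempfile

WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def make_read_only(root):
    """Recursively clear write permission, roughly like `chmod -R a-w root`."""
    # topdown=False visits the deepest directories first, so each directory's
    # entries are processed before the directory itself loses its write bits.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames + dirnames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # leave symlinks (and their targets) alone
            os.chmod(path, stat.S_IMODE(os.stat(path).st_mode) & ~WRITE_BITS)
    os.chmod(root, stat.S_IMODE(os.stat(root).st_mode) & ~WRITE_BITS)


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    script = os.path.join(root, "xvfb-run")
    with open(script, "w") as f:
        f.write("#!/bin/sh\n")
    os.chmod(script, 0o755)
    make_read_only(root)
    print(stat.filemode(os.stat(script).st_mode))  # -r-xr-xr-x
    os.chmod(root, 0o755)  # restore write access so the temp dir can be removed
```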

@ocaisa
Member

ocaisa commented Jul 11, 2023

TBH I have also found that read-only installations led to a problem with an R extension installation that didn't expect a template file to be read-only (and of course it normally wouldn't be).

@bedroge
Collaborator

bedroge commented Jul 11, 2023

I'm not sure whether it makes sense to change this upstream by changing the postinstallcmds in the easyconfig from:

postinstallcmds = ["cp -a xvfb-run %(installdir)s/bin/ && chmod u+x  %(installdir)s/bin/xvfb-run"]

to something like:

postinstallcmds = ["chmod u+wx xvfb-run && cp -a xvfb-run %(installdir)s/bin/"]

but that does solve the issue. This can also be easily achieved by using a hook, so perhaps that's the quickest solution for now anyway. I've just tried the following, and that worked fine:

from easybuild.tools.build_log import EasyBuildError, print_msg

def parse_hook_xvfb_cp_permissions(ec, eprefix):
    """Add user write permission to xvfb-run before copying it in the postinstallcmds."""
    if ec.name == 'Xvfb':
        ec['postinstallcmds'] = ["chmod u+wx xvfb-run && cp -a xvfb-run %(installdir)s/bin/"]
        print_msg("Using custom postinstallcmds for %s: %s", ec.name, ec['postinstallcmds'])
    else:
        raise EasyBuildError("Xvfb-specific hook triggered for non-Xvfb easyconfig?!")

# and add it to the dict at the bottom:

PARSE_HOOKS = {
    # ... (existing entries)
    'Xvfb': parse_hook_xvfb_cp_permissions,
}
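To try the hook without a full EasyBuild setup, it can be wired into a minimal dispatcher keyed on the easyconfig name. The hook body below is copied from the comment; `FakeEasyConfig`, the stand-in `EasyBuildError`/`print_msg`, and the `parse_hook` dispatcher are hypothetical simplifications, assumed to mirror how the `PARSE_HOOKS` dict is consumed. The real hooks file receives a proper EasyConfig object and imports those helpers from `easybuild.tools.build_log`.

```python
# Hypothetical stand-ins so this sketch runs without EasyBuild installed.
class EasyBuildError(Exception):
    pass


def print_msg(msg, *args):
    print(msg % args)


class FakeEasyConfig(dict):
    """Hypothetical stand-in for an EasyConfig: item access plus a .name attribute."""

    def __init__(self, name, **params):
        super().__init__(params)
        self.name = name


def parse_hook_xvfb_cp_permissions(ec, eprefix):
    """Add user write permission to xvfb-run before copying it in the postinstallcmds."""
    if ec.name == 'Xvfb':
        ec['postinstallcmds'] = ["chmod u+wx xvfb-run && cp -a xvfb-run %(installdir)s/bin/"]
        print_msg("Using custom postinstallcmds for %s: %s", ec.name, ec['postinstallcmds'])
    else:
        raise EasyBuildError("Xvfb-specific hook triggered for non-Xvfb easyconfig?!")


PARSE_HOOKS = {
    'Xvfb': parse_hook_xvfb_cp_permissions,
}


def parse_hook(ec, eprefix=None):
    """Hypothetical dispatcher: run the software-specific hook, if any, for ec."""
    if ec.name in PARSE_HOOKS:
        PARSE_HOOKS[ec.name](ec, eprefix)


if __name__ == "__main__":
    xvfb = FakeEasyConfig('Xvfb', postinstallcmds=[
        "cp -a xvfb-run %(installdir)s/bin/ && chmod u+x  %(installdir)s/bin/xvfb-run"])
    parse_hook(xvfb)
    print(xvfb['postinstallcmds'])

    other = FakeEasyConfig('R', postinstallcmds=[])
    parse_hook(other)  # no hook registered for R, so nothing changes
    print(other['postinstallcmds'])  # []
```

Keying the dict on the software name keeps each hook narrowly scoped, which is also why the hook raises an error if it is ever called for a different easyconfig.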

TopRichard added 2 commits July 12, 2023 05:26
…o eessi-2023.06-RStudio-Server/1.4.1717-Java/11-R-4.1.0.eb-foss/2021a
@TopRichard
Collaborator Author

bot: build repo:eessi-2023.06-software arch:x86_64/generic

@eessi-bot

eessi-bot bot commented Jul 12, 2023

Updates by the bot instance eessi-bot-citc-aws
  • received bot command build repo:eessi-2023.06-software arch:x86_64/generic from TopRichard

    • expanded format: build repository:eessi-2023.06-software architecture:x86_64/generic
  • handling command build repository:eessi-2023.06-software architecture:x86_64/generic resulted in:

@eessi-bot

eessi-bot bot commented Jul 12, 2023

New job on instance eessi-bot-citc-aws for architecture x86_64-generic for repository eessi-2023.06-software in job dir /mnt/shared/home/bot/eessi-bot-software-layer/jobs/2023.07/pr_299/5804

date | job status | comment
Jul 12 05:31:52 UTC 2023 | submitted | job id 5804 awaits release by job manager
Jul 12 05:32:30 UTC 2023 | released | job awaits launch by Slurm scheduler
Jul 12 05:36:33 UTC 2023 | running | job 5804 is running
Jul 12 07:41:17 UTC 2023 | finished | 😢 FAILURE
Details
✅ job output file slurm-5804.out
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
No artefacts were created or found.

@bedroge
Collaborator

bedroge commented Jul 12, 2023

checking libcurl version ... 8.1.2
checking curl/curl.h usability... yes
checking curl/curl.h presence... yes
checking for curl/curl.h... yes
checking if libcurl is version 7 and >= 7.28.0... no
configure: error: libcurl >= 7.28.0 library and headers are required with support for https

Looks like the curl from our compat layer is too new...
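The configure output explains why a *newer* libcurl is rejected: the check asks for major version 7 specifically, not just "at least 7.28.0", so 8.1.2 fails. Below is a sketch of that logic as worded in the configure message. It is reconstructed from the message text only, not copied from the configure script, and `libcurl_accepted` is a hypothetical name.

```python
def libcurl_accepted(version):
    """Return True if version is 7.x and at least 7.28.0, as described by the
    configure message 'libcurl is version 7 and >= 7.28.0'."""
    parts = [int(p) for p in version.split(".")]
    parts += [0] * (3 - len(parts))  # treat "7.28" as "7.28.0"
    major, minor, patch = parts[:3]
    return major == 7 and (major, minor, patch) >= (7, 28, 0)


for v in ["7.27.0", "7.28.0", "7.88.1", "8.1.2"]:
    print(v, libcurl_accepted(v))
# 7.27.0 False
# 7.28.0 True
# 7.88.1 True
# 8.1.2 False
```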

def parse_hook_xvfb_cp_permissions(ec, eprefix):
    """Add user write permission to xvfb-run before copying it in the postinstallcmds."""
    if ec.name == 'Xvfb':
        ec['postinstallcmds'] = ["chmod u+wx xvfb-run && cp -a xvfb-run %(installdir)s/bin/"]
Contributor

@TopRichard Can you clarify why this is needed?

If there's a problem here, I'm not sure why we didn't see it outside of EESSI...

Collaborator Author

@boegel Xvfb is a dependency, so this hook was added to avoid a build failure.

Contributor

I got that part, but I don't see why we would see this problem with Xvfb outside of EESSI.

I'm seeing it too in #328 where I'm trying to get R installed (just to break down this PR a bit), so I'll take a closer look.

Collaborator

@boegel I debugged that issue some time ago, see #299 (comment) and the messages below that one. TL;DR (if I remember correctly): it's a combination of our writable overlay and setting the installation dir as read-only (and xvfb-run is shipped as a "patch" in the EB installation dir).

Collaborator

@laraPPr laraPPr Sep 19, 2023

I'm also running into the same issue with #335

Collaborator

@boegel Could the issue with Xvfb be related to the version of XZ? That is the only difference I can discern in EESSI.

Member

It's (I believe) somehow related to the fact that we now have read-only installations and the writable overlay. We should probably create an upstream PR to reverse the order of statements in the Xvfb postinstallcmds, which is what the hook update here is doing (or we just move forward with the hook update here).

Contributor

OK, I understand this a bit better now. The cp -a xvfb-run fails in our particular build environment in which we use fuse-overlayfs (but not in a regular build environment when the file is just copied from one filesystem to another) because of the read-only permissions on the EasyBuild installation (where xvfb-run comes from).

I strongly prefer fixing this centrally in EasyBuild, so I've opened a pull request for it: easybuilders/easybuild-easyconfigs#18834

I'm also fixing the chmod u+x, which should actually be chmod a+x, so exec permissions on xvfb-run in the installation directory are set for all users, not just for the account used to perform the installation.
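The difference between `chmod u+x` and `chmod a+x` is only in which execute bits get set. A small sketch using the `stat` constants, assuming a starting mode of `0o644` purely for illustration; `add_exec` is a hypothetical helper mimicking the symbolic `chmod` forms.

```python
import stat

EXEC_BITS = {
    "u": stat.S_IXUSR,  # owner only
    "a": stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH,  # owner, group and others
}


def add_exec(mode, who):
    """Return mode with execute permission added for `who` ('u' or 'a')."""
    return mode | EXEC_BITS[who]


print(oct(add_exec(0o644, "u")))  # 0o744: only the installing account may run it
print(oct(add_exec(0o644, "a")))  # 0o755: every user may run it
```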

@laraPPr
Collaborator

laraPPr commented Sep 19, 2023

With the following PR, https://github.com/easybuilders/easybuild-easyconfigs/pull/18834/files, the installation of Xvfb works.

@TopRichard TopRichard closed this Oct 10, 2023
@TopRichard TopRichard deleted the eessi-2023.06-RStudio-Server/1.4.1717-Java/11-R-4.1.0.eb-foss/2021a branch December 20, 2023 19:16
trz42 pushed a commit to trz42/software-layer that referenced this pull request Apr 6, 2024
improve error reporting when EasyBuild PR is not merged
- CI still fails even after using an older apptainer version (via EESSI#303)
- opened a PR for debugging issue (maybe it's related to NESSI or to the environment that runs the CI workflow ... usually the same script works flawlessly when building software)
5 participants