
New container relion 4.0.1.sm61 #620

Closed
vnm-neurodesk opened this issue Mar 18, 2024 · 20 comments

@vnm-neurodesk (Contributor)

There is a new container by @stebo85, use this command to test:

bash /neurocommand/local/fetch_and_run.sh relion 4.0.1.sm61 20240318

If the test was successful, add it to apps.json to release:
https://github.com/NeuroDesk/neurocommand/edit/main/neurodesk/apps.json

Please close this issue when completed :)

@stebo85 (Contributor) commented Mar 18, 2024

@vennand - could you test this container and see if it all works as expected?

@vennand (Contributor) commented Mar 20, 2024

How do I transfer data to the neurodesktop to test? It opens as expected, but I need to launch a job to know if it'll work.

@stebo85 (Contributor) commented Mar 20, 2024

Are you running neurodesktop locally in docker? If yes, you have a shared directory between the desktop and the host.

Alternatively, you can drag and drop files onto the desktop and Guacamole will upload them (it has to be a single file, not a directory).
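As a concrete sketch of the shared-directory route (the docker run commands in this thread bind-mount the host directory ~/neurodesktop-storage to /neurodesktop-storage inside the container; the data file name below is a hypothetical stand-in):

```shell
# The host directory ~/neurodesktop-storage is bind-mounted into the
# Neurodesktop container at /neurodesktop-storage, so anything copied
# there on the host is visible inside the desktop (and vice versa).
mkdir -p ~/neurodesktop-storage

# stand-in for a real data file (hypothetical name)
touch my_particles.mrcs
cp my_particles.mrcs ~/neurodesktop-storage/

# inside the desktop, the same file appears at
# /neurodesktop-storage/my_particles.mrcs
ls ~/neurodesktop-storage/my_particles.mrcs
```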

@vennand (Contributor) commented Mar 21, 2024

I'm trying locally in docker, and I just noticed the directory, thanks!

Is it possible to do a GPU passthrough with the local docker? I'm pretty sure I won't be able to test if the GPU settings work otherwise. Though so far, there was no error message saying it was CPU only.

Though I'm not convinced it compiled with GPU support if the machine that built the container didn't have a GPU. With the new version of relion (5.0), they explicitly state that the compiler tries to detect a GPU and, if none is found, compiles for CPU only, even if a GPU architecture is provided.

@stebo85 (Contributor) commented Mar 21, 2024

Dear @vennand

yes, you can pass your GPU into the docker container:

sudo docker run \
  --shm-size=1gb -it --privileged --user=root --name neurodesktop \
  -v ~/neurodesktop-storage:/neurodesktop-storage \
  -e NB_UID="$(id -u)" -e NB_GID="$(id -g)" \
  --gpus all \
  -p 8888:8888 -e NEURODESKTOP_VERSION=2024-01-12 \
  vnmd/neurodesktop:2024-01-12

to check if it worked, run nvidia-smi in the desktop container afterwards

that would be annoying if it needs a GPU to compile. We do not have the ability to run a GPU node for building containers.

@vennand (Contributor) commented Mar 22, 2024

We might just be limited to version 4 for now then. As far as I can tell, version 5 is still in beta, so it might not be advisable to use it for research anyway.

I tried running your command, but I get the following error message
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]]. ERRO[0000] error waiting for container: context canceled

Didn't find anything relevant with a very quick Google. Any idea what could cause this?

Also, I won't be able to touch this until the 16th of April unfortunately, but I plan on getting back to it.

@stebo85 (Contributor) commented Mar 22, 2024

Dear @vennand

Did you install the nvidia-container-toolkit beforehand?

#RHEL/CentOS (yum-based)
sudo yum install nvidia-container-toolkit -y
#Ubuntu/Debian (apt-based)
sudo apt install nvidia-container-toolkit -y

@vennand (Contributor) commented Mar 22, 2024

I had not, but I get the same error after installing it.

@stebo85 (Contributor) commented Mar 22, 2024

what are you getting when you run nvidia-smi on your host system?

@vennand (Contributor) commented Mar 22, 2024

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.07             Driver Version: 535.161.07   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla P40                      Off | 00000000:01:00.0 Off |                  Off |
| N/A   18C    P8               9W / 250W |      4MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1536      G   /usr/lib/xorg/Xorg                            4MiB |
+---------------------------------------------------------------------------------------+

@stebo85 (Contributor) commented Mar 22, 2024

can you try this? https://www.howtogeek.com/devops/how-to-use-an-nvidia-gpu-with-docker-containers/

It needs a restart of the Docker daemon and potentially apt-get install -y nvidia-docker2

@vennand (Contributor) commented Mar 22, 2024

I've installed nvidia-docker2, but I've also run this:
sudo nvidia-ctk runtime configure --runtime=docker

I don't know which one did it, but it worked. I'll try to test it now, but I don't know if I'll have time.

@vennand (Contributor) commented May 14, 2024

Hi @stebo85,

I've finished testing. Relion works as intended, but none of the jobs showed up when running "nvidia-smi", even though we could see the GPU being used. Not sure if that's an issue with the GPU passthrough, but it is using the GPU.

Another important issue is that one of the third-party programs I install along with relion doesn't work. Basically, CTFFIND 4.1.14 fails if it's compiled with GCC 8 or above. The fix I've found is to modify the code, which doesn't seem practical or elegant to do in the neurodesk script.
What would be the best approach around this? Should I host a "fixed" copy of the code on my own GitHub? (though I'm not sure if the license agreement allows this)

@stebo85 (Contributor) commented May 16, 2024

Dear @vennand,
which command did you use for testing the GPUs? I have seen a similar behaviour once using the old flag. Can you try with --gpus all ? Another check: what comes up when you run which nvidia-smi?

Fixing software for a container is a tricky one. I have done various things in the past depending on the project:

  1. apply a sed command that fixes a few single lines in the neurocontainer build script - would that work for you?
  2. provide a fixed source-code file in the neurocontainers repository along with the build script and copy it into the container during the build to overwrite the upstream file
  3. fork the software, fix it there, and use the fix inside the container + provide the fix upstream in the hope they merge it.
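If option 1 works for you, the shape of the fix in a build script would be something like the following sketch. The file name and the patched line are hypothetical stand-ins, not the real CTFFIND source; the real change would target whatever lines break under GCC 8:

```shell
# Sketch of option 1: a sed one-liner in the container build script that
# patches the offending source line in place before compiling.
# (Hypothetical file and line, for illustration only.)
mkdir -p /tmp/ctffind-demo/src
cat > /tmp/ctffind-demo/src/example.cpp <<'EOF'
// hypothetical line that fails to compile under GCC >= 8
float value = wanted_binning_factor;
EOF

# in the build script, rewrite the line before running make
sed -i 's/float value = wanted_binning_factor;/float value = static_cast<float>(wanted_binning_factor);/' \
    /tmp/ctffind-demo/src/example.cpp

grep 'static_cast' /tmp/ctffind-demo/src/example.cpp
```

The advantage over hosting a forked copy is that the patch lives next to the build script, so it is obvious what was changed relative to upstream.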

@vennand (Contributor) commented May 17, 2024

@stebo85

To test the GPU, I simply watched nvidia-smi (watch -n 1 nvidia-smi) while running relion. Relion launches python scripts that normally show up there. They didn't in the VM, but they were listed on the main machine (the one I'm running neurodesk from).

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.171.04             Driver Version: 535.171.04   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla P40                      Off | 00000000:01:00.0 Off |                  Off |
| N/A   31C    P0              74W / 250W |  24256MiB / 24576MiB |     67%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1632      G   /usr/lib/xorg/Xorg                            4MiB |
|    0   N/A  N/A      7835      C   ...relion-4.0.1.sm61/bin/relion_refine    24250MiB |
+---------------------------------------------------------------------------------------+

I don't know exactly how the code accesses the GPU, but I can probably find out if that's relevant.

When I run which nvidia-smi I get /usr/bin/nvidia-smi

Regarding fixing the software, I think I'll go with option 2, since the source code is only 11MB. Do you want me to push the fix now, or should we investigate the GPU "issue" before?

@stebo85 (Contributor) commented May 21, 2024

Interesting. I don't know what causes this behaviour, but I guess if it works it works no matter where the GPU tasks show up.

Happy for you to push the fix now :) Let's see if we can get this to work!

@vennand (Contributor) commented May 27, 2024

@stebo85 Would you know what this error means?

$ bash build.sh -ds
Entering Debug mode
WARNING: Skipping neurodocker as it is not installed.
Defaulting to user installation because normal site-packages is not writeable
Collecting https://github.com/ReproNim/neurodocker/tarball/master
  Downloading https://github.com/ReproNim/neurodocker/tarball/master
     - 77.3 kB 10.0 MB/s 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Preparing metadata (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [36 lines of output]
      /tmp/pip-build-env-aeraeba5/overlay/local/lib/python3.10/dist-packages/setuptools_scm/_integration/setuptools.py:31: RuntimeWarning:
      ERROR: setuptools==59.6.0 is used in combination with setuptools_scm>=8.x

      Your build configuration is incomplete and previously worked by accident!
      setuptools_scm requires setuptools>=61

      Suggested workaround if applicable:
       - migrating from the deprecated setup_requires mechanism to pep517/518
         and using a pyproject.toml to declare build dependencies
         which are reliably pre-installed before running the build tools

        warnings.warn(
      Traceback (most recent call last):
        File "/usr/lib/python3/dist-packages/pip/_vendor/pep517/in_process/_in_process.py", line 363, in <module>
          main()
        File "/usr/lib/python3/dist-packages/pip/_vendor/pep517/in_process/_in_process.py", line 345, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "/usr/lib/python3/dist-packages/pip/_vendor/pep517/in_process/_in_process.py", line 164, in prepare_metadata_for_build_wheel
          return hook(metadata_directory, config_settings)
        File "/tmp/pip-build-env-aeraeba5/overlay/local/lib/python3.10/dist-packages/hatchling/build.py", line 112, in prepare_metadata_for_build_wheel
          directory = os.path.join(metadata_directory, f'{builder.artifact_project_id}.dist-info')
        File "/tmp/pip-build-env-aeraeba5/overlay/local/lib/python3.10/dist-packages/hatchling/builders/wheel.py", line 825, in artifact_project_id
          self.project_id
        File "/tmp/pip-build-env-aeraeba5/overlay/local/lib/python3.10/dist-packages/hatchling/builders/plugin/interface.py", line 374, in project_id
          self.__project_id = f'{self.normalize_file_name_component(self.metadata.core.name)}-{self.metadata.version}'
        File "/tmp/pip-build-env-aeraeba5/overlay/local/lib/python3.10/dist-packages/hatchling/metadata/core.py", line 149, in version
          self._version = self._get_version()
        File "/tmp/pip-build-env-aeraeba5/overlay/local/lib/python3.10/dist-packages/hatchling/metadata/core.py", line 248, in _get_version
          version = self.hatch.version.cached
        File "/tmp/pip-build-env-aeraeba5/overlay/local/lib/python3.10/dist-packages/hatchling/metadata/core.py", line 1466, in cached
          raise type(e)(message) from None
      LookupError: Error getting the version from source `vcs`: setuptools-scm was unable to detect version for /tmp/pip-req-build-wu94yd8o.

      Make sure you're either building from a fully intact git repository or PyPI tarballs. Most other sources (such as GitHub's tarballs, a git checkout without the .git folder) don't contain the necessary metadata and will not work.

      For example, if you're using pip, instead of https://github.com/user/proj/archive/master.zip use git+https://github.com/user/proj.git#egg=proj
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

@stebo85 (Contributor) commented May 27, 2024 via email

@vennand (Contributor) commented Jun 17, 2024

@stebo85

Hey, I'm back working on this. I'll start implementing the other software soon.

But first, I tested this version of relion on our other GPUs, and it runs without issues. Perhaps the default setting (sm35) is too old, but this one works. I'm thinking it would be simpler for users if we only package this one.
If you think this is a good idea, how do we go about it? Only put this one in the JSON, with Exec: relion?
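For the apps.json side, an entry along these lines might be the shape of it. The field names and nesting here are illustrative guesses, not the actual schema; copy the structure of an existing entry in the NeuroDesk/neurocommand apps.json rather than this sketch:

```json
{
  "relion": {
    "apps": {
      "relion 4.0.1.sm61": {
        "version": "20240318",
        "exec": "relion"
      }
    }
  }
}
```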

@stebo85 (Contributor) commented Jun 18, 2024

Great to hear that Relion is working :)

ok, makes sense that the newer version works better. CUDA is usually quite backwards compatible, so if you have fairly current driver versions that makes sense.

Yes, put the version you found working best in the apps.json and this will trigger the release process.

Thank you for getting this to work !!!

stebo85 closed this as completed Aug 21, 2024
github-project-automation bot moved this from New to Completed in NeuroDesk Aug 21, 2024