From c7848dedd4a729ce76999df4915e9989e03507ca Mon Sep 17 00:00:00 2001
From: James Lamb
Date: Mon, 16 Dec 2024 11:51:56 -0600
Subject: [PATCH] document changes in library-loading, update build-UCX-from-source docs (#1099)

Contributes to https://github.com/rapidsai/build-planning/issues/118

Caused by https://github.com/rapidsai/ucx-wheels/pull/13

I originally came here to document the implications of https://github.com/rapidsai/ucx-wheels/pull/13 in the docs, namely:

* if you have a `libucx-cu{11,12}` wheel installed, then by default `ucx-py` will use UCX libraries from that wheel
* environment variable `RAPIDS_LIBUCX_PREFER_SYSTEM_LIBRARY=true` can be set to opt out of this and use a system installation instead

While doing that, I noticed some other opportunities for improvement in the installation docs:

* updating build-UCX-from-source instructions to UCX 1.15 ([the oldest version this project now supports](https://github.com/rapidsai/ucx-py/blob/9efacc6069226de8e207177a359189f8880203a8/dependencies.yaml#L159))
* clarifying and simplifying some language

## Notes for Reviewers

### How I tested this

Followed these instructions in a Docker container running on an x86_64 machine with 8 V100s.

```shell
docker run \
    --rm \
    --gpus 0 \
    -v $(pwd):/opt/work \
    -w /opt/work \
    -it rapidsai/ci-conda:latest \
    bash
```

Used `conda` to set up the build environment:

```shell
conda create -n ucx -c conda-forge \
    automake make libtool pkg-config \
    "python=3.12" "setuptools>=64.0" "cython>=3.0.0" \
    cuda-nvcc \
    cuda-cudart-dev \
    cuda-nvml-dev \
    cuda-nvtx-dev \
    cuda-version=12.5
```

Ran variations of this code snippet to test my install:

```shell
python -c "import ucp; print(ucp.get_ucx_version())"
```

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Peter Andreas Entschev (https://github.com/pentschev)

URL: https://github.com/rapidsai/ucx-py/pull/1099
---
 docs/source/install.rst | 118 ++++++++++++++++++++++++++++------------
 1 file changed, 83 insertions(+), 35 deletions(-)

diff --git a/docs/source/install.rst b/docs/source/install.rst
index b41661de..4a59a1f0 100644
--- a/docs/source/install.rst
+++ b/docs/source/install.rst
@@ -34,17 +34,24 @@ support is active by checking for the presence of ``/dev/infiniband/rdma_cm`` an
 Conda
 -----
 
-Conda packages can be installed as so. Replace ```` with the
-desired version (minimum ``11.2``). These are available both on ``rapidsai``
-and ``rapidsai-nightly``. Starting with the UCX 1.14.1 conda-forge package,
-InfiniBand support is available again via rdma-core, thus building UCX
-from source is not required solely for that purpose anymore but may still
-be done if desired (e.g., to test for new capabilities or bug fixes).
+Use one of the commands below to install conda packages.
+Replace `-c rapidsai` with `-c rapidsai-nightly` to pull in the newer but less stable nightly packages.
+Change `cuda-version` to pin to a different CUDA minor version if you'd like.
 
 ::
 
+    # CUDA 11
+    conda create -n ucx -c conda-forge -c rapidsai \
+        cuda-version=11.8 ucx-py
+
+    # CUDA 12
     conda create -n ucx -c conda-forge -c rapidsai \
-        cudatoolkit= ucx-py
+        cuda-version=12.5 ucx-py
+
+Starting with the UCX 1.14.1 conda-forge package,
+InfiniBand support is available again via rdma-core, thus building UCX
+from source is not required solely for that purpose anymore but may still
+be done if desired (e.g., to test for new capabilities or bug fixes).
 
 PyPI
 ----
@@ -55,13 +62,15 @@ workloads and either one can be chosen if the application doesn't use CUDA, but
 currently there are no pre-built CPU-only packages available, so either one of
 CUDA packages must be installed instead. The CUDA version is differentiated by
 the suffix ``-cuXY``, where ``XY`` must be
-replaced with the desired CUDA version. Installing CUDA ``12`` package
-can be done with the following command:
+replaced with the desired CUDA version.
 
 ::
 
+    # CUDA 11
     pip install ucx-py-cu12
 
+    # CUDA 12
+    pip install ucx-py-cu12
 
 UCX-Py has no direct dependency on CUDA, but the package specifies the
 ``-cuXY`` prefix so that the correct ``libucx-cuXY`` package is selected.
@@ -75,8 +84,7 @@ Source
 Conda
 ~~~~~
 
-The following instructions assume you'll be using UCX-Py on a CUDA-enabled system and is in a `Conda environment `_.
-
+The following instructions assume you'll be using UCX-Py on a CUDA-enabled system and using a `Conda environment `_.
 
 Build Dependencies
 ^^^^^^^^^^^^^^^^^^
@@ -85,7 +93,7 @@ Build Dependencies
 
     conda create -n ucx -c conda-forge \
         automake make libtool pkg-config \
-        "python=3.12" setuptools "cython>=3.0.0"
+        "python=3.12" "setuptools>=64.0" "cython>=3.0.0"
 
 .. note::
     The Python version must be explicitly specified here, UCX-Py currently supports
@@ -102,17 +110,17 @@ Test Dependencies
     dask distributed cloudpickle
 
 
-UCX >= 1.11.1
+UCX >= 1.15.0
 ^^^^^^^^^^^^^
 
-Instructions for building UCX >= 1.11.1 (minimum version supported by UCX-Py), make sure to change ``git checkout v1.11.1`` to a newer version if desired:
+Instructions for building UCX >= 1.15.0 (minimum version supported by UCX-Py), make sure to change ``git checkout v1.15.0`` to a newer version if desired:
 
 ::
 
     conda activate ucx
     git clone https://github.com/openucx/ucx
     cd ucx
-    git checkout v1.11.1
+    git checkout v1.15.0
     ./autogen.sh
     mkdir build
     cd build
@@ -134,13 +142,13 @@ It is possible to enable InfiniBand support via the conda-forge rdma-core packag
 
     conda install -c conda-forge c-compiler cxx-compiler gcc_linux-64=11.* rdma-core=28.*
 
-After installing the necessary dependencies, it's now time to build UCX from source, make sure to change ``git checkout v1.11.1`` to a newer version if desired:
+After installing the necessary dependencies, it's now time to build UCX from source, make sure to change ``git checkout v1.15.0`` to a newer version if desired:
 
 ::
 
     git clone https://github.com/openucx/ucx
     cd ucx
-    git checkout v1.11.1
+    git checkout v1.15.0
     ./autogen.sh
     mkdir build
     cd build
@@ -168,7 +176,7 @@ Before continuing, first ensure MOFED 5.0 or higher is installed, for example in
 
 If MOFED drivers are not installed on the machine, you can download drivers directly from `NVIDIA `_.
 
-Building UCX >= 1.11.1 as shown previously should automatically include InfiniBand support if available in the system. It is possible to explicitly
+Building from source as shown previously should automatically include InfiniBand support if available in the system. It is possible to explicitly
 activate those, ensuring the system satisfies all dependencies or fail otherwise, by including the ``--with-rdmacm`` and ``--with-verbs`` build flags.
 Additionally, we want to make sure UCX uses compilers from the system, we do so by specifying ``CC=/usr/bin/gcc`` and ``CXX=/usr/bin/g++``, be sure
 to adjust that for the path to your system compilers. For example:
@@ -205,7 +213,42 @@ PyPI
 
 The following instructions assume you'll be installing UCX-Py on a CUDA-enabled system, in a pip-only environment.
 
-Installing UCX-Py from source in a pip-only environment has additional limitations when compared to conda environments. Unlike conda packages, where the ``ucx`` package is installed under the ``CONDA_PREFIX``, ``libucx`` is installed under ``site-packages`` which is normally not looked for system libraries. Therefore, you will either need UCX to be installed in the system path, or include the UCX install path in ``LD_LIBRARY_PATH``.
+Installing UCX-Py from source in a pip-only environment has additional limitations when compared to conda environments.
+
+UCX-Py with UCX from PyPI
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+CUDA-enabled builds of the UCX libraries are available from PyPI, under the names ``libucx-cu{11,12}``.
+Notice that those builds do not currently include InfiniBand support, if InfiniBand is required you will
+need to provide a custom UCX install as described in the "UCX-Py with custom UCX install" section.
+
+To build UCX-Py using those UCX packages (to avoid needing to build UCX from source), run the following.
+
+::
+
+    conda activate ucx
+    git clone https://github.com/rapidsai/ucx-py.git
+    cd ucx-py
+    pip install -C 'rapidsai.disable-cuda=false' .
+    # or for develop build
+    pip install -v -e .
+
+This will automatically handle installing appropriate, compatible ``libucx-cu{11,12}`` packages for build-time and runtime use.
+When you run UCX-Py code installed this way, it will load UCX libraries from the installed ``libucx-cu{11,12}`` package.
+
+UCX-Py packages are built against the oldest version of UCX that UCX-Py supports, and can run against a range
+of ABI-compatible UCX versions.
+
+You can use packages from PyPI to customize the UCX version used at runtime.
+For example, to switch to using UCX 1.16 at runtime, run the following.
+
+::
+
+    # CUDA 11
+    pip install 'libucx-cu11>=1.16.0,<1.17'
+
+    # CUDA 12
+    pip install 'libucx-cu12>=1.16.0,<1.17'
 
 
 UCX-Py with UCX system install
@@ -222,39 +265,44 @@ If a UCX system install is available, building and installing UCX-Py can be done
     # or for develop build
     pip install -v -e .
 
+To ensure that system install of UCX is always used at runtime (and not the ``libucx-cu{11,12}`` wheels), set the following
+environment variable in the runtime environment.
 
-UCX-Py with custom UCX install
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+::
 
-If UCX is installed in a non-default path, specifying ``LD_LIBRARY_PATH`` is required both to install and run UCX-Py code. For installation, ``LD_LIBRARY_PATH`` is required to ensure UCX-Py builds against the desired version of UCX, since another incompatible UCX version may already be installed in the system. Running UCX-Py code also needs to find the proper UCX libraries at runtime, and thus ``LD_LIBRARY_PATH`` must be specified as well.
+    export RAPIDS_LIBUCX_PREFER_SYSTEM_LIBRARY=true
 
-A custom UCX install does not necessarily mean it needs to be build from source, a ``libucx-cuXY`` package may be used as well. For example, with the CUDA ``12`` package:
 
-::
-
-    pip install libucx-cu12
+UCX-Py with custom UCX install
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-The above will install the UCX library in your environment, specifically under ``site-packages/libucx``. To find the path to ``site-packages`` you may execute:
+If UCX is installed in a non-default path (as it might be if you built it from source), some additional configuration is required to build and run UCX-Py against it.
+To check if the loader can find your custom UCX installation, run the following.
 
 ::
 
-    python -c "import site; print(site.getsitepackages()[0])"
-
-The command above will print the path to ``site-packages``, such as ``/opt/python/site-packages``. The path to the UCX shared library installation is then ``/opt/python/site-packages/libucx/lib``, which is the value that will be specified for ``LD_LIBRARY_PATH``. If you build UCX from source and installed it in a different location, make sure you adjust the value of ``LD_LIBRARY_PATH`` accordingly, or if you built UCX from source and installed it in a path that the system will lookup for libraries by default, specifying ``LD_LIBRARY_PATH`` is unnecessary.
+    ldconfig -p | grep libucs
 
-Now installing UCX-Py can be done via ``pip install``:
+If that returns that filepath you expect, then you can just use the "UCX-Py with UCX system install" instructions above.
+If that doesn't show anything, then you need to help the loader find the UCX libraries.
+At build time, add your install of UCX to ``LD_LIBRARY_PATH``.
 
 ::
 
     conda activate ucx
     git clone https://github.com/rapidsai/ucx-py.git
     cd ucx-py
-    LD_LIBRARY_PATH=/opt/python/site-packages/libucx/lib pip install -v .
+    CUSTOM_UCX_INSTALL="wherever-you-put-your-ucx-install"
+    LD_LIBRARY_PATH="${CUSTOM_UCX_INSTALL}:${LD_LIBRARY_PATH}" \
+        pip install -v .
     # or for develop build
-    LD_LIBRARY_PATH=/opt/python/site-packages/libucx/lib pip install -v -e .
+    LD_LIBRARY_PATH="${CUSTOM_UCX_INSTALL}:${LD_LIBRARY_PATH}" \
+        pip install -v -e .
 
-Now, to run UCX-Py-enabled code specifying ``LD_LIBRARY_PATH`` will also be required. For example:
+Set the following in the environment to ensure that those libraries are preferred at run time as well.
 
 ::
 
-    LD_LIBRARY_PATH=/opt/python/site-packages/libucx/lib python -c "import ucp; print(ucp.get_ucx_version())"
+    RAPIDS_LIBUCX_PREFER_SYSTEM_LIBRARY=true
+    LD_LIBRARY_PATH="${CUSTOM_UCX_INSTALL}:${LD_LIBRARY_PATH}" \
+        python -c "import ucp; print(ucp.get_ucx_version())"