Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

prefer system install of UCX in pip devcontainers, update outdated RAPIDS references #6149

Merged
merged 6 commits into from
Dec 11, 2024

Conversation

jameslamb
Copy link
Member

Contributes to rapidsai/build-planning#118

Proposes the following changes for pip devcontainers:

And some other related changes noticed while doing that:

  • update lingering 24.* references to 25.02

Notes for Reviewers

How I tested this

Relying on CI for most things. Double-checked that update-version.sh would have caught the one lingering 24.12 reference like this:

./ci/release/update-version.sh '25.02.00'
git grep -E '24\.'

@jameslamb jameslamb changed the title WIP: prefer system install of UCX in pip devcontainers, update outdated RAPIDS references prefer system install of UCX in pip devcontainers, update outdated RAPIDS references Dec 2, 2024
@jameslamb jameslamb marked this pull request as ready for review December 2, 2024 20:05
@jameslamb jameslamb requested review from a team as code owners December 2, 2024 20:05
@jameslamb jameslamb requested a review from msarahan December 2, 2024 20:05
rapids-bot bot pushed a commit to rapidsai/cuvs that referenced this pull request Dec 2, 2024
Contributes to rapidsai/build-planning#118

Proposes the following changes for pip devcontainers:

* prefer system installation of ucx to the one provided by the `libucx-cu{11,12}` wheels (ref: rapidsai/devcontainers#421 (comment))

And some other related changes noticed while doing that:

* update lingering `24.*` references to `25.02`

## Notes for Reviewers

### How I tested this

Relying on CI for most things. Double-checked that `update-version.sh` would have caught the one lingering `24.12` reference like this:

```shell
./ci/release/update-version.sh '25.02.00'
git grep -E '24\.'
```

Similar to rapidsai/cuml#6149

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: #501
@betatim
Copy link
Member

betatim commented Dec 4, 2024

I read rapidsai/build-planning#118 to find out why preferring the system library over the one from a wheel is "a good idea". But that issue doesn't really explain it, except it mentions that for most users who don't care about packaging the plan is to use the library from the wheel. Which makes me think "shouldn't we be doing that as devs as well?" - kinda dogfooding, kinda making sure this is the most comfortable path, etc

I think the change itself is uncontroversial (the magic of using the env variable once it is set is presumably hidden away somewhere and tested?), so for me it is about understanding "why make this change".

@betatim
Copy link
Member

betatim commented Dec 4, 2024

It looks like the failing tests are all related to dask. I know Dante and friends were talking about there being some issue with dask, so maybe we wait and see? Or someone who knows can chime in :D

@jameslamb
Copy link
Member Author

Thanks for the thorough review @betatim . Let me try to address these.

why preferring the system library over the one from a wheel is "a good idea". But that issue doesn't really explain it,

The main goal of rapidsai/build-planning#118 is to change our library-loading logic across RAPIDS such that if you installed wheels, at runtime by default you get the wheel-provided library, NOT any system-provided one that you might have laying around).

Doing this makes it safer to use pip install cuml-cu12, because you can use pip to manage the UCX dependency. Like pip install cuml-cu12 'libucx>=1.15.0,<1.16'. That's helpful for cases where, for example, you have permissions to pip install things but not to change system libraries (like apt-get install, yum install, etc.).

In RAPIDS devcontainers, we want to prefer the system installation because we're producing specific images for different UCX versions. See the ucx1.17.0 in this tag:

"BASE": "rapidsai/devcontainers:24.12-cpp-cuda12.5-ucx1.17.0-openmpi-ubuntu22.04"

And for other reasons related to how the devcontainers are designed (see next response below).

shouldn't we be doing that as devs as well?" - kinda dogfooding

Yes there would be some benefits to just using the libucx-cu{11,12} wheels to provide the UCX libraries in the RAPIDS devcontainers, but there are other tradeoffs there. For example, the devcontainers are designed to make it easy to do C++-only development (e.g. compiling libcuml++ and testing it) without needing to involve Python or pip at all. In that case, "get the UCX libraries from wheels" might be challenging.

But that design discussion is not specific just to UCX, and is bigger than this series of PRs. That longer-term discussion is happening here: rapidsai/build-planning#119

the magic of using the env variable once it is set is presumably hidden away somewhere and tested

Yes. For the specific case of UCX (libucx-cu{11,12} wheels):

@betatim
Copy link
Member

betatim commented Dec 5, 2024

Thanks a lot for taking the time to explain. Sounds reasonable to me.

@dantegd any thoughts on the failing (dask) tests? Should we wait for that to get resolved before merging this?

@jameslamb
Copy link
Member Author

I just merged latest branch-25.02 in here to kick off a full new CI run, to see if that'd help resolve the issues we were seeing... and now I see something different.

All tests and the docs builds are failing with this runtime error:

ConftestImportFailure: ModuleNotFoundError: No module named 'cuda.bindings' (from /__w/cuml/cuml/python/cuml/cuml/tests/conftest.py)
full stacktrace (click me)
/opt/conda/envs/test/lib/python3.10/site-packages/_pytest/config/__init__.py:331: PluggyTeardownRaisedWarning: A plugin raised an exception during an old-style hookwrapper teardown.
Plugin: helpconfig, Hook: pytest_cmdline_parse
ConftestImportFailure: ModuleNotFoundError: No module named 'cuda.bindings' (from /__w/cuml/cuml/python/cuml/cuml/tests/conftest.py)
For more information see https://pluggy.readthedocs.io/en/stable/api_reference.html#pluggy.PluggyTeardownRaisedWarning
  config = pluginmanager.hook.pytest_cmdline_parse(
ImportError while loading conftest '/__w/cuml/cuml/python/cuml/cuml/tests/conftest.py'.
../conftest.py:17: in <module>
    from cuml.testing.utils import create_synthetic_dataset
/opt/conda/envs/test/lib/python3.10/site-packages/cuml/__init__.py:17: in <module>
    from cuml.internals.base import Base, UniversalBase
/opt/conda/envs/test/lib/python3.10/site-packages/cuml/internals/__init__.py:18: in <module>
    from cuml.internals.base_helpers import BaseMetaClass, _tags_class_and_instance
/opt/conda/envs/test/lib/python3.10/site-packages/cuml/internals/base_helpers.py:20: in <module>
    from cuml.internals.api_decorators import (
/opt/conda/envs/test/lib/python3.10/site-packages/cuml/internals/api_decorators.py:24: in <module>
    from cuml.internals import input_utils as iu
/opt/conda/envs/test/lib/python3.10/site-packages/cuml/internals/input_utils.py:20: in <module>
    from cuml.internals.array import CumlArray
/opt/conda/envs/test/lib/python3.10/site-packages/cuml/internals/array.py:21: in <module>
    from cuml.internals.global_settings import GlobalSettings
/opt/conda/envs/test/lib/python3.10/site-packages/cuml/internals/global_settings.py:20: in <module>
    from cuml.internals.device_type import DeviceType
/opt/conda/envs/test/lib/python3.10/site-packages/cuml/internals/device_type.py:19: in <module>
    from cuml.internals.mem_type import MemoryType
/opt/conda/envs/test/lib/python3.10/site-packages/cuml/internals/mem_type.py:22: in <module>
    cudf = gpu_only_import("cudf")
/opt/conda/envs/test/lib/python3.10/site-packages/cuml/internals/safe_imports.py:362: in gpu_only_import
    return importlib.import_module(module)
/opt/conda/envs/test/lib/python3.10/site-packages/cudf/__init__.py:19: in <module>
    _setup_numba()
/opt/conda/envs/test/lib/python3.10/site-packages/cudf/utils/_numba.py:121: in _setup_numba
    shim_ptx_cuda_version = _get_cuda_build_version()
/opt/conda/envs/test/lib/python3.10/site-packages/cudf/utils/_numba.py:16: in _get_cuda_build_version
    from cudf._lib import strings_udf
/opt/conda/envs/test/lib/python3.10/site-packages/cudf/_lib/__init__.py:4: in <module>
    from . import (
copying.pyx:1: in init cudf._lib.copying
    ???
column.pyx:1: in init cudf._lib.column
    ???
/opt/conda/envs/test/lib/python3.10/site-packages/rmm/__init__.py:17: in <module>
    from rmm import mr
/opt/conda/envs/test/lib/python3.10/site-packages/rmm/mr.py:14: in <module>
    from rmm.pylibrmm.memory_resource import (
/opt/conda/envs/test/lib/python3.10/site-packages/rmm/pylibrmm/__init__.py:15: in <module>
    from .device_buffer import DeviceBuffer
device_buffer.pyx:1: in init rmm.pylibrmm.device_buffer
    ???
E   ModuleNotFoundError: No module named 'cuda.bindings'

(build link)

Those jobs are getting cuda-python 11.8.3

cuda-python               11.8.3          py310hda4ad70_3    conda-forge

Seeing cudf and rmm in the stacktrace is making me think the root cause is that cuml didn't yet get updates like these:

That should be resolved by merging #6157, which is blocked by the previously-mentioned Dask issues.

@jameslamb
Copy link
Member Author

That should be resolved by merging #6157, which is blocked by the previously-mentioned Dask issues.

It's now #6170 that has CI fixes in it, and that needs to be merged before we can merge this and other PRs targeting 25.02.

@dantegd
Copy link
Member

dantegd commented Dec 10, 2024

@jameslamb unfortunately there's still a devcontainer pip issue that we are trying to solve in #6170, but the cuml dask failures should be green now though

@bdice
Copy link
Contributor

bdice commented Dec 11, 2024

/merge

@rapids-bot rapids-bot bot merged commit 052cdde into rapidsai:branch-25.02 Dec 11, 2024
51 of 52 checks passed
@jameslamb jameslamb deleted the libucx-in-devcontainers branch December 11, 2024 15:31
@jameslamb
Copy link
Member Author

Thanks for coming back and updating this @bdice !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement / enhancement to an existing function non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants