Update fmt (to 11.0.2) and spdlog (to 1.14.1), add those libraries to libcuml conda host dependencies #6071
Changes from 14 commits
9981da1
67fbab3
6da7103
2a286c1
9687611
1ca0b5f
721aa88
557fca6
8ca0180
c7b3321
0654919
a1d76d4
5b812f0
7724113
2024f9e
af19c93
5382685
9d3de8d
c587389
@@ -0,0 +1,28 @@
#!/bin/bash
# Copyright (c) 2024, NVIDIA CORPORATION.

LIBRMM_CHANNEL=$(rapids-get-pr-conda-artifact rmm 1678 cpp)
RMM_CHANNEL=$(rapids-get-pr-conda-artifact rmm 1678 python)

CUDF_CPP_CHANNEL=$(rapids-get-pr-conda-artifact cudf 16806 cpp)
CUDF_PYTHON_CHANNEL=$(rapids-get-pr-conda-artifact cudf 16806 python)

UCXX_CHANNEL=$(rapids-get-pr-conda-artifact ucxx 278 cpp)

LIBRAFT_CHANNEL=$(rapids-get-pr-conda-artifact raft 2433 cpp)
RAFT_CHANNEL=$(rapids-get-pr-conda-artifact raft 2433 python)

# NOTE: cloning private repos with rapids-get-pr-conda-artifact doesn't work,
# so need to explicitly set the SHA to use
CUMLPRIMS_CHANNEL=$(
  RAPIDS_SHA=6f9f474 rapids-get-pr-conda-artifact cumlprims_mg 211 cpp 6f9f474
)

conda config --system --add channels "${LIBRMM_CHANNEL}"
conda config --system --add channels "${RMM_CHANNEL}"
conda config --system --add channels "${CUDF_CPP_CHANNEL}"
conda config --system --add channels "${CUDF_PYTHON_CHANNEL}"
conda config --system --add channels "${UCXX_CHANNEL}"
conda config --system --add channels "${LIBRAFT_CHANNEL}"
conda config --system --add channels "${RAFT_CHANNEL}"
conda config --system --add channels "${CUMLPRIMS_CHANNEL}"
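Every channel variable in the script above is filled by a command substitution, so a failed artifact download would leave it empty and `conda config` would then be handed an empty channel name. The following is only a sketch of a guard for that case; `require_channel` is a hypothetical helper and is not part of this PR or of the RAPIDS CI tooling.

```shell
# Hypothetical helper: fail loudly when a channel value is empty.
#   $1 = variable name (used only in the error message)
#   $2 = the value to check
require_channel() {
  if [ -z "${2:-}" ]; then
    echo "error: $1 is empty" >&2
    return 1
  fi
}
```

It could be called right before the `conda config` lines, for example `require_channel LIBRMM_CHANNEL "${LIBRMM_CHANNEL}" || exit 1`.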
@@ -0,0 +1,59 @@
#!/bin/bash
# Copyright (c) 2024, NVIDIA CORPORATION.

RAPIDS_PY_CUDA_SUFFIX="$(rapids-wheel-ctk-name-gen ${RAPIDS_CUDA_VERSION})"

LIBRMM_CHANNEL=$(
  RAPIDS_PY_WHEEL_NAME=rmm_${RAPIDS_PY_CUDA_SUFFIX} rapids-get-pr-wheel-artifact rmm 1678 cpp
)
RMM_CHANNEL=$(
  RAPIDS_PY_WHEEL_NAME=rmm_${RAPIDS_PY_CUDA_SUFFIX} rapids-get-pr-wheel-artifact rmm 1678 python
)

UCXX_CHANNEL=$(
  RAPIDS_PY_WHEEL_NAME=ucxx_${RAPIDS_PY_CUDA_SUFFIX} rapids-get-pr-wheel-artifact ucxx 278 python
)
LIBUCXX_CHANNEL=$(
  RAPIDS_PY_WHEEL_NAME=libucxx_${RAPIDS_PY_CUDA_SUFFIX} rapids-get-pr-wheel-artifact ucxx 278 cpp
)
DISTRIBUTED_UCXX_CHANNEL=$(
  RAPIDS_PY_WHEEL_NAME=distributed_ucxx_${RAPIDS_PY_CUDA_SUFFIX} rapids-get-pr-wheel-artifact ucxx 278 python
)

CUDF_CHANNEL=$(
  RAPIDS_PY_WHEEL_NAME=cudf_${RAPIDS_PY_CUDA_SUFFIX} rapids-get-pr-wheel-artifact cudf 16806 python
)
LIBCUDF_CHANNEL=$(
  RAPIDS_PY_WHEEL_NAME=libcudf_${RAPIDS_PY_CUDA_SUFFIX} rapids-get-pr-wheel-artifact cudf 16806 cpp
)
PYLIBCUDF_CHANNEL=$(
  RAPIDS_PY_WHEEL_NAME=pylibcudf_${RAPIDS_PY_CUDA_SUFFIX} rapids-get-pr-wheel-artifact cudf 16806 python
)
DASK_CUDF_CHANNEL=$(
  RAPIDS_PY_WHEEL_NAME=dask_cudf_${RAPIDS_PY_CUDA_SUFFIX} \
  RAPIDS_PY_WHEEL_PURE=1 \
    rapids-get-pr-wheel-artifact cudf 16806 python
)

RAFT_DASK_CHANNEL=$(
  RAPIDS_PY_WHEEL_NAME=raft_dask_${RAPIDS_PY_CUDA_SUFFIX} rapids-get-pr-wheel-artifact raft 2433 python
)
PYLIBRAFT_CHANNEL=$(
  RAPIDS_PY_WHEEL_NAME=pylibraft_${RAPIDS_PY_CUDA_SUFFIX} rapids-get-pr-wheel-artifact raft 2433 python
)

cat > /tmp/constraints.txt <<EOF
librmm-${RAPIDS_PY_CUDA_SUFFIX} @ file://$(echo ${LIBRMM_CHANNEL}/librmm_*.whl)
rmm-${RAPIDS_PY_CUDA_SUFFIX} @ file://$(echo ${RMM_CHANNEL}/rmm_*.whl)
cudf-${RAPIDS_PY_CUDA_SUFFIX} @ file://$(echo ${CUDF_CHANNEL}/cudf_*.whl)
libcudf-${RAPIDS_PY_CUDA_SUFFIX} @ file://$(echo ${LIBCUDF_CHANNEL}/libcudf_*.whl)
pylibcudf-${RAPIDS_PY_CUDA_SUFFIX} @ file://$(echo ${PYLIBCUDF_CHANNEL}/pylibcudf_*.whl)
dask-cudf-${RAPIDS_PY_CUDA_SUFFIX} @ file://$(echo ${DASK_CUDF_CHANNEL}/dask_cudf_*.whl)
ucxx-${RAPIDS_PY_CUDA_SUFFIX} @ file://$(echo ${UCXX_CHANNEL}/ucxx_*.whl)
libucxx-${RAPIDS_PY_CUDA_SUFFIX} @ file://$(echo ${LIBUCXX_CHANNEL}/libucxx_*.whl)
distributed-ucxx-${RAPIDS_PY_CUDA_SUFFIX} @ file://$(echo ${DISTRIBUTED_UCXX_CHANNEL}/distributed_ucxx_*.whl)
raft-dask-${RAPIDS_PY_CUDA_SUFFIX} @ file://$(echo ${RAFT_DASK_CHANNEL}/raft_dask_*.whl)
pylibraft-${RAPIDS_PY_CUDA_SUFFIX} @ file://$(echo ${PYLIBRAFT_CHANNEL}/pylibraft_*.whl)
EOF

export PIP_CONSTRAINT=/tmp/constraints.txt
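The constraints file above resolves each wheel path with `$(echo ${DIR}/name_*.whl)`. With default shell settings an unmatched glob is left as the literal pattern, and several matches are printed as a space-separated list, so either case would write a broken `file://` entry without any error. A rough sketch of a stricter lookup follows; `resolve_one_wheel` is a hypothetical name, not something this PR defines.

```shell
# Hypothetical helper: print the single file in directory $1 that matches
# glob $2, and fail when zero or several files match.
resolve_one_wheel() {
  local dir pattern
  dir=$1
  pattern=$2
  # Leaving ${pattern} unquoted lets the shell expand the glob into the
  # function's positional parameters.
  set -- "${dir}"/${pattern}
  if [ "$#" -ne 1 ] || [ ! -e "$1" ]; then
    echo "error: expected exactly one match for ${dir}/${pattern}" >&2
    return 1
  fi
  printf '%s\n' "$1"
}
```

A constraints line could then be built from `$(resolve_one_wheel "${RMM_CHANNEL}" 'rmm_*.whl')`, with the script aborting if the lookup fails.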
@@ -68,10 +68,12 @@ requirements:
- libcusolver-dev
- libcusparse-dev
{% endif %}
- fmt {{ fmt_version }}
- libcumlprims ={{ minor_version }}
- libraft ={{ minor_version }}
- libraft-headers ={{ minor_version }}
- librmm ={{ minor_version }}
- spdlog {{ spdlog_version }}

Comment on lines +71 to +76

These are in the global list for the split recipe but also need to be in the [...]

Ah ok! I'm not that familiar with the ways that split recipes are different from single-package ones, thank you for pointing this out. I'm not sure precisely what you mean about those two choices... but I can say at least that I think [...]

We talked offline and decided to add these pinnings in the top-level host environment and the build environment created from the [...]

- treelite {{ treelite_version }}

outputs:

Not specific to this line, but just choosing a place to start a threaded conversation.

All conda-based tests are failing with issues like this:

(build link)

I'll investigate this right now. I suspect it might be something related to symbol visibility (like rapidsai/cudf#15483 (comment)).

Ah!! It looks like either I missed some dependencies in the test scripts, or something else is holding back the `fmt`/`spdlog` version?

I see that RAPIDS libraries are coming from `rapidsai-nightly` and not the CI artifacts, and that the older versions of `fmt` and `spdlog` are being used.

(build link)

Will try replicating that conda solve locally.

Testing locally, I found that if I added constraints like `fmt>=11.0.2, spdlog>=1.14.1` in the conda environment, the solver was able to solve with all of `cuml`'s runtime dependencies... including the versions of `cudf`, `rmm`, `ucxx`, etc. produced from the PRs for rapidsai/build-planning#56.

I think that `libcuml` conda packages need `host:` dependencies on `spdlog` and `fmt`.

That library's code directly uses both of those:

- cuml/cpp/src/common/logger.cpp, lines 20 to 21 in 7de8831
- cuml/cpp/include/cuml/common/callbackSink.hpp, line 47 in 7de8831

And having those dependencies would prevent these situations during development where the solver is able to choose to fall back to older RAPIDS nightlies from within the same release (but supporting different `fmt`/`spdlog`).

`cudf` is doing the same thing: https://github.com/rapidsai/cudf/blob/9b4c4c721c399bae9e88733da79daa1a10644481/conda/recipes/libcudf/meta.yaml#L69-L71

This is probably the right solution, thanks for investigating.
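The lower bounds discussed above (`fmt>=11.0.2`, `spdlog>=1.14.1`) can be sanity-checked against whatever version ends up installed. The snippet below is only a sketch: `version_ge` is a hypothetical helper, it assumes GNU coreutils `sort -V` is available, and it handles plain dotted numeric versions only. It does not reproduce conda's own version ordering.

```shell
# Hypothetical helper: succeed when version $1 is greater than or equal to
# version $2. Relies on `sort -V` (version sort) from GNU coreutils.
version_ge() {
  # The smaller of the two versions sorts first; $1 >= $2 exactly when
  # that smallest value is $2.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

For example, `version_ge 10.2.1 11.0.2` fails, which matches the situation where the older `fmt` was still being selected.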
This ended up not being enough 😭

Look at https://github.com/rapidsai/cuml/actions/runs/10998530224/job/30541086357?pr=6071.

The environment is still solving with `fmt==10.2.1`, `spdlog==1.12.0`, and older RAPIDS `24.10.*` nightlies that support those (build link)... even though it used the `cuml`/`libcuml` packages from CI here.

I just pushed another commit explicitly pinning `fmt` and `spdlog` in the conda test environments, to force the solver to use the newer packages.

This should be safe to do... as soon as there are `librmm` packages published to `rapidsai-nightly` containing the changes from rapidsai/rmm#1678, every environment containing the latest `librmm`/`rmm` will have those same pins in it.

If that works, that's evidence that the `fmt`/`spdlog` update is safe and we can start moving forward.

I think both of those changes are worth merging here:

- `host:` dependencies = to make CPM unnecessary for `spdlog` and avoid unnecessary vendoring and therefore file clobbering

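Spotting a solve that silently fell back to `fmt==10.2.1` or `spdlog==1.12.0` means reading the package list in the CI log. As a sketch of automating that check, the hypothetical helper below reads `conda list`-style rows (name, version, build, channel) on stdin and reports whether a package is present at exactly the expected version. The name `has_exact_version` is an assumption made for illustration.

```shell
# Hypothetical helper: read `conda list`-style rows on stdin and succeed
# only if package $1 is listed with exactly version $2.
has_exact_version() {
  awk -v pkg="$1" -v ver="$2" '
    /^#/ { next }                              # skip header comment lines
    $1 == pkg { found = 1; ok = ($2 == ver) }  # first column is the name
    END { if (found && ok) exit 0; exit 1 }
  '
}
```

In a test job this could be used as `conda list | has_exact_version fmt 11.0.2 || exit 1`, so that an unexpected fallback fails the job early.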
🎉 I see tests now passing and `fmt==11.0.2` + `spdlog==1.14.1` getting installed!

Build link: https://github.com/rapidsai/cuml/actions/runs/11000193616/job/30544094053?pr=6071

Another thing that I think happened here.... it's been several days since the `cudf` CI artifacts being pulled in here were produced, so the `rapidsai-nightly` channel's versions are now newer.

e.g. rapidsai/cudf#16806 produces `cudf==24.10.00a364` but `rapidsai-nightly` now contains `cudf==24.10.00a371`. That would also explain why some of the CI artifacts are being ignored, in favor of those from `rapidsai-nightly`.

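The `a364` versus `a371` comparison above can be made mechanical. The sketch below extracts the numeric alpha suffix from versions shaped like `24.10.00a364` and compares two of them. Both function names are hypothetical, the code assumes the two versions share the same base release, and it is not a reimplementation of the conda solver's ordering.

```shell
# Hypothetical helper: print the numeric alpha suffix of a version such as
# 24.10.00a364. Fails when there is no "a<digits>" suffix.
alpha_number() {
  local suffix
  case "$1" in
    *a[0-9]*) ;;
    *) return 1 ;;
  esac
  suffix=${1##*a}              # drop everything up to the last "a"
  case "${suffix}" in
    ''|*[!0-9]*) return 1 ;;   # reject anything that is not all digits
  esac
  printf '%s\n' "${suffix}"
}

# Hypothetical helper: succeed when version $2 is a newer alpha than $1.
is_newer_alpha() {
  local old new
  old=$(alpha_number "$1") || return 2
  new=$(alpha_number "$2") || return 2
  [ "${new}" -gt "${old}" ]
}
```

With the versions quoted in the comment, `is_newer_alpha 24.10.00a364 24.10.00a371` succeeds, which is consistent with the nightly build being preferred over the older CI artifact.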
See here: #6071 (comment)