-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support dynamic linking between RAPIDS wheels #33
Comments
How much space does this cost? I understand the simplicity benefits of doing it this way, but if our mission is to save space, why are we making this compromise? |
I'd guess it'll be on the order of 25MB. IMO if after the other changes we're still that close to the 1GB limit on any package then I don't think removing these files would be a real solution since all it would take is adding one new arch etc to the compilation for us to be over the limit again. What alternative would you suggest? That we build against RAPIDS dependencies installed in some other way and then specify a runtime dependency that contains only the libraries and nothing else? Also I'd add that I would expect the bulk of those 25 MB to come from bundling CCCL, which we could fix by creating a wheel for rapids-core-dependencies as well. |
Another significant benefit of this approach would be that the marginal cost of building for more Python versions (e.g. 3.12) would be much smaller. The most significant build cost would be paid exactly once for the C++ wheel (rather than for each Python minor version) and then we could build for many Python minor versions at a significantly reduced resource cost. |
Yes, that's definitely something else I considered. I was originally thinking of exposing the C++ library from the Python wheel directly, but one of the (multiple) reasons that tipped me towards a separate wheel was making the Python wheels relatively cheap to build. |
25 MB probably isn't enough to warrant extra complexity given the hundreds of MB we already are dealing with. We should definitely measure this stuff, though. We can have build-requires and requires dependencies both be specified. The former being the one with dev stuff, the latter without. It's not as nice as Conda's run_exports, but same idea. Doable, with room for improvement in tooling. |
If we were able to have a thin Cython layer around each dependency that we used exclusively, we could use that in other packages and have the benefits of reduced library duplication/static linking |
Yes, I definitely think that's worth doing. I considered that but didn't want to include that as part of this proposal because that's a change that we should try to make concurrently with conda packaging (we don't split this in conda either). There's a writeup about this somewhere, I'll find it and share it. |
I'm not sure I follow what you mean. How is this different from what's being proposed here, aside from adding a Cython wrapper? What would that Cython wrapper do? |
On the subject of measurements, here's what I currently see locally:
We're under 1 GB with this! For context, the pylibcugraph and cugraph wheels I see from recent PRs is 1.47 GB. One major missing piece here is NCCL, which I expect will add ~100MB back to the size. If I open up the wheels and look at their contents:
Definitely suggests that as I expected we wouldn't be benefiting much from trying to optimize the include directory, at least not unless we dramatically reduce library sizes somehow. |
Nice! I want to say this somewhere, here seems as good a place as any... since 1GB is a special value (a PyPI limit), I think as part of this work we should be enforcing that limit on wheels in CI across all the repos. That could be done with that |
Adding a link to this highly-relevant conversation happening on the Python discourse over the last 2 weeks. https://discuss.python.org/t/enforcing-consistent-metadata-for-packages/50008/28 Some quotes that really stood out to me
and
This conversation is closely related to PEP 725 - "Specifying external dependencies in pyproject.toml" (link) |
Thanks James! We should probably chime in there at some point, but perhaps once we're a bit further along with our implementation. |
One thing that we should keep in mind while implementing this feature is that it may cause problems for our usage of sccache in CI. After this change, C++ library dependencies will now be found in other wheels instead of being downloaded via CPM. While CPM's downloads will always go to the same path, wheels will instead be downloaded into a different ephemeral virtual environment during builds every time. If sccache sees the different path as a different dependency (i.e. if the path change results in a cache miss) then we will end up recompiling artifacts far more frequently than we should. I'm not sure if this is the case, so it's something we'll have to experiment with if nobody else knows for sure either (@trxcllnt, @ajschmidt8, or @robertmaynard might know this already). If it is an issue, there are two ways out of this:
|
|
I was thinking about pre-processor mode (https://github.com/mozilla/sccache/blob/main/docs/Local.md#preprocessor-cache-mode ) but that only allows you to ignore the working directory in the hash, and not other directories. Plus it doesn't work with non local backed caches... |
OK yeah so be it, I figured no build isolation was where we'd end up but wanted to check. It would have been nice if sccache had added some feature that made this possible! |
cc @raydouglass (for awareness) |
Fixes #1645 Contributes to rapidsai/build-planning#33 Similar to rapidsai/cudf#15982 Proposes more tightly controlling the visibility of symbols in the shared libraries produces for the `rmm` Python library, via the following: * compiling with `-fvisibility=hidden` by default * marking intended-to-be-public parts of `rmm` *(everything in the `rmm::` namespace)* with `__attribute__((visibility("default")))` ## Benefits of this change Reduces the risk of symbol conflicts when `rmm` is used alongside other libraries. For example, see this case in `cudf` where the `spdlog::` symbols in `rmm` are conflicting with the `spdlog::` symbols in `nvcomp`: rapidsai/cudf#15483 (comment) Reduces library size by a bit (around 0.3 MB uncompressed), by reducing the size of symbol tables in DSOs. ## Notes for Reviewers This is at the very edge of my C++ knowledge, apologies in advance if I've missed something obvious 😬 # Authors: - James Lamb (https://github.com/jameslamb) Approvers: - Bradley Dice (https://github.com/bdice) - Kyle Edwards (https://github.com/KyleFromNVIDIA) - Mark Harris (https://github.com/harrism) - Vyas Ramasubramani (https://github.com/vyasr) URL: #1644
Contributes to rapidsai/build-planning#33 Follow-up to #16299 This proposes some simplifications to `dependencies.yaml`. It's not intended to change any behavior. * more use of YAML anchors for requirements that are intended to be identical to each other * eliminating the `pylibcudf_build_dep` dependency group that was introduced in #16299, in favor of just tracking the `pylibcudf` build dependency alongside `cudf`'s `rmm` build dependency in the existing `build_python_cudf` group - *(sorry I'd missed that in the review on #16299)* I found myself starting to make similar changes in the PR breaking up these packages into more (splitting out a `libcudf` in #15483) and thought they'd be better as a standalone PR. Authors: - James Lamb (https://github.com/jameslamb) Approvers: - Bradley Dice (https://github.com/bdice) URL: #16597
Contributes to rapidsai/build-planning#33 Adds a standalone `libcudf` wheel, containing the `libcudf` C++ shared library. Fixes #16588 Authors: - Mike Sarahan (https://github.com/msarahan) - Vyas Ramasubramani (https://github.com/vyasr) - James Lamb (https://github.com/jameslamb) Approvers: - Bradley Dice (https://github.com/bdice) - Kyle Edwards (https://github.com/KyleFromNVIDIA) - Vyas Ramasubramani (https://github.com/vyasr) URL: #15483
…lishing (#16650) Follow-up to #15483. Contributes to rapidsai/build-planning#33 Wheel publishing for `libcudf` is failing like this: ```text Error: File "./dist/*.whl" does not exist ``` ([build link](https://github.com/rapidsai/cudf/actions/runs/10528569930/job/29176811683)) Because the `package-type` was not set to `cpp` in the `wheels-publish` CI workflow, and that workflow defaults to `python`. ([shared-workflows code link](https://github.com/rapidsai/shared-workflows/blob/157e9824e6e2181fca9aa5c4bea4defd4cc322b0/.github/workflows/wheels-publish.yaml#L23-L26)). This fixes that, and makes that choice explicit for all wheel publishing jobs. References for this `package-type` argument: * rapidsai/shared-workflows#209 * rapidsai/gha-tools#105 Authors: - James Lamb (https://github.com/jameslamb) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) - Kyle Edwards (https://github.com/KyleFromNVIDIA) - Bradley Dice (https://github.com/bdice) URL: #16650
Contributes to rapidsai/build-planning#33. Proposes the following for `cuspatial` wheels: * add build and runtime dependencies on `libcudf` wheels * stop vendoring copies of `libcudf.so`, `libnvcomp.so`, `libnvcomp_bitcomp.so`, and `libnvcomp_gdeflate.so` - *(load `libcudf.so` dynamically at runtime instead)* And other related changes for development/CI: * combine all `pip install` calls into 1 in wheel-testing scripts - *like rapidsai/cudf#16575 - *to improve the chance that packaging issues are discovered in CI* * `dependencies.yaml` changes: - more use of YAML anchors = less duplication - use dedicated `depends_on_librmm` and `depends_on_libcudf` groups * explicitly pass a package type to `gha-tools` wheel uploading/downloading scripts ## Notes for Reviewers ### Benefits of these changes Unblocks CI in this repo (ref: #1444 (comment), #1441 (comment)). Reduces wheel sizes for `cuspatial` wheels by about 125MB 😁 | wheel | size (before) | size (this PR) | |:-----------:|-------------:|---------------:| | `cuspatial` | 146.0M | 21M | | `cuproj ` | 0.9M | 0.9M | |**TOTAL** | **146.9M** | **21.9M** | *NOTES: size = compressed, "before" = 2024-08-21 nightlies (c60bd4d), CUDA = 12, Python = 3.11* <details><summary>how I calculated those (click me)</summary> ```shell # note: 2024-08-21 because that was the most recent date with # successfully-built cuspatial nightlies # docker run \ --rm \ -v $(pwd):/opt/work:ro \ -w /opt/work \ --network host \ --env RAPIDS_NIGHTLY_DATE=2024-08-21 \ --env RAPIDS_NIGHTLY_SHA=c60bd4d \ --env RAPIDS_PR_NUMBER=1447 \ --env RAPIDS_PY_CUDA_SUFFIX=cu12 \ --env RAPIDS_REPOSITORY=rapidsai/cuspatial \ --env WHEEL_DIR_BEFORE=/tmp/wheels-before \ --env WHEEL_DIR_AFTER=/tmp/wheels-after \ -it rapidsai/ci-wheel:cuda12.5.1-rockylinux8-py3.11 \ bash mkdir -p "${WHEEL_DIR_BEFORE}" mkdir -p "${WHEEL_DIR_AFTER}" py_projects=( cuspatial cuproj ) for project in "${py_projects[@]}"; do # before RAPIDS_BUILD_TYPE=nightly \ RAPIDS_PY_WHEEL_NAME="${project}_${RAPIDS_PY_CUDA_SUFFIX}" \ RAPIDS_REF_NAME="branch-24.10" \ RAPIDS_SHA=${RAPIDS_NIGHTLY_SHA} \ rapids-download-wheels-from-s3 python "${WHEEL_DIR_BEFORE}" # after RAPIDS_BUILD_TYPE=pull-request \ RAPIDS_PY_WHEEL_NAME="${project}_${RAPIDS_PY_CUDA_SUFFIX}" \ RAPIDS_REF_NAME="pull-request/${RAPIDS_PR_NUMBER}" \ rapids-download-wheels-from-s3 python "${WHEEL_DIR_AFTER}" done du -sh ${WHEEL_DIR_BEFORE}/* du -sh ${WHEEL_DIR_BEFORE} du -sh ${WHEEL_DIR_AFTER}/* du -sh ${WHEEL_DIR_AFTER} ``` </details> Reduces the amount of additional work required to start shipping `libcuspatial` wheels. ### Background This is part of ongoing work towards packaging `libcuspatial` as a wheel. relevant prior work: * packaging `libcudf` wheels: rapidsai/cudf#15483 * consolidating `pip install` calls in CI scripts for `cudf`: rapidsai/cudf#16575 * `cudf` dropping its Arrow library dependency: rapidsai/cudf#16640 ### How I tested this Confirmed in local builds and CI logs that `cudf` is being *found*, not *built*, in `cuspatial` builds. ```text -- CPM: Using local package [email protected] ``` ([build link](https://github.com/rapidsai/cuspatial/actions/runs/10602971716/job/29386288614?pr=1447#step:9:23472)) Built `cuspatial` wheels locally and ran all the unit tests, without issue. # Authors: - James Lamb (https://github.com/jameslamb) Approvers: - Bradley Dice (https://github.com/bdice) - Vyas Ramasubramani (https://github.com/vyasr) - Matthew Roeschke (https://github.com/mroeschke) URL: #1447
Contributes to rapidsai/build-planning#33 `cuproj` does not need the `rmm` Python package... it only needs the RMM headers at build time. This proposes the following changes for `cuproj`: * dropping the runtime requirement on `rmm` in wheels and conda packages * switching the build requirement from `rmm` to `librmm` for wheels and conda packages * removing unnecessary imports in the `test:` environment for conda packages For more context on these changes, see rapidsai/build-planning#92. ## Notes for Reviewers ### Benefits of these changes Faster conda builds (via dropping unnecessary dependencies). Cheaper (in terms of bandwidth and disk space) installation of wheels and conda packages (via removing an unnecessary runtime dependency). Reduces a source of network calls (and therefore CI instability) by removing some CPM downloads of RMM. Before: ```text -- CPM: Adding package [email protected] (branch-24.10) ``` ([build link](https://github.com/rapidsai/cuspatial/actions/runs/10618529204/job/29434041322#step:9:16754)) After (this PR): ```text -- CPM: Using local package [email protected] ``` ([build link](https://github.com/rapidsai/cuspatial/actions/runs/10619138604/job/29436119470?pr=1448#step:9:11256)) ### Is this required for `libcuspatial` wheel packaging? No, it's just a side thing I noticed while working on that. The two are totally independent. # Authors: - James Lamb (https://github.com/jameslamb) Approvers: - Kyle Edwards (https://github.com/KyleFromNVIDIA) URL: #1448
Contributes to rapidsai/build-planning#33 Adds `libcuspatial` wheels, and switches `cuspatial` wheels to using them. ## Notes for Reviewers ### Benefits of these changes Faster CI runs and smaller total footprint on package repositories (because now `libcuspatial` no longer needs to be compiled once per Python version). Smaller `cuspatial` wheels. | whee. l | size (before) | size (this PR) | |:-------------:|-------------:|----------------:| | `libcuspatial` | --- | 17.0M | | `cuspatial`. | 21.0M | 4.1M | | `cuproj ` | 0.9M | 0.9M | |**TOTAL** | **21.9M** | **22.0M** | *NOTES: size = compressed, "before" = 2024-09-02 nightlies (1544e7b), CUDA = 12, Python = 3.11* <details><summary>how I calculated those (click me)</summary> ```shell docker run \ --rm \ -v $(pwd):/opt/work:ro \ -w /opt/work \ --network host \ --env RAPIDS_NIGHTLY_DATE=2024-09-02 \ --env RAPIDS_NIGHTLY_SHA=1544e7b \ --env RAPIDS_PR_NUMBER=1450 \ --env RAPIDS_PY_CUDA_SUFFIX=cu12 \ --env RAPIDS_REPOSITORY=rapidsai/cuspatial \ --env WHEEL_DIR_BEFORE=/tmp/wheels-before \ --env WHEEL_DIR_AFTER=/tmp/wheels-after \ -it rapidsai/ci-wheel:cuda12.5.1-rockylinux8-py3.11 \ bash mkdir -p "${WHEEL_DIR_BEFORE}" mkdir -p "${WHEEL_DIR_AFTER}" py_projects=( cuspatial cuproj ) for project in "${py_projects[@]}"; do # before RAPIDS_BUILD_TYPE=nightly \ RAPIDS_PY_WHEEL_NAME="${project}_${RAPIDS_PY_CUDA_SUFFIX}" \ RAPIDS_REF_NAME="branch-24.10" \ RAPIDS_SHA=${RAPIDS_NIGHTLY_SHA} \ rapids-download-wheels-from-s3 python "${WHEEL_DIR_BEFORE}" # after RAPIDS_BUILD_TYPE=pull-request \ RAPIDS_PY_WHEEL_NAME="${project}_${RAPIDS_PY_CUDA_SUFFIX}" \ RAPIDS_REF_NAME="pull-request/${RAPIDS_PR_NUMBER}" \ rapids-download-wheels-from-s3 python "${WHEEL_DIR_AFTER}" done # after RAPIDS_BUILD_TYPE=pull-request \ RAPIDS_PY_WHEEL_NAME="libcuspatial_${RAPIDS_PY_CUDA_SUFFIX}" \ RAPIDS_REF_NAME="pull-request/${RAPIDS_PR_NUMBER}" \ rapids-download-wheels-from-s3 cpp "${WHEEL_DIR_AFTER}" du -sh ${WHEEL_DIR_BEFORE}/* du -sh ${WHEEL_DIR_BEFORE} du -sh ${WHEEL_DIR_AFTER}/* du -sh ${WHEEL_DIR_AFTER} ``` </details> ### devcontainers job? Once this PR is close to ready, let's merge the devcontainers PR and then re-run the devcontainers CI here. devcontainers PR: rapidsai/devcontainers#387 ### `rapids-metadata` changes? Not necessary, `libcuspatial` is already there: https://github.com/rapidsai/rapids-metadata/blob/9b6307e708511cd9a1990d8bb36606df53bc9e1b/src/rapids_metadata/__init__.py#L89 # Authors: - James Lamb (https://github.com/jameslamb) Approvers: - Bradley Dice (https://github.com/bdice) - Mark Harris (https://github.com/harrism) - Vyas Ramasubramani (https://github.com/vyasr) URL: #1450
Follow-up to #15483. Contributes to rapidsai/build-planning#33. Adds a build-time dependency on `libkvikio` wheels for `libcudf` wheels (per #15483 (comment)). With this change, CPM is no longer used to download and install the kvikio headers. Before: ```text -- Found cuFile: /usr/local/cuda/lib64/libcufile.so -- CPM: Adding package [email protected] (branch-24.10) ``` ([recent build link from branch-24.10](https://github.com/rapidsai/cudf/actions/runs/10780576194/job/29896649202#step:9:7673)) After: ```text -- KvikIO: Found cuFile Batch API: TRUE -- KvikIO: Found cuFile Stream API: TRUE -- CPM: Using local package [email protected] ``` ([build link from this PR](https://github.com/rapidsai/cudf/actions/runs/10780504202/job/29896555443?pr=16778#step:9:7754)) ## Notes for Reviewers ### This removes kvikio headers/CMake files from libcudf wheels Cuts around 0.8 MB (23 files) out of `libcudf` wheels. As of this PR, these would no longer be vendored in `libcudf` wheels: ```text 0 09-08-2024 06:17 libcudf/include/kvikio/ 0 09-08-2024 06:17 libcudf/include/kvikio/shim/ 6356 09-08-2024 06:17 libcudf/include/kvikio/batch.hpp 3812 09-08-2024 06:17 libcudf/include/kvikio/buffer.hpp 10499 09-08-2024 06:17 libcudf/include/kvikio/utils.hpp 1399 09-08-2024 06:17 libcudf/include/kvikio/cufile_config.hpp 33385 09-08-2024 06:17 libcudf/include/kvikio/file_handle.hpp 7299 09-08-2024 06:17 libcudf/include/kvikio/driver.hpp 9678 09-08-2024 06:17 libcudf/include/kvikio/defaults.hpp 5352 09-08-2024 06:17 libcudf/include/kvikio/stream.hpp 6002 09-08-2024 06:17 libcudf/include/kvikio/error.hpp 4501 09-08-2024 06:17 libcudf/include/kvikio/bounce_buffer.hpp 3197 09-08-2024 06:17 libcudf/include/kvikio/parallel_operation.hpp 9864 09-08-2024 06:17 libcudf/include/kvikio/posix_io.hpp 717 09-08-2024 06:17 libcudf/include/kvikio/version_config.hpp 4529 09-08-2024 06:17 libcudf/include/kvikio/shim/cuda.hpp 3331 09-08-2024 06:17 libcudf/include/kvikio/shim/utils.hpp 4055 09-08-2024 06:17 libcudf/include/kvikio/shim/cufile_h_wrapper.hpp 2242 09-08-2024 06:17 libcudf/include/kvikio/shim/cuda_h_wrapper.hpp 7510 09-08-2024 06:17 libcudf/include/kvikio/shim/cufile.hpp 0 09-08-2024 06:17 libcudf/lib64/cmake/kvikio/ 5031 09-08-2024 06:17 libcudf/lib64/cmake/kvikio/kvikio-targets.cmake 3681 09-08-2024 06:17 libcudf/lib64/cmake/kvikio/kvikio-config-version.cmake 6915 09-08-2024 06:17 libcudf/lib64/cmake/kvikio/kvikio-config.cmake 1529 09-08-2024 06:17 libcudf/lib64/cmake/kvikio/kvikio-dependencies.cmake 3851 09-08-2024 06:17 libcudf/lib64/cmake/kvikio/FindcuFile.cmake ``` This is safe because kvikio is a PRIVATE dependency of `libcudf`. https://github.com/rapidsai/cudf/blob/150f1b10ed9c702d5283216b746df685e1708716/cpp/CMakeLists.txt#L796-L802 # Authors: - James Lamb (https://github.com/jameslamb) - Bradley Dice (https://github.com/bdice) Approvers: - Bradley Dice (https://github.com/bdice) URL: #16778
I split this proposal out into its own issue: #110 |
I've updated the task list here. I need some help understanding the sequence here though. #33 (comment) said that symbol visibility issues in RAFT need to be resolved, tracked in rapidsai/raft#1722. A bunch of PRs have gone in contributing to rapidsai/raft#1722, but that issue is still open... I'm not sure what's left for it. That comment also said creating a So am I right that these things need to be done in the following order?
|
We need to touch base with @cjnolet to get an update on what the current plan is for raft. There are a few questions that we need answers to, mostly around what the cuvs-raft relationship is going to wind up being and whether raft will still become header-only as was originally planned. In the scramble around cuvs there were some instances where the ideas were reconsidered and I don't know what the current plan is and what the timeline is. I'd like to minimize duplicate work around this as much as possible since some cases will have more pitfalls than others and it would be wasteful to go down a rabbit hole that we expect to vanish eventually anyway. |
Jake original proposal also includes having every host template function in RAFT ( e.g. ~90% of RAFT host code ) should be annotated as attribute((visibility("hidden"))). That is a massive change and most likely breaks the ability to pass RAFT types across DSO boundaries.
We can skip steps 1 and 2 and go straight to three. The libraft wheel I expect will have minimal value ( as measured by library size ) going forward and is not needed for correctness when building |
Currently RAPIDS wheels adhere strictly to the manylinux policy. While the glibc/kernel ABI restrictions are not particularly onerous, the requirement that binary wheels be essentially self-contained and only depend on a small set of external shared libraries is problematic. To adhere to this restriction, RAPIDS wheels statically link (or in rare cases, bundle) all of their external library dependencies, leading to severe binary bloat. The biggest problem with this behavior is that the current sizes prohibit us from publishing our wheels on PyPI. Beyond that come the usual more infrastructural problems: longer CI times due to extra compilation, larger binaries making wheel download and installation slower, etc. The focus of this issue is to define a better solution than static linking for this problem that still adheres to the manylinux spec in spirit while reducing binary sizes. This issue will not address the usage of CUDA math library dynamic library wheels; that will be discussed separately.
Proposed Solution
RAPIDS should start publishing its C++ libraries as standalone wheels that can be pip installed independently from the Python(/Cython) wheels.These wheels should
A key question to address is how to encode binary dependencies between wheels. One option is for each wheel to embed RPATHs pointing to the expected relative path to library dependencies in other wheels. This could be accomplished with some CMake to extract library locations from targets and then construct relative paths during the build based on the assumption that the packages are installed into a standard site-packages layout. However, since this approach is fragile and has generally been frowned upon by the Python community in the past, I suggest that we instead exploit dynamic loading to load the library on import of a package. This choice would make packages sensitive to import order (C++ wheels would need to be imported before any other extension module that links to them) but I think that's a reasonable price to pay since it only matters when depending on a C++ wheel. This solution also lets us handle the logic in Python, making it far easier to configure and control. Moreover, it will make the solution fairly composable when an extension module depends on a C++ wheel that depends on yet another C++ wheel.
Once these wheels exist, we should rewrite the existing Python packages to require the corresponding C++ wheels. The current approach of "find C++ if exists, build otherwise" can be scrapped in favor of always requiring that the C++ CMake package be found. Consumers will have the choice of installing the C++ library (e.g. from conda), building it from source, or installing the C++ wheel. The C++ wheel will become a hard dependency in pyproject.toml, so it will automatically be installed when building. In conda environments the pyproject dependencies are ignored, so the new wheels will not be installed, and similarly in devcontainer builds where requirements are generated dynamically from dependencies.yaml. Ultimately a pylibraft->libraft dependency will behave nearly identically to a raft-dask->pylibraft dependency from the perspective of dependency management.
Notes
Implementation notes
build.sh
(adding wheel build for libcudf cudf#15483 (comment))cpp
vs.python
in all shared-workflows jobs, gha-tools scripts explicit24.06 release
24.08 release
24.10 release
24.12 release
The text was updated successfully, but these errors were encountered: