Add notes on cudf spilling to docs (#1383)
Updates the dask-cuda documentation to include notes on native cuDF spilling, since it is often the best spilling approach for ETL with Dask-CUDA (please feel free to correct me if I'm wrong).

Authors:
  - Richard (Rick) Zamora (https://github.com/rjzamora)

Approvers:
  - Peter Andreas Entschev (https://github.com/pentschev)

URL: #1383
rjzamora authored Sep 11, 2024
1 parent 72d51e9 commit dc168d7
Showing 2 changed files with 87 additions and 0 deletions.
8 changes: 8 additions & 0 deletions docs/source/examples/best-practices.rst
@@ -44,6 +44,14 @@ We also recommend allocating most, though not all, of the GPU memory space. We d

Additionally, when using `Accelerated Networking`_, we only need to register a single IPC handle for the whole pool (which is expensive, but only done once), since from the IPC point of view there is only a single allocation. By contrast, when using RMM without a pool, each new allocation must be registered with IPC.

Spilling from Device
~~~~~~~~~~~~~~~~~~~~

Dask-CUDA offers several different ways to enable automatic spilling from device memory.
The best method often depends on the specific workflow. For classic ETL workloads using
`Dask cuDF <https://docs.rapids.ai/api/dask-cudf/stable/>`_, cuDF spilling is usually the
best place to start. See :ref:`Spilling from device <spilling-from-device>` for more details.

Accelerated Networking
~~~~~~~~~~~~~~~~~~~~~~

79 changes: 79 additions & 0 deletions docs/source/spilling.rst
@@ -1,3 +1,5 @@
.. _spilling-from-device:

Spilling from device
====================

@@ -105,3 +107,80 @@ type checking doesn't:
Thus, if encountering problems remember that it is always possible to use ``unproxy()``
to access the proxied object directly, or set ``DASK_JIT_UNSPILL_COMPATIBILITY_MODE=True``
to enable compatibility mode, which automatically calls ``unproxy()`` on all function inputs.


cuDF Spilling
-------------

When executing an ETL workflow with `Dask cuDF <https://docs.rapids.ai/api/dask-cudf/stable/>`_
(i.e. Dask DataFrame), it is usually best to leverage `native spilling support in cuDF
<https://docs.rapids.ai/api/cudf/stable/developer_guide/library_design/#spilling-to-host-memory>`_.

Native cuDF spilling has an important advantage over the other methods mentioned
above. When JIT-Unspill or default spilling is used, the worker is only able to spill
the input or output of a task. This means that any data created within the task
is completely off limits until the task is done executing. When cuDF spilling is used,
however, individual device buffers can be spilled/unspilled as needed while the task
is executing.
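
The difference can be illustrated with a toy model in plain Python. This is only a sketch and does not reflect how cuDF's spill manager is implemented: ``peak_device_bytes``, the oldest-first eviction order, and the byte counts are all assumptions made for illustration. It allocates buffers one at a time inside a single task and either may or may not evict earlier buffers while the task is still running.

```python
from collections import OrderedDict


def peak_device_bytes(buffer_sizes, device_limit, spill_within_task):
    """Toy model (hypothetical helper): allocate buffers inside one task.

    Returns the peak number of bytes resident on the device, or raises
    MemoryError if an allocation cannot fit under ``device_limit``.
    """
    resident = OrderedDict()  # buffer id -> size, oldest first
    peak = 0
    for buf_id, size in enumerate(buffer_sizes):
        # Buffer-level spilling: move the oldest resident buffers to host
        # until the new allocation fits (eviction order is an assumption).
        while (
            spill_within_task
            and resident
            and sum(resident.values()) + size > device_limit
        ):
            resident.popitem(last=False)
        if sum(resident.values()) + size > device_limit:
            raise MemoryError(f"buffer {buf_id} ({size} bytes) does not fit")
        resident[buf_id] = size
        peak = max(peak, sum(resident.values()))
    return peak


# Three 4-byte intermediates with a 10-byte device limit.
print(peak_device_bytes([4, 4, 4], device_limit=10, spill_within_task=True))
```

With ``spill_within_task=False`` the same call raises ``MemoryError``, which mirrors the case where data created inside a task cannot be spilled until the task finishes.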

When deploying a ``LocalCUDACluster``, cuDF spilling can be enabled with the ``enable_cudf_spill`` argument:

.. code-block::

    >>> from distributed import Client
    >>> from dask_cuda import LocalCUDACluster
    >>> cluster = LocalCUDACluster(n_workers=10, enable_cudf_spill=True)
    >>> client = Client(cluster)

The same applies for ``dask cuda worker``:

.. code-block::

    $ dask scheduler
    distributed.scheduler - INFO - Scheduler at: tcp://127.0.0.1:8786
    $ dask cuda worker --enable-cudf-spill

Statistics
~~~~~~~~~~

When cuDF spilling is enabled, it is also possible to have cuDF collect basic
spill statistics. Collecting this information can be a useful way to understand
the performance of memory-intensive workflows using cuDF.

When deploying a ``LocalCUDACluster``, spill statistics can be enabled with the
``cudf_spill_stats`` argument:

.. code-block::

    >>> cluster = LocalCUDACluster(n_workers=10, enable_cudf_spill=True, cudf_spill_stats=1)

The same applies for ``dask cuda worker``:

.. code-block::

    $ dask cuda worker --enable-cudf-spill --cudf-spill-stats 1

To have each dask-cuda worker print spill statistics within the workflow, do something like:

.. code-block::

    def spill_info():
        from cudf.core.buffer.spill_manager import get_global_manager
        print(get_global_manager().statistics)

    # Client.run executes the function on every worker
    client.run(spill_info)

See the `cuDF spilling documentation
<https://docs.rapids.ai/api/cudf/stable/developer_guide/library_design/#statistics>`_
for more information on the available spill-statistics options.
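
Output printed on a worker ends up in that worker's log, so it can be more convenient to return the statistics to the client instead. The following is a minimal sketch under a few assumptions: ``collect_spill_info`` is a hypothetical helper, ``get_global_manager()`` is assumed to return ``None`` when spilling is disabled, and the statistics object is assumed to have a readable string form.

```python
def spill_info():
    """Return this worker's cuDF spill statistics as text."""
    try:
        from cudf.core.buffer.spill_manager import get_global_manager
    except ImportError:
        return "cudf is not installed"
    manager = get_global_manager()
    if manager is None:
        # Assumption: no global manager exists when spilling is disabled.
        return "cuDF spilling is not enabled"
    return str(manager.statistics)


def collect_spill_info(client):
    """Run ``spill_info`` on every worker (hypothetical helper).

    ``Client.run`` returns a dict keyed by worker address.
    """
    return client.run(spill_info)
```

With a ``distributed.Client`` connected to the cluster, ``collect_spill_info(client)`` would return one entry per worker, which can then be printed or logged on the client side.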

Limitations
~~~~~~~~~~~

Although cuDF spilling is the best option for most ETL workflows using Dask cuDF,
it will be much less effective if that workflow converts between ``cudf.DataFrame``
and other data formats (e.g. ``cupy.ndarray``). Once the underlying device buffers
are "exposed" to external memory references, they become "unspillable" by cuDF.
In cases like this (e.g., Dask-CUDA + XGBoost), JIT-Unspill is usually a better choice.
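
One way to encode this rule of thumb is a small helper that picks the ``LocalCUDACluster`` keyword arguments. This is only a sketch: ``spilling_kwargs`` and its ``leaves_cudf`` flag are hypothetical names, and ``jit_unspill`` is the ``LocalCUDACluster`` argument for JIT-Unspill, which this section does not show.

```python
def spilling_kwargs(leaves_cudf: bool) -> dict:
    """Suggest ``LocalCUDACluster`` keyword arguments (hypothetical helper).

    ``leaves_cudf`` should be True when the workflow hands device data to
    other libraries, for example CuPy arrays or XGBoost.
    """
    if leaves_cudf:
        # Exposed buffers are unspillable by cuDF, so prefer JIT-Unspill.
        return {"jit_unspill": True}
    # Pure Dask cuDF ETL: let cuDF spill individual device buffers.
    return {"enable_cudf_spill": True}


# Usage (requires dask_cuda and a GPU):
#   cluster = LocalCUDACluster(**spilling_kwargs(leaves_cudf=False))
print(spilling_kwargs(leaves_cudf=False))
```

Keeping the choice in one place makes it easy to switch strategies when a pipeline that started as pure ETL later grows a training step.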
