add cudf spilling to docs
rjzamora committed Sep 10, 2024
1 parent 72d51e9 commit 89027be
Showing 2 changed files with 73 additions and 0 deletions.
8 changes: 8 additions & 0 deletions docs/source/examples/best-practices.rst
@@ -44,6 +44,14 @@ We also recommend allocating most, though not all, of the GPU memory space. We d

Additionally, when using `Accelerated Networking`_, we only need to register a single IPC handle for the whole pool (which is expensive, but only done once), since from the IPC point of view there is only a single allocation. In contrast, when using RMM without a pool, each new allocation must be registered with IPC individually.

Spilling from Device
~~~~~~~~~~~~~~~~~~~~

Dask CUDA offers several different ways to enable automatic spilling from device memory.
The best method often depends on the specific workflow. For classic ETL workloads with
`Dask cuDF <https://docs.rapids.ai/api/dask-cudf/stable/>`_, cuDF spilling is usually
the best place to start. See `spilling`_ for more details.

Accelerated Networking
~~~~~~~~~~~~~~~~~~~~~~

65 changes: 65 additions & 0 deletions docs/source/spilling.rst
@@ -105,3 +105,68 @@ type checking doesn't:
Thus, if encountering problems remember that it is always possible to use ``unproxy()``
to access the proxied object directly, or set ``DASK_JIT_UNSPILL_COMPATIBILITY_MODE=True``
to enable compatibility mode, which automatically calls ``unproxy()`` on all function inputs.
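Compatibility mode can also be enabled programmatically by setting the environment variable mentioned above before any workers are started. A minimal sketch (the variable name comes from the text above; note that it must be set before the dask-cuda configuration is read at worker startup):

.. code-block:: python

    import os

    # Must be set before dask_cuda workers are created, since the
    # configuration is read when the worker process starts.
    os.environ["DASK_JIT_UNSPILL_COMPATIBILITY_MODE"] = "True"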


cuDF Spilling
-------------

When executing a `Dask cuDF <https://docs.rapids.ai/api/dask-cudf/stable/>`_
(i.e. Dask DataFrame) ETL workflow, it is usually best to leverage `native spilling support in
cuDF <https://docs.rapids.ai/api/cudf/stable/developer_guide/library_design/#spilling-to-host-memory>`_.

Native cuDF spilling has an important advantage over the other methodologies mentioned
above. When JIT-unspill or default spilling are used, the worker is only able to spill
the input or output of a task. This means that any data that is created within the task
is completely off limits until the task is done executing. When cuDF spilling is used,
however, individual device buffers can be spilled/unspilled as needed while the task
is executing.
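cuDF spilling can also be enabled outside of Dask, through cuDF's own environment variables, which are read when cuDF is imported. A minimal sketch (``CUDF_SPILL`` and ``CUDF_SPILL_STATS`` are cuDF configuration variables; treat the exact names and values as an assumption to verify against the cuDF spilling documentation linked above):

.. code-block:: python

    import os

    # Set before importing cudf -- the options are read at import time.
    os.environ["CUDF_SPILL"] = "on"       # enable spilling to host memory
    os.environ["CUDF_SPILL_STATS"] = "1"  # collect basic spill statistics

    # import cudf  # spilling is now active for this process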

When deploying a ``LocalCUDACluster``, cuDF spilling can be enabled with the ``enable_cudf_spill`` argument:

.. code-block:: python

    >>> from distributed import Client
    >>> from dask_cuda import LocalCUDACluster
    >>> cluster = LocalCUDACluster(n_workers=10, enable_cudf_spill=True)
    >>> client = Client(cluster)

The same applies for ``dask cuda worker``:

.. code-block:: console

    $ dask scheduler
    distributed.scheduler - INFO - Scheduler at: tcp://127.0.0.1:8786

    $ dask cuda worker --enable-cudf-spill

Statistics
~~~~~~~~~~

When cuDF spilling is enabled, it is also possible to have cuDF collect basic
spill statistics. This information can be a useful way to understand the
performance of Dask cuDF workflows with high memory utilization:

.. code-block:: console

    $ dask cuda worker --enable-cudf-spill --cudf-spill-stats 1

To have each dask-cuda worker print spill statistics, do something like:

.. code-block:: python

    def spill_info():
        from cudf.core.buffer.spill_manager import get_global_manager
        print(get_global_manager().statistics)

    client.submit(spill_info)

Limitations
~~~~~~~~~~~

Although cuDF spilling is the best option for most Dask cuDF ETL workflows,
it will be much less effective if that workflow converts between ``cudf.DataFrame``
and other data formats (e.g. ``cupy.ndarray``). Once the underlying device buffers
are "exposed" to external memory references, they become "unspillable" by cuDF.
In cases like this (e.g. Dask CUDA + XGBoost), JIT-Unspill is usually a better choice.
