Commit

address code review
rjzamora committed Sep 11, 2024
1 parent 262e52f commit e75cd48
Showing 2 changed files with 21 additions and 7 deletions.
4 changes: 2 additions & 2 deletions docs/source/examples/best-practices.rst
@@ -49,8 +49,8 @@ Spilling from Device

Dask-CUDA offers several different ways to enable automatic spilling from device memory.
The best method often depends on the specific workflow. For classic ETL workloads with
`Dask cuDF <https://docs.rapids.ai/api/dask-cudf/stable/>`_, cuDF spilling is usually
the best place to start. See `spilling`_ for more details.
`Dask-cuDF <https://docs.rapids.ai/api/dask-cudf/stable/>`_, cuDF spilling is usually the
best place to start. See :ref:`Spilling from device <spilling-from-device>` for more details.
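
A rough sketch of how the different mechanisms are selected (not part of this change, and
assuming dask-cuda's ``device_memory_limit`` and ``jit_unspill`` cluster options):

.. code-block::

    >>> from dask_cuda import LocalCUDACluster
    >>> # Dask-CUDA-managed spilling: start moving data to host near a device-memory threshold
    >>> cluster = LocalCUDACluster(device_memory_limit="10GB")
    >>> # JIT-unspill: proxy-based spilling that unspills objects only when they are accessed
    >>> cluster = LocalCUDACluster(jit_unspill=True)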

Accelerated Networking
~~~~~~~~~~~~~~~~~~~~~~
24 changes: 19 additions & 5 deletions docs/source/spilling.rst
@@ -1,3 +1,5 @@
.. _spilling-from-device:

Spilling from device
====================

@@ -110,7 +112,7 @@ to enable compatibility mode, which automatically calls ``unproxy()`` on all fun
cuDF Spilling
-------------

When executing a `Dask cuDF <https://docs.rapids.ai/api/dask-cudf/stable/>`_
When executing a `Dask-cuDF <https://docs.rapids.ai/api/dask-cudf/stable/>`_
(i.e. Dask DataFrame) ETL workflow, it is usually best to leverage `native spilling support in
cuDF <https://docs.rapids.ai/api/cudf/stable/developer_guide/library_design/#spilling-to-host-memory>`_.
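
As a quick illustration (not part of this commit), cuDF's native spilling can be turned on
for every worker through the ``enable_cudf_spill`` cluster argument shown later in this
document; ``cudf.set_option("spill", True)`` is assumed to be the equivalent per-process
switch in cuDF itself:

.. code-block::

    >>> from dask_cuda import LocalCUDACluster
    >>> from dask.distributed import Client
    >>> # Turn on cuDF's built-in device-to-host spilling on every worker
    >>> cluster = LocalCUDACluster(enable_cudf_spill=True)
    >>> client = Client(cluster)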

@@ -145,14 +147,23 @@ Statistics
~~~~~~~~~~

When cuDF spilling is enabled, it is also possible to have cuDF collect basic
spill statistics. This information can be a useful way to understand the
performance of Dask cuDF workflows with high memory utilization:
spill statistics. Collecting this information can be a useful way to understand
the performance of Dask-cuDF workflows with high memory utilization.

When deploying a ``LocalCUDACluster``, cuDF spilling can be enabled with the
``cudf_spill_stats`` argument:

.. code-block::

    >>> cluster = LocalCUDACluster(n_workers=10, enable_cudf_spill=True, cudf_spill_stats=1)

The same applies for ``dask cuda worker``:

.. code-block::

    $ dask cuda worker --enable-cudf-spill --cudf-spill-stats 1

To have each dask-cuda worker print spill statistics, do something like:
To have each dask-cuda worker print spill statistics within the workflow, do something like:

.. code-block::
@@ -161,11 +172,14 @@ To have each dask-cuda worker print spill statistics, do something like:
print(get_global_manager().statistics)
client.submit(spill_info)
See the `cuDF spilling documentation
<https://docs.rapids.ai/api/cudf/stable/developer_guide/library_design/#statistics>`_
for more information on the available spill-statistics options.

Limitations
~~~~~~~~~~~

Although cuDF spilling is the best option for most Dask cuDF ETL workflows,
Although cuDF spilling is the best option for most Dask-cuDF ETL workflows,
it will be much less effective if that workflow converts between ``cudf.DataFrame``
and other data formats (e.g. ``cupy.ndarray``). Once the underlying device buffers
are "exposed" to external memory references, they become "unspillable" by cuDF.
