From e75cd48dd17bea6d21e8b6f51b2f2e925f31d2f3 Mon Sep 17 00:00:00 2001
From: rjzamora
Date: Wed, 11 Sep 2024 08:53:43 -0700
Subject: [PATCH] address code review

---
 docs/source/examples/best-practices.rst |  4 ++--
 docs/source/spilling.rst                | 24 +++++++++++++++++++-----
 2 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/docs/source/examples/best-practices.rst b/docs/source/examples/best-practices.rst
index d6cc7189..03e26bf5 100644
--- a/docs/source/examples/best-practices.rst
+++ b/docs/source/examples/best-practices.rst
@@ -49,8 +49,8 @@ Spilling from Device
 
 Dask-CUDA offers several different ways to enable automatic spilling from device memory.
 The best method often depends on the specific workflow. For classic ETL workloads with
-`Dask cuDF `_, cuDF spilling is usually
-the best place to start. See `spilling`_ for more details.
+`Dask-cuDF `_, cuDF spilling is usually the
+best place to start. See :ref:`Spilling from device <spilling-from-device>` for more details.
 
 Accelerated Networking
 ~~~~~~~~~~~~~~~~~~~~~~
diff --git a/docs/source/spilling.rst b/docs/source/spilling.rst
index 066284fa..04cfa05c 100644
--- a/docs/source/spilling.rst
+++ b/docs/source/spilling.rst
@@ -1,3 +1,5 @@
+.. _spilling-from-device:
+
 Spilling from device
 ====================
 
@@ -110,7 +112,7 @@ to enable compatibility mode, which automatically calls ``unproxy()`` on all fun
 cuDF Spilling
 -------------
 
-When executing a `Dask cuDF `_
+When executing a `Dask-cuDF `_
 (i.e. Dask DataFrame) ETL workflow, it is usually best to leverage
 `native spilling support in cuDF `.
@@ -145,14 +147,23 @@ Statistics
 ~~~~~~~~~~
 
 When cuDF spilling is enabled, it is also possible to have cuDF collect basic
-spill statistics. This information can be a useful way to understand the
-performance of Dask cuDF workflows with high memory utilization:
+spill statistics. Collecting this information can be a useful way to understand
+the performance of Dask-cuDF workflows with high memory utilization.
+
+When deploying a ``LocalCUDACluster``, cuDF spill statistics can be enabled with
+the ``cudf_spill_stats`` argument:
+
+.. code-block::
+
+    >>> cluster = LocalCUDACluster(n_workers=10, enable_cudf_spill=True, cudf_spill_stats=1)
+
+The same applies for ``dask cuda worker``:
 
 .. code-block::
 
     $ dask cuda worker --enable-cudf-spill --cudf-spill-stats 1
 
-To have each dask-cuda worker print spill statistics, do something like:
+To have each dask-cuda worker print spill statistics within the workflow, do something like:
 
 .. code-block::
 
@@ -161,11 +172,14 @@ To have each dask-cuda worker print spill statistics, do something like:
     print(get_global_manager().statistics)
     client.submit(spill_info)
 
+See the `cuDF spilling documentation
+`_
+for more information on the available spill-statistics options.
 
 Limitations
 ~~~~~~~~~~~
 
-Although cuDF spilling is the best option for most Dask cuDF ETL workflows,
+Although cuDF spilling is the best option for most Dask-cuDF ETL workflows,
 it will be much less effective if that workflow converts between
 ``cudf.DataFrame`` and other data formats (e.g. ``cupy.ndarray``). Once the
 underlying device buffers are "exposed" to external memory references, they
 become "unspillable" by cuDF.
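
Putting the documented pieces together, a minimal end-to-end sketch might look like the
following. The ``LocalCUDACluster`` arguments and the ``spill_info`` helper come from the
snippets in the patch above; the ``dask_cudf`` example data and the groupby computation are
illustrative assumptions, not part of the patch.

.. code-block:: python

    import cudf
    import cupy
    import dask_cudf
    from dask_cuda import LocalCUDACluster
    from distributed import Client

    if __name__ == "__main__":
        # Enable cuDF spilling and level-1 spill statistics on every worker
        # (same arguments as the LocalCUDACluster example above).
        cluster = LocalCUDACluster(
            n_workers=2, enable_cudf_spill=True, cudf_spill_stats=1
        )
        client = Client(cluster)

        # An illustrative Dask-cuDF computation; any ETL workload would do.
        df = cudf.DataFrame(
            {"a": cupy.arange(1_000_000) % 1_000, "b": cupy.arange(1_000_000)}
        )
        ddf = dask_cudf.from_cudf(df, npartitions=16)
        ddf.groupby("a").b.sum().compute()

        # Print cuDF spill statistics, as in the spill_info snippet above.
        # client.run executes the function on every worker, whereas
        # client.submit (used in the patch) runs it on a single worker.
        def spill_info():
            from cudf.core.buffer.spill_manager import get_global_manager
            print(get_global_manager().statistics)

        client.run(spill_info)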
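
The "unspillable" behaviour described under Limitations can be seen with plain cuDF as
well. A small hypothetical illustration, assuming the usual zero-copy ``.values``
conversion to CuPy for a numeric column:

.. code-block:: python

    import cudf
    import cupy

    df = cudf.DataFrame({"x": cupy.arange(1_000_000)})

    # While only cuDF touches the data, its device buffers remain spillable.
    df["x"].sum()

    # Handing the same device memory to an external cupy.ndarray reference
    # marks the underlying buffer as "exposed", so cuDF can no longer spill it.
    arr = df["x"].values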