Commit

Merge branch 'main' into more-json-dev
douglasdavis authored Aug 16, 2023
2 parents 84a0dee + 4cbd754 commit d7957be
Showing 4 changed files with 44 additions and 11 deletions.
8 changes: 7 additions & 1 deletion docs/ht-configuration.rst
@@ -45,7 +45,13 @@ For example, they can be set with the form:
with dask.config.set({"awkward.optimization.<option>": False}):
...
- ``enabled`` (default: ``True``): Enable dask-awkward specific optimizations.
- ``enabled`` (default: ``True``): Enable dask-awkward specific
optimizations. Finer-grained control is available through the
``which`` option.
- ``which`` (default: ``[columns, layer-chains]``): Which of the
optimizations to run. The default setting runs all available
optimizations. If ``enabled`` is set to ``False``, this option is
ignored.
- ``on-fail`` (default: ``warn``): When set to ``warn``, emit a
warning if the optimization fails and continue without performing
the optimization. If set to ``raise``, raise an exception at
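As a minimal sketch of how the ``which`` option documented above might be used, the snippet below restricts the optimizations to ``layer-chains`` for the duration of a ``with`` block. It assumes only that ``dask`` is installed; the configuration key is the one named in the docs above, and the variable name ``selected`` is purely illustrative.

```python
import dask

# Inside this block only the "layer-chains" optimization is selected;
# the "columns" (necessary columns) optimization is left out.
with dask.config.set({"awkward.optimization.which": ["layer-chains"]}):
    selected = dask.config.get("awkward.optimization.which")
    print(selected)
```

Once the ``with`` block exits, the previous configuration is restored, so the override applies only to computations triggered inside it.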
21 changes: 19 additions & 2 deletions docs/me-optimization.rst
@@ -11,13 +11,30 @@ we benefit from downstream in dask-awkward. You can read more about
Dask optimization in general :doc:`at this section of the Dask docs
<dask:optimize>`.

dask-awkward Optimizations
^^^^^^^^^^^^^^^^^^^^^^^^^^

There are two optimizations implemented in the dask-awkward code. One
is the ``layer-chains`` optimization that fuses adjacent task graph
layers together (if they are compatible with each other). This is a
relatively simple optimization that just simplifies the task graph.
The other optimization is the ``columns`` (or "necessary columns")
optimization, which is more technical and is described in the
following section.

One can configure which optimizations run at compute time. More
information can be found in the :ref:`configuration section
<ht-configuration:Optimization specific table>` of the docs.


Necessary Columns
^^^^^^^^^^^^^^^^^

We have one dask-awkward specific optimization that targets efficient
data access from disk. We call it the "necessary columns"
optimization. This optimization will execute the task graph *without
operating on real data*. The data-less executation of the graph helps
operating on real data*. The data-less execution of the graph helps
determine which parts of a dataset sitting on disk are actually
required to read in order to successfully complete the compute.

@@ -103,7 +120,7 @@ parameter:
at compute time.
- ``"warn"`` (the default): fail with a warning but let the compute
continue without the necessary columns optimization (can reduce
performance by reading unncessary data from disk).
performance by reading unnecessary data from disk).

One can also use the ``columns=`` argument (with
:func:`~dask_awkward.from_parquet`, for example) to manually define
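The idea behind the "necessary columns" optimization, running the computation without real data to learn which fields it reads, can be illustrated with a toy in plain Python. This is not how dask-awkward implements it; ``ColumnRecorder``, ``necessary_columns``, and ``analysis`` are hypothetical names introduced only for this sketch.

```python
from collections.abc import Mapping


class ColumnRecorder(Mapping):
    """Data-less stand-in for a record: it remembers which fields are read."""

    def __init__(self, fields):
        self._fields = tuple(fields)
        self.touched = set()

    def __getitem__(self, name):
        if name not in self._fields:
            raise KeyError(name)
        self.touched.add(name)
        return None  # placeholder; no real data is loaded

    def __iter__(self):
        return iter(self._fields)

    def __len__(self):
        return len(self._fields)


def necessary_columns(compute, fields):
    """Run ``compute`` against a recorder and return the fields it read."""
    recorder = ColumnRecorder(fields)
    compute(recorder)
    return sorted(recorder.touched)


def analysis(events):
    # Only "pt" and "eta" are accessed; "phi" and "mass" are never needed.
    return (events["pt"], events["eta"])


needed = necessary_columns(analysis, ["pt", "eta", "phi", "mass"])
print(needed)
```

A list like ``needed`` plays the same role as the one that would otherwise be supplied by hand through the ``columns=`` argument mentioned above.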
15 changes: 12 additions & 3 deletions src/dask_awkward/awkward.yaml
@@ -15,11 +15,20 @@ awkward:
# Optimization specific configuration
optimization:

# If true dask-awkward specific optimizations will be run. This is
# currently limited to determining necessary columns and applying
# column projection.
# If true dask-awkward specific optimizations will be run.
enabled: True

# Which of the optimizations do we want to run; options include:
# - "columns":
# Run the optimization step that determines which columns are
# necessary for a computation and only read those columns
# (either via Parquet, ROOT, or JSONSchema) from the data that
# is on disk.
# - "layer-chains":
# Fuse adjacent blockwise layers ("layer chains") into a single
# layer in the task graph.
which: [columns, layer-chains]

# This option controls whether or not a warning is thrown, an
# exception is raised, or if nothing is done if a dask-awkward
# specific optimization fails (right now this is only the column
11 changes: 6 additions & 5 deletions src/dask_awkward/lib/optimize.py
@@ -75,11 +75,12 @@ def optimize(
input layers.
"""
if dask.config.get("awkward.optimization.enabled", default=False):
dsk = optimize_columns(dsk) # type: ignore

# blockwise layer chaining optimization.
dsk = rewrite_layer_chains(dsk)
if dask.config.get("awkward.optimization.enabled"):
which = dask.config.get("awkward.optimization.which")
if "columns" in which:
dsk = optimize_columns(dsk) # type: ignore
if "layer-chains" in which:
dsk = rewrite_layer_chains(dsk)

return dsk

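The gating logic introduced in ``optimize`` can be sketched in isolation with plain Python. The function ``run_optimizations``, the ``passes`` mapping, and the list-based stand-in for a task graph are all hypothetical; the real passes are ``optimize_columns`` and ``rewrite_layer_chains``, and the default of ``True`` for ``enabled`` is an assumption of this sketch rather than something the changed code supplies.

```python
def run_optimizations(graph, config, passes):
    """Apply each pass named in config["which"], if optimizations are enabled."""
    if not config.get("enabled", True):  # assumed default, see lead-in
        return graph
    which = config.get("which", [])
    for name, func in passes.items():
        if name in which:
            graph = func(graph)
    return graph


# Hypothetical stand-in passes that only record that they ran.
passes = {
    "columns": lambda g: g + ["columns"],
    "layer-chains": lambda g: g + ["layer-chains"],
}

only_chains = run_optimizations(
    [], {"enabled": True, "which": ["layer-chains"]}, passes
)
disabled = run_optimizations(
    [], {"enabled": False, "which": ["columns"]}, passes
)
print(only_chains)
print(disabled)
```

As in the changed code, ``which`` is consulted only when ``enabled`` is true, so disabling optimizations overrides any selection.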
