Fully parallelise `batch_read` when 1 or more of the symbols requested is recursively normalised #1968

alexowens90 · 2024-10-31T16:54:32Z

Currently, the code flow when reading and batch reading pointlessly pivots back off the Python layer in the middle of the call, before calling batch_read_keys.
This is not too inefficient in the non-batch case, as batch_read_keys is parallelised. However, in batch_read, the calls to batch_read_keys are called sequentially for each symbol in the batch that is recursively normalized, and so this will be very inefficient if there are many such small symbols.

There is no conceptual need to pivot off Python mid-call, this could all be handled in the C++ layer with appropriate structures returned to Python.

The text was updated successfully, but these errors were encountered:

#### Reference Issues/PRs Closes #1895 Fixes man-group/arcticdb-man#171 Fixes #1936 Fixes #1939 Fixes #1940 #### What does this implement or fix? Schedules all work asynchronously in batch reads when processing is involved, as well as when all symbols are being read directly. Previously, symbols were processed sequentially, leading to idle CPUs when processing lots of smaller symbols. This works by making `read_frame_for_version` schedule work and return futures, rather than actually performing the processing. This implementation can then be used for all 4 combinations of batch/non-batch and direct/with processing reads, significantly simplifying the code and removing the now redundant `async_read_direct` (the fact that there were two different implementations to achieve effectively the same thing is what led to 2 of the bugs in the first place). Several bugs that were discovered during the implementation (flagged above) have also been fixed. Further work in this area covered in #1968

alexowens90 assigned alexowens90 and unassigned alexowens90 Oct 31, 2024

alexowens90 added enhancement New feature or request refactor labels Oct 31, 2024

alexowens90 mentioned this issue Oct 31, 2024

Enhancement 1895: Fully parallelise processing in read_batch #1950

Merged

maxim-morozov added replicated and removed replicated labels Nov 14, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fully parallelise `batch_read` when 1 or more of the symbols requested is recursively normalised #1968

Fully parallelise `batch_read` when 1 or more of the symbols requested is recursively normalised #1968

alexowens90 commented Oct 31, 2024 •

edited

Loading

Fully parallelise batch_read when 1 or more of the symbols requested is recursively normalised #1968

Fully parallelise batch_read when 1 or more of the symbols requested is recursively normalised #1968

Comments

alexowens90 commented Oct 31, 2024 • edited Loading

Fully parallelise `batch_read` when 1 or more of the symbols requested is recursively normalised #1968

Fully parallelise `batch_read` when 1 or more of the symbols requested is recursively normalised #1968

alexowens90 commented Oct 31, 2024 •

edited

Loading