GH-41706: [C++][Acero] Enhance asof_join to work in multi-threaded execution by sequencing input #44083
Conversation
Thanks for opening a pull request! If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project. Then could you also rename the pull request title in the expected format?
I've let CI run; there are some formatting changes needed. Otherwise, this seems straightforward enough.
The most significant change would probably be the breaking change that the asof join node now requires sequenced input. However, I feel like this was always true in theory.
Perhaps @zanmato1984 @ZhangHuiGui @icexelloss or @rtpsw have thoughts? I'm not certain who all is still using this node.
cpp/src/arrow/dataset/scanner.h
Outdated
-    bool require_sequenced_output = false)
+    bool require_sequenced_output = true)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing this default would be a breaking change and I'm not certain it's warranted.
if (::arrow::compute::kUnsequencedIndex == batch.index)
  return Status::Invalid("AsofJoin requires sequenced input");
This would be a breaking change I believe. However, it does seem accurate that this is required.
This CI failure seems relevant:
Let me pitch in. Disclaimer: I am working with @mroz45 on the same project using Arrow.
Without it, the Python tests are failing.
It requires deeper, breaking changes which I think are necessary. This issue gave me the idea that implicit ordering should be asserted by default, along with an additional source node/additional option to assert no ordering, to enable some performance optimization for "don't care" ordering cases. This would fix those issues: We are willing to contribute to fixing the ordering issue within Acero, but we have next to no experience in Python/Cython. Also, the size of the issue seems to grow with every little change. I think ordering in Acero is a somewhat bigger topic to discuss.
The addition of ordering to Python was more straightforward than I initially anticipated. The current version of the pull request should fix at least some of the above-mentioned Python issues. Please review the Python changes carefully, since this was our first contact with Cython.
P.S. Can we also export SequencingQueue and SerialSequencingQueue?
Thanks for continuing to work on this and thank you for figuring out how to avoid the breaking change for now.
Or maybe the need for implicit ordering should propagate from nodes that need ordering (asof_join, fetch, etc.) down the line to source nodes (and maybe fail if the source node cannot provide it).
This seems reasonable but might be a bit complex.
This #27651 gave me the idea that implicit ordering should be asserted by default. And additional source node/additional option to assert no ordering - to enable some performance optimization for "don't care" ordering cases. This would fix those issues:
This seems logical to me. Make the default safe but slow and add a potentially dangerous "I know what I'm doing" option to speed it up but still keep the responsibility on the user to figure out if that speedup is warranted.
I think it's probably pretty safe to change the defaults in things like acero.py. My only concern would be changing defaults lower level in C++, which might change defaults in things like dataset.py, which uses Acero and has a much wider user base than acero.py. However, it may be that the usage of Acero from datasets already sets this parameter in all cases and doesn't rely on defaults (it's been too long since I've really dug in). So if we feel that is the case then changing defaults lower down is probably fine too.
I'm +1 on this change provided CI gets to a good state. Feel free to @ me on this issue if you make any changes and need CI re-run.
It looks like all checks passed. There are probably a few issues (like this, this maybe) that should be verified and closed by this pull request. I think some of this work also overlaps with GH-26818.
@gitmodimo thanks for the help getting this sorted out!
My pleasure. @mroz45 and I have a few more enhancements for upstreaming. We will post them soon.
After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit 2a0f06c. There were no benchmark performance regressions. 🎉 The full Conbench report has more details.
@@ -1057,9 +1057,12 @@ Result<acero::ExecNode*> MakeScanNode(acero::ExecPlan* plan,
         batch->values.emplace_back(partial.record_batch.index);
         batch->values.emplace_back(partial.record_batch.last);
         batch->values.emplace_back(partial.fragment.value->ToString());
+        if (index != compute::kUnsequencedIndex) batch->index = index++;
         return batch;
       });
Looks like you are adding the batch index here, which is already done by the SourceNode when ordering = Ordering::Implicit() (see lines 153-155):
arrow/cpp/src/arrow/acero/source_node.cc, lines 134 to 162 in 567f9c5:
plan_->query_context()->ScheduleTask(
    [this, morsel_length, use_legacy_batching, initial_batch_index, morsel,
     has_ordering = !ordering_.is_unordered()]() {
      int64_t offset = 0;
      int batch_index = initial_batch_index;
      do {
        int64_t batch_size =
            std::min<int64_t>(morsel_length - offset, ExecPlan::kMaxBatchSize);
        // In order for the legacy batching model to work we must
        // not slice batches from the source
        if (use_legacy_batching) {
          batch_size = morsel_length;
        }
        ExecBatch batch = morsel.Slice(offset, batch_size);
        UnalignedBufferHandling unaligned_buffer_handling =
            plan_->query_context()->options().unaligned_buffer_handling.value_or(
                GetDefaultUnalignedBufferHandling());
        ARROW_RETURN_NOT_OK(
            HandleUnalignedBuffers(&batch, unaligned_buffer_handling));
        if (has_ordering) {
          batch.index = batch_index;
        }
        offset += batch_size;
        batch_index++;
        ARROW_RETURN_NOT_OK(output_->InputReceived(this, std::move(batch)));
      } while (offset < morsel.length);
      return Status::OK();
    },
    "SourceNode::ProcessMorsel");
This seems redundant.
It seems so. I first added indexing and then fixed ordering-information propagation to source_node. I am not sure it is possible to update an already merged PR.
It's not possible. I'll draft a follow-up PR, if you don't mind.
@@ -406,7 +407,7 @@ struct SchemaSourceNode : public SourceNode {
 struct RecordBatchReaderSourceNode : public SourceNode {
   RecordBatchReaderSourceNode(ExecPlan* plan, std::shared_ptr<Schema> schema,
                               arrow::AsyncGenerator<std::optional<ExecBatch>> generator)
-      : SourceNode(plan, schema, generator) {}
+      : SourceNode(plan, schema, generator, Ordering::Implicit()) {}
Isn't this a bit too restrictive? This means all RecordBatch reader sources always preserve the implicit order, even if users do not care. Shouldn't this be configurable? What about other sources? Shouldn't the behaviour be the same for any source?
Probably an option to specify the ordering would be best. Or maybe automatically assign implicit ordering based on batch.index != compute::kUnsequencedIndex?
Can you elaborate on why your asof_join enhancement requires specifically the RecordBatchReaderSourceNode to have implicit ordering?
It does not. I assumed that when a user provides a generator, it has some kind of implicit order. Now I see that is not necessarily the case.
-    right_source = _dataset_to_decl(right_operand, use_threads=use_threads)
+    right_source = _dataset_to_decl(
+        right_operand, use_threads=use_threads,
+        require_sequenced_output=True)
Do you really need the dataset to be sequenced? I think all you need is implicit ordering (batch indices) so you can sequence the batches at any point down the pipeline (by your asof join node).
Do you really need the dataset to be sequenced? I think all you need is implicit ordering (batch indices) so you can sequence the batches at any point down the pipeline (by your asof join node).
True. But I don't think require_sequenced_output actually means batches should be produced by this node in order; rather, batches need to be sequenced (i.e. have batch.index assigned). The current implementation does produce batches in order as a side effect of indexing.
In the context of your GH-26818, maybe the option require_sequenced_output should just be renamed to implicit_ordering? What would be the purpose of require_sequenced_output != implicit_ordering?
P.S. I think the idea that "a Dataset has implicit ordering" is not handled in Arrow. There is no way to store ordering information in a dataset, and order is subject to alphabetical ordering based on fragment filenames (which, with the default filename template, is incorrect: part_1 -> part_10 -> part_2 -> part_3, etc.).
@gitmodimo I have reworked some of this PR's changes in #44616 and added some of my code from #44470. Please have a look.
Rationale for this change
This is an initial PR. I found that the test fails with specific parameters.
What changes are included in this PR?
In this PR I provoke a failure of asof_join_node_test.