GH-44084: [C++] Improve merge step in chunked sorting #44217

Merged
merged 4 commits into from
Nov 26, 2024


pitrou
Member

@pitrou pitrou commented Sep 24, 2024

Rationale for this change

When merge-sorting the chunks of a chunked array or table, we currently resolve chunk indices anew for each individual value lookup. This requires O(n*log k) chunk resolutions, with n being the chunked array or table length and k the number of chunks.

Instead, this PR translates the logical indices to physical indices all at once. This does not even require expensive chunk resolution, since the logical indices are initially chunk-partitioned.
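
To illustrate the idea, here is a minimal sketch, not the code from this PR. It assumes chunk boundaries are given as a vector of k + 1 cumulative offsets, and that the index array is chunk-partitioned, meaning that positions `[offsets[c], offsets[c + 1])` hold only logical indices belonging to chunk c. The names `PhysicalIndex`, `ResolveOne` and `TranslateAll` are hypothetical and are not Arrow APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical (chunk, index-in-chunk) pair; not Arrow's actual type.
struct PhysicalIndex {
  int32_t chunk;
  int64_t index_in_chunk;
  bool operator==(const PhysicalIndex& other) const {
    return chunk == other.chunk && index_in_chunk == other.index_in_chunk;
  }
};

// Per-lookup resolution: one binary search over the k + 1 offsets,
// i.e. O(log k) work for every single logical index.
PhysicalIndex ResolveOne(const std::vector<int64_t>& offsets, int64_t logical) {
  assert(logical >= 0 && logical < offsets.back());
  auto it = std::upper_bound(offsets.begin(), offsets.end(), logical);
  const int32_t chunk = static_cast<int32_t>(it - offsets.begin()) - 1;
  return {chunk, logical - offsets[static_cast<std::size_t>(chunk)]};
}

// Bulk translation for a chunk-partitioned index array: the chunk is known
// from the position alone, so each index costs one subtraction and no search.
std::vector<PhysicalIndex> TranslateAll(const std::vector<int64_t>& offsets,
                                        const std::vector<int64_t>& indices) {
  std::vector<PhysicalIndex> out;
  out.reserve(indices.size());
  for (std::size_t c = 0; c + 1 < offsets.size(); ++c) {
    for (int64_t pos = offsets[c]; pos < offsets[c + 1]; ++pos) {
      out.push_back({static_cast<int32_t>(c),
                     indices[static_cast<std::size_t>(pos)] - offsets[c]});
    }
  }
  return out;
}
```

With offsets `{0, 3, 5, 9}` and indices `{2, 0, 1, 4, 3, 8, 5, 7, 6}`, `TranslateAll` produces the same pairs as calling `ResolveOne` on each index, but in O(n) total rather than O(n*log k).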

This change yields significant speedups on chunked array and table sorting:

                                           benchmark          baseline         contender  change %                                                                                                                                                                                                                                       counters
      ChunkedArraySortIndicesInt64Narrow/1048576/100   345.419 MiB/sec   628.334 MiB/sec    81.905                               {'family_index': 0, 'per_family_instance_index': 6, 'run_name': 'ChunkedArraySortIndicesInt64Narrow/1048576/100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 242, 'null_percent': 1.0}
          TableSortIndicesInt64Narrow/1048576/0/1/32 25.997M items/sec 44.550M items/sec    71.366   {'family_index': 3, 'per_family_instance_index': 11, 'run_name': 'TableSortIndicesInt64Narrow/1048576/0/1/32', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 17, 'chunks': 32.0, 'columns': 1.0, 'null_percent': 0.0}
        ChunkedArraySortIndicesInt64Wide/32768/10000    91.182 MiB/sec   153.756 MiB/sec    68.625                               {'family_index': 1, 'per_family_instance_index': 0, 'run_name': 'ChunkedArraySortIndicesInt64Wide/32768/10000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2067, 'null_percent': 0.01}
           ChunkedArraySortIndicesInt64Wide/32768/10    96.536 MiB/sec   161.648 MiB/sec    67.449                                  {'family_index': 1, 'per_family_instance_index': 2, 'run_name': 'ChunkedArraySortIndicesInt64Wide/32768/10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2238, 'null_percent': 10.0}
        TableSortIndicesInt64Narrow/1048576/100/1/32 24.290M items/sec 40.513M items/sec    66.791  {'family_index': 3, 'per_family_instance_index': 9, 'run_name': 'TableSortIndicesInt64Narrow/1048576/100/1/32', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 16, 'chunks': 32.0, 'columns': 1.0, 'null_percent': 1.0}
          ChunkedArraySortIndicesInt64Wide/32768/100    90.030 MiB/sec   149.633 MiB/sec    66.203                                  {'family_index': 1, 'per_family_instance_index': 1, 'run_name': 'ChunkedArraySortIndicesInt64Wide/32768/100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2017, 'null_percent': 1.0}
            ChunkedArraySortIndicesInt64Wide/32768/0    91.982 MiB/sec   152.840 MiB/sec    66.163                                    {'family_index': 1, 'per_family_instance_index': 5, 'run_name': 'ChunkedArraySortIndicesInt64Wide/32768/0', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2115, 'null_percent': 0.0}
      ChunkedArraySortIndicesInt64Narrow/8388608/100   240.335 MiB/sec   387.423 MiB/sec    61.201                                {'family_index': 0, 'per_family_instance_index': 7, 'run_name': 'ChunkedArraySortIndicesInt64Narrow/8388608/100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 21, 'null_percent': 1.0}
            ChunkedArraySortIndicesInt64Wide/32768/2   172.376 MiB/sec   274.133 MiB/sec    59.032                                   {'family_index': 1, 'per_family_instance_index': 3, 'run_name': 'ChunkedArraySortIndicesInt64Wide/32768/2', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 3770, 'null_percent': 50.0}
            TableSortIndicesInt64Wide/1048576/4/1/32  7.407M items/sec 11.621M items/sec    56.904     {'family_index': 4, 'per_family_instance_index': 10, 'run_name': 'TableSortIndicesInt64Wide/1048576/4/1/32', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 5, 'chunks': 32.0, 'columns': 1.0, 'null_percent': 25.0}
          TableSortIndicesInt64Wide/1048576/100/1/32  5.788M items/sec  9.062M items/sec    56.565     {'family_index': 4, 'per_family_instance_index': 9, 'run_name': 'TableSortIndicesInt64Wide/1048576/100/1/32', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 4, 'chunks': 32.0, 'columns': 1.0, 'null_percent': 1.0}
            TableSortIndicesInt64Wide/1048576/0/1/32  5.785M items/sec  9.049M items/sec    56.409      {'family_index': 4, 'per_family_instance_index': 11, 'run_name': 'TableSortIndicesInt64Wide/1048576/0/1/32', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 4, 'chunks': 32.0, 'columns': 1.0, 'null_percent': 0.0}
          ChunkedArraySortIndicesInt64Narrow/32768/2   194.743 MiB/sec   291.432 MiB/sec    49.649                                 {'family_index': 0, 'per_family_instance_index': 3, 'run_name': 'ChunkedArraySortIndicesInt64Narrow/32768/2', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 4340, 'null_percent': 50.0}
          TableSortIndicesInt64Narrow/1048576/4/1/32 25.686M items/sec 38.087M items/sec    48.279  {'family_index': 3, 'per_family_instance_index': 10, 'run_name': 'TableSortIndicesInt64Narrow/1048576/4/1/32', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 17, 'chunks': 32.0, 'columns': 1.0, 'null_percent': 25.0}
            TableSortIndicesInt64Wide/1048576/0/8/32  5.766M items/sec  8.374M items/sec    45.240       {'family_index': 4, 'per_family_instance_index': 5, 'run_name': 'TableSortIndicesInt64Wide/1048576/0/8/32', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 4, 'chunks': 32.0, 'columns': 8.0, 'null_percent': 0.0}
           TableSortIndicesInt64Wide/1048576/0/16/32  5.752M items/sec  8.352M items/sec    45.202     {'family_index': 4, 'per_family_instance_index': 2, 'run_name': 'TableSortIndicesInt64Wide/1048576/0/16/32', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 4, 'chunks': 32.0, 'columns': 16.0, 'null_percent': 0.0}
      ChunkedArraySortIndicesInt64Narrow/32768/10000   121.253 MiB/sec   175.286 MiB/sec    44.562                             {'family_index': 0, 'per_family_instance_index': 0, 'run_name': 'ChunkedArraySortIndicesInt64Narrow/32768/10000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2673, 'null_percent': 0.01}
          TableSortIndicesInt64Wide/1048576/100/2/32  5.549M items/sec  7.984M items/sec    43.876     {'family_index': 4, 'per_family_instance_index': 6, 'run_name': 'TableSortIndicesInt64Wide/1048576/100/2/32', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 4, 'chunks': 32.0, 'columns': 2.0, 'null_percent': 1.0}
        ChunkedArraySortIndicesInt64Wide/1048576/100    69.599 MiB/sec    99.666 MiB/sec    43.200                                  {'family_index': 1, 'per_family_instance_index': 6, 'run_name': 'ChunkedArraySortIndicesInt64Wide/1048576/100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 49, 'null_percent': 1.0}
           TableSortIndicesInt64Narrow/1048576/0/1/4 55.940M items/sec 79.984M items/sec    42.982     {'family_index': 3, 'per_family_instance_index': 23, 'run_name': 'TableSortIndicesInt64Narrow/1048576/0/1/4', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 37, 'chunks': 4.0, 'columns': 1.0, 'null_percent': 0.0}
         TableSortIndicesInt64Wide/1048576/100/16/32  5.554M items/sec  7.909M items/sec    42.417   {'family_index': 4, 'per_family_instance_index': 0, 'run_name': 'TableSortIndicesInt64Wide/1048576/100/16/32', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 4, 'chunks': 32.0, 'columns': 16.0, 'null_percent': 1.0}
         ChunkedArraySortIndicesInt64Narrow/32768/10   127.758 MiB/sec   181.407 MiB/sec    41.992                                {'family_index': 0, 'per_family_instance_index': 2, 'run_name': 'ChunkedArraySortIndicesInt64Narrow/32768/10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2856, 'null_percent': 10.0}
          TableSortIndicesInt64Wide/1048576/100/8/32  5.572M items/sec  7.775M items/sec    39.548     {'family_index': 4, 'per_family_instance_index': 3, 'run_name': 'TableSortIndicesInt64Wide/1048576/100/8/32', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 4, 'chunks': 32.0, 'columns': 8.0, 'null_percent': 1.0}
        ChunkedArraySortIndicesInt64Narrow/32768/100   119.600 MiB/sec   166.454 MiB/sec    39.176                                {'family_index': 0, 'per_family_instance_index': 1, 'run_name': 'ChunkedArraySortIndicesInt64Narrow/32768/100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2667, 'null_percent': 1.0}
            TableSortIndicesInt64Wide/1048576/0/2/32  5.781M items/sec  8.016M items/sec    38.669       {'family_index': 4, 'per_family_instance_index': 8, 'run_name': 'TableSortIndicesInt64Wide/1048576/0/2/32', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 4, 'chunks': 32.0, 'columns': 2.0, 'null_percent': 0.0}
         TableSortIndicesInt64Narrow/1048576/100/1/4 52.252M items/sec 72.193M items/sec    38.162   {'family_index': 3, 'per_family_instance_index': 21, 'run_name': 'TableSortIndicesInt64Narrow/1048576/100/1/4', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 35, 'chunks': 4.0, 'columns': 1.0, 'null_percent': 1.0}
          ChunkedArraySortIndicesInt64Narrow/32768/0   121.868 MiB/sec   168.364 MiB/sec    38.152                                  {'family_index': 0, 'per_family_instance_index': 5, 'run_name': 'ChunkedArraySortIndicesInt64Narrow/32768/0', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2691, 'null_percent': 0.0}
            TableSortIndicesInt64Wide/1048576/4/2/32  5.017M items/sec  6.720M items/sec    33.934      {'family_index': 4, 'per_family_instance_index': 7, 'run_name': 'TableSortIndicesInt64Wide/1048576/4/2/32', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 3, 'chunks': 32.0, 'columns': 2.0, 'null_percent': 25.0}
        ChunkedArraySortIndicesInt64Wide/8388608/100    54.785 MiB/sec    72.642 MiB/sec    32.593                                   {'family_index': 1, 'per_family_instance_index': 7, 'run_name': 'ChunkedArraySortIndicesInt64Wide/8388608/100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 5, 'null_percent': 1.0}
            TableSortIndicesInt64Wide/1048576/4/8/32  4.222M items/sec  5.483M items/sec    29.861      {'family_index': 4, 'per_family_instance_index': 4, 'run_name': 'TableSortIndicesInt64Wide/1048576/4/8/32', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 3, 'chunks': 32.0, 'columns': 8.0, 'null_percent': 25.0}
              ChunkedArraySortIndicesString/32768/10   146.866 MiB/sec   190.314 MiB/sec    29.583                                     {'family_index': 2, 'per_family_instance_index': 2, 'run_name': 'ChunkedArraySortIndicesString/32768/10', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 3494, 'null_percent': 10.0}
           TableSortIndicesInt64Wide/1048576/4/16/32  4.225M items/sec  5.433M items/sec    28.599    {'family_index': 4, 'per_family_instance_index': 1, 'run_name': 'TableSortIndicesInt64Wide/1048576/4/16/32', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 3, 'chunks': 32.0, 'columns': 16.0, 'null_percent': 25.0}
       TableSortIndicesInt64Narrow/1048576/100/16/32  2.193M items/sec  2.711M items/sec    23.652 {'family_index': 3, 'per_family_instance_index': 0, 'run_name': 'TableSortIndicesInt64Narrow/1048576/100/16/32', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2, 'chunks': 32.0, 'columns': 16.0, 'null_percent': 1.0}
             ChunkedArraySortIndicesString/32768/100   156.401 MiB/sec   191.910 MiB/sec    22.704                                     {'family_index': 2, 'per_family_instance_index': 1, 'run_name': 'ChunkedArraySortIndicesString/32768/100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 3488, 'null_percent': 1.0}
           TableSortIndicesInt64Narrow/1048576/4/1/4 47.342M items/sec 58.062M items/sec    22.644    {'family_index': 3, 'per_family_instance_index': 22, 'run_name': 'TableSortIndicesInt64Narrow/1048576/4/1/4', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 32, 'chunks': 4.0, 'columns': 1.0, 'null_percent': 25.0}
               ChunkedArraySortIndicesString/32768/0   161.457 MiB/sec   195.782 MiB/sec    21.259                                       {'family_index': 2, 'per_family_instance_index': 5, 'run_name': 'ChunkedArraySortIndicesString/32768/0', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 3644, 'null_percent': 0.0}
         TableSortIndicesInt64Narrow/1048576/4/16/32  1.915M items/sec  2.309M items/sec    20.561  {'family_index': 3, 'per_family_instance_index': 1, 'run_name': 'TableSortIndicesInt64Narrow/1048576/4/16/32', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1, 'chunks': 32.0, 'columns': 16.0, 'null_percent': 25.0}
         TableSortIndicesInt64Narrow/1048576/0/16/32  2.561M items/sec  3.079M items/sec    20.208   {'family_index': 3, 'per_family_instance_index': 2, 'run_name': 'TableSortIndicesInt64Narrow/1048576/0/16/32', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2, 'chunks': 32.0, 'columns': 16.0, 'null_percent': 0.0}
           ChunkedArraySortIndicesString/32768/10000   157.786 MiB/sec   189.412 MiB/sec    20.043                                  {'family_index': 2, 'per_family_instance_index': 0, 'run_name': 'ChunkedArraySortIndicesString/32768/10000', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 3539, 'null_percent': 0.01}
               ChunkedArraySortIndicesString/32768/2   139.241 MiB/sec   164.172 MiB/sec    17.904                                      {'family_index': 2, 'per_family_instance_index': 3, 'run_name': 'ChunkedArraySortIndicesString/32768/2', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 3155, 'null_percent': 50.0}
          TableSortIndicesInt64Narrow/1048576/0/8/32  2.595M items/sec  3.038M items/sec    17.081     {'family_index': 3, 'per_family_instance_index': 5, 'run_name': 'TableSortIndicesInt64Narrow/1048576/0/8/32', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2, 'chunks': 32.0, 'columns': 8.0, 'null_percent': 0.0}
          TableSortIndicesInt64Narrow/1048576/4/8/32  1.999M items/sec  2.298M items/sec    14.936    {'family_index': 3, 'per_family_instance_index': 4, 'run_name': 'TableSortIndicesInt64Narrow/1048576/4/8/32', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1, 'chunks': 32.0, 'columns': 8.0, 'null_percent': 25.0}
           ChunkedArraySortIndicesString/8388608/100    81.026 MiB/sec    93.120 MiB/sec    14.926                                      {'family_index': 2, 'per_family_instance_index': 7, 'run_name': 'ChunkedArraySortIndicesString/8388608/100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 7, 'null_percent': 1.0}
        TableSortIndicesInt64Narrow/1048576/100/8/32  2.382M items/sec  2.719M items/sec    14.168   {'family_index': 3, 'per_family_instance_index': 3, 'run_name': 'TableSortIndicesInt64Narrow/1048576/100/8/32', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2, 'chunks': 32.0, 'columns': 8.0, 'null_percent': 1.0}
           ChunkedArraySortIndicesString/1048576/100   107.722 MiB/sec   122.229 MiB/sec    13.467                                     {'family_index': 2, 'per_family_instance_index': 6, 'run_name': 'ChunkedArraySortIndicesString/1048576/100', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 77, 'null_percent': 1.0}
        TableSortIndicesInt64Narrow/1048576/100/2/32  4.019M items/sec  4.477M items/sec    11.383   {'family_index': 3, 'per_family_instance_index': 6, 'run_name': 'TableSortIndicesInt64Narrow/1048576/100/2/32', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 3, 'chunks': 32.0, 'columns': 2.0, 'null_percent': 1.0}
             TableSortIndicesInt64Wide/1048576/4/1/4 11.595M items/sec 12.791M items/sec    10.314       {'family_index': 4, 'per_family_instance_index': 22, 'run_name': 'TableSortIndicesInt64Wide/1048576/4/1/4', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 8, 'chunks': 4.0, 'columns': 1.0, 'null_percent': 25.0}
             TableSortIndicesInt64Wide/1048576/0/1/4  9.231M items/sec 10.181M items/sec    10.294        {'family_index': 4, 'per_family_instance_index': 23, 'run_name': 'TableSortIndicesInt64Wide/1048576/0/1/4', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 6, 'chunks': 4.0, 'columns': 1.0, 'null_percent': 0.0}

However, performance also regresses when the input is all-nulls (which is probably rare):

                                       benchmark           baseline          contender  change %                                                                                                                                                                                                                                      counters
           ChunkedArraySortIndicesString/32768/1      5.636 GiB/sec      4.336 GiB/sec   -23.068                                  {'family_index': 2, 'per_family_instance_index': 4, 'run_name': 'ChunkedArraySortIndicesString/32768/1', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 127778, 'null_percent': 100.0}
      ChunkedArraySortIndicesInt64Narrow/32768/1      3.963 GiB/sec      2.852 GiB/sec   -28.025                              {'family_index': 0, 'per_family_instance_index': 4, 'run_name': 'ChunkedArraySortIndicesInt64Narrow/32768/1', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 91209, 'null_percent': 100.0}
        ChunkedArraySortIndicesInt64Wide/32768/1      4.038 GiB/sec      2.869 GiB/sec   -28.954                                {'family_index': 1, 'per_family_instance_index': 4, 'run_name': 'ChunkedArraySortIndicesInt64Wide/32768/1', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 94090, 'null_percent': 100.0}
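
For reading the two tables above: the `change %` column looks like the relative throughput difference between contender and baseline. This is an assumption inferred from the listed rows, not taken from the benchmark tooling's documentation, and `PercentChange` is a hypothetical helper.

```cpp
#include <cassert>

// Relative change in percent for a higher-is-better metric such as MiB/sec.
// Positive means the contender is faster; negative means a regression.
double PercentChange(double baseline, double contender) {
  assert(baseline != 0.0);
  return (contender - baseline) / baseline * 100.0;
}
```

For example, `PercentChange(345.419, 628.334)` comes out at roughly 81.9, in line with the first row of the speedup table.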

Are these changes tested?

Yes, by existing tests.

Are there any user-facing changes?

No.

Copy link

Thanks for opening a pull request!

If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

In the case of PARQUET issues on JIRA the title also supports:

PARQUET-${JIRA_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

See also:

@pitrou
Member Author

pitrou commented Sep 24, 2024

@ursabot please benchmark lang=C++

@ursabot

ursabot commented Sep 24, 2024

Benchmark runs are scheduled for commit a24e70a. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.

@pitrou
Member Author

pitrou commented Sep 24, 2024

@ursabot please benchmark lang=C++

@ursabot

ursabot commented Sep 24, 2024

Benchmark runs are scheduled for commit 45566ce. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.

Copy link

Thanks for your patience. Conbench analyzed the 0 benchmarking runs that have been run so far on PR commit a24e70a.

None of the specified runs were found on the Conbench server.

The full Conbench report has more details.

Copy link

Thanks for your patience. Conbench analyzed the 3 benchmarking runs that have been run so far on PR commit 45566ce.

There were 21 benchmark results indicating a performance regression:

The full Conbench report has more details.

@pitrou
Member Author

pitrou commented Sep 25, 2024

@ursabot benchmark help

@ursabot

ursabot commented Sep 25, 2024

Supported benchmark command examples:

@ursabot benchmark help

To run all benchmarks:
@ursabot please benchmark

To filter benchmarks by language:
@ursabot please benchmark lang=Python
@ursabot please benchmark lang=C++
@ursabot please benchmark lang=R
@ursabot please benchmark lang=Java
@ursabot please benchmark lang=JavaScript

To filter Python and R benchmarks by name:
@ursabot please benchmark name=file-write
@ursabot please benchmark name=file-write lang=Python
@ursabot please benchmark name=file-.*

To filter C++ benchmarks by archery --suite-filter and --benchmark-filter:
@ursabot please benchmark command=cpp-micro --suite-filter=arrow-compute-vector-selection-benchmark --benchmark-filter=TakeStringRandomIndicesWithNulls/262144/2

For other command=cpp-micro options, please see https://github.com/voltrondata-labs/benchmarks/blob/main/benchmarks/cpp_micro_benchmarks.py

@pitrou
Member Author

pitrou commented Sep 25, 2024

@ursabot please benchmark command=cpp-micro --suite-filter=vector-sort

@ursabot

ursabot commented Sep 25, 2024

Benchmark runs are scheduled for commit df0f691. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.

@pitrou
Member Author

pitrou commented Sep 25, 2024

@ursabot please benchmark lang=C++

@ursabot

ursabot commented Sep 25, 2024

Commit df0f691 already has scheduled benchmark runs.

@pitrou
Member Author

pitrou commented Sep 25, 2024

@ursabot please benchmark lang=C++

@ursabot

ursabot commented Sep 25, 2024

Benchmark runs are scheduled for commit 275871b. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.

@pitrou pitrou changed the title from "EXP: GH-44084: [C++] Improve merge step in chunked sorting" to "GH-44084: [C++] Improve merge step in chunked sorting" on Sep 25, 2024
Copy link

⚠️ GitHub issue #44084 has been automatically assigned in GitHub to PR creator.

@pitrou pitrou marked this pull request as ready for review September 25, 2024 09:40
@pitrou pitrou marked this pull request as draft September 25, 2024 11:08
@pitrou
Member Author

pitrou commented Sep 25, 2024

Set back to draft because some things can be further improved.

@pitrou
Member Author

pitrou commented Sep 25, 2024

@ursabot please benchmark lang=C++

@ursabot

ursabot commented Sep 25, 2024

Benchmark runs are scheduled for commit f69b3b8. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.

Copy link

Thanks for your patience. Conbench analyzed the 4 benchmarking runs that have been run so far on PR commit f69b3b8.

There were 43 benchmark results indicating a performance regression:

The full Conbench report has more details.

@github-actions github-actions bot added the "awaiting committer review" label and removed the "awaiting review" label on Nov 18, 2024
@pitrou
Member Author

pitrou commented Nov 18, 2024

@felipecrv Would you like to give this another look (assuming CI passes, which it should :-))?

@zanmato1984
Contributor

Sorry I will be fully occupied until the end of this week. I'll help review next week.

Contributor

@zanmato1984 zanmato1984 left a comment

Just some questions and nits.

@pitrou
Member Author

pitrou commented Nov 26, 2024

@github-actions crossbow submit -g cpp

Copy link

Revision: 4f2fff4

Submitted crossbow builds: ursacomputing/crossbow @ actions-fa64807be3

Task Status
example-cpp-minimal-build-static GitHub Actions
example-cpp-minimal-build-static-system-dependency GitHub Actions
example-cpp-tutorial GitHub Actions
test-alpine-linux-cpp GitHub Actions
test-build-cpp-fuzz GitHub Actions
test-conda-cpp GitHub Actions
test-conda-cpp-valgrind GitHub Actions
test-cuda-cpp-ubuntu-20.04-cuda-11.2.2 GitHub Actions
test-cuda-cpp-ubuntu-22.04-cuda-11.7.1 GitHub Actions
test-debian-12-cpp-amd64 GitHub Actions
test-debian-12-cpp-i386 GitHub Actions
test-fedora-39-cpp GitHub Actions
test-ubuntu-20.04-cpp GitHub Actions
test-ubuntu-20.04-cpp-bundled GitHub Actions
test-ubuntu-22.04-cpp GitHub Actions
test-ubuntu-22.04-cpp-20 GitHub Actions
test-ubuntu-22.04-cpp-emscripten GitHub Actions
test-ubuntu-22.04-cpp-no-threading GitHub Actions
test-ubuntu-24.04-cpp GitHub Actions
test-ubuntu-24.04-cpp-bundled-offline GitHub Actions
test-ubuntu-24.04-cpp-gcc-13-bundled GitHub Actions
test-ubuntu-24.04-cpp-gcc-14 GitHub Actions
test-ubuntu-24.04-cpp-minimal-with-formats GitHub Actions
test-ubuntu-24.04-cpp-thread-sanitizer GitHub Actions

Contributor

@zanmato1984 zanmato1984 left a comment

+1

Thanks for the improvement!

@pitrou pitrou merged commit d5cda4a into apache:main Nov 26, 2024
39 of 40 checks passed
@pitrou pitrou removed the "awaiting committer review" label on Nov 26, 2024
Copy link

After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit d5cda4a.

There were 132 benchmark results with an error:

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 5 possible false positives for unstable benchmarks that are known to sometimes produce them.

@assignUser
Member

It seems like this change causes issues on GCC 8: https://github.com/ursacomputing/crossbow/actions/runs/12081500439/job/33690725970#step:7:2058. Probably due to the change from std::vector to span in ChunkResolver?

@pitrou pitrou deleted the gh44084-chunked-sort branch December 2, 2024 09:23
@pitrou
Member Author

pitrou commented Dec 2, 2024

@assignUser Thanks for the heads up, I'll take a look.

@pitrou
Member Author

pitrou commented Dec 2, 2024

@assignUser See #44898 and #44899
