Performance regression: UNION in v19 #15466

systay · 2024-03-13T08:03:53Z

In a recent change to the UNION engine primitive, we added logic to make sure the output types of the union columns are correct.

Unfortunately, this made our UNION code much, much slower than before.

The end-to-end test TestUnionAll shows the degradation when it is run for 10000 iterations.

On v18 those 10000 iterations ran in a bit over 6 seconds:

--- PASS: TestUnionAll (6.72s)
    --- PASS: TestUnionAll/olap (6.70s)

On v19 though with the same 10000 iterations:

--- PASS: TestUnionAll (210.99s)
    --- PASS: TestUnionAll/olap (210.97s)
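To put the two timings side by side, here is a small sketch of the arithmetic. The helper names `slowdown` and `perIterationMs` are made up for illustration; the only inputs are the wall times and the iteration count quoted above.

```go
package main

import "fmt"

// slowdown returns how many times slower the "after" run is than the "before" run.
func slowdown(beforeSec, afterSec float64) float64 {
	return afterSec / beforeSec
}

// perIterationMs converts a total wall time in seconds into milliseconds per iteration.
func perIterationMs(totalSec float64, iterations int) float64 {
	return totalSec * 1000 / float64(iterations)
}

func main() {
	const iterations = 10000
	v18, v19 := 6.72, 210.99 // seconds, taken from the TestUnionAll output above

	fmt.Printf("v18: %.3f ms/iteration\n", perIterationMs(v18, iterations))
	fmt.Printf("v19: %.3f ms/iteration\n", perIterationMs(v19, iterations))
	fmt.Printf("slowdown: %.1fx\n", slowdown(v18, v19))
}
```

That works out to roughly 0.67 ms per iteration on v18 against about 21.1 ms on v19, a slowdown of around 31x for this test as measured.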


systay · 2024-03-13T08:17:33Z

so, checking out the changes, it looks like we're waiting around to get all the input types before we figure out how to coerce the values properly. but really, we only need to do this if we're not sure about the input types from the get-go. i'm thinking we can speed things up a bit:

if we've already figured out the types during the planning stage, we shouldn't have to wait on any inputs; we can just apply coercion where it's actually needed. right now, we're doing the type calculus and coercing everything across the board, even when the type is already correct.
once we run the query for the first time, we've got all the type info we need. so, for any subsequent runs using the same plan, we shouldn't have to redo the type calculations.
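A minimal sketch of the second point, assuming the resolved types can be stored on the cached plan. `ColType`, `unionPlan` and `outputTypes` are hypothetical names for illustration and are not the actual Vitess types; `sync.Once` is just one way to make sure the calculation runs a single time even with concurrent executions.

```go
package main

import (
	"fmt"
	"sync"
)

// ColType is a hypothetical stand-in for a column type.
type ColType string

// unionPlan is a hypothetical cached plan, not the real Vitess plan struct.
type unionPlan struct {
	once     sync.Once
	outTypes []ColType
	resolves int // how many times the type calculation actually ran
}

// outputTypes returns the union's output types. The resolve callback is
// invoked only on the first call; later calls reuse the stored result.
func (p *unionPlan) outputTypes(resolve func() []ColType) []ColType {
	p.once.Do(func() {
		p.outTypes = resolve()
		p.resolves++
	})
	return p.outTypes
}

func main() {
	plan := &unionPlan{}
	resolve := func() []ColType {
		// Stand-in for inspecting the fields returned by each input.
		return []ColType{"INT64", "VARCHAR"}
	}

	for i := 0; i < 3; i++ {
		_ = plan.outputTypes(resolve)
	}
	fmt.Println("type calculations performed:", plan.resolves)
}
```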

for a fix, how about:

pull the type calculations and coercion logic out of the Concatenate primitive, so we can be more selective about when to use it.
once we have the full type info, tailor a plan that only coerces what's necessary, without holding up the works waiting for input fields.
if we're missing type info initially, tweak the plan after the first run, based on the types we see. ideally, after this first run, the plan should be as efficient as the one where we had all the type info upfront. right now, there's no interaction between the planning and execution phases, but we could really use some back-and-forth here.
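One way to picture "only coerce what's necessary" is a per-input, per-column mask computed once the types are known. This is a sketch under assumptions: `ColType`, `coercionMask` and `applyRow` are invented for illustration, rows are simplified to string slices, and none of this is the Concatenate primitive's real code.

```go
package main

import "fmt"

// ColType is a hypothetical stand-in for a column type.
type ColType string

// coercionMask reports, for every input and every column, whether that
// input's type differs from the union's output type and so needs coercion.
// It assumes every input has the same number of columns as the output.
func coercionMask(output []ColType, inputs [][]ColType) [][]bool {
	mask := make([][]bool, len(inputs))
	for i, in := range inputs {
		mask[i] = make([]bool, len(output))
		for c := range output {
			mask[i][c] = in[c] != output[c]
		}
	}
	return mask
}

// applyRow coerces only the flagged columns of a row and copies the rest as-is.
func applyRow(row []string, mask []bool, coerce func(col int, v string) string) []string {
	out := make([]string, len(row))
	for c, v := range row {
		if mask[c] {
			out[c] = coerce(c, v)
		} else {
			out[c] = v
		}
	}
	return out
}

func main() {
	output := []ColType{"INT64", "VARCHAR"}
	inputs := [][]ColType{
		{"INT64", "VARCHAR"}, // already matches: nothing to coerce
		{"INT32", "VARCHAR"}, // only the first column needs coercion
	}
	mask := coercionMask(output, inputs)
	fmt.Println(mask)

	// Placeholder coercion that just tags the value so the effect is visible.
	tag := func(col int, v string) string { return "coerced(" + v + ")" }
	fmt.Println(applyRow([]string{"1", "a"}, mask[1], tag))
}
```

With a mask like this, an input whose types already match the output would pass its rows through untouched, which is the case the comment above describes as being coerced unnecessarily today.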

harshit-gangal · 2024-06-26T06:42:42Z

TL;DR: There is no performance regression.

Steps Taken to Understand the Issue:

Checked the commit mentioned in the issue, moved to the previous commit, and started making changes. There was no performance regression with the new changes, but this wasn't sufficient as the code had evolved.
Examined the new PR with a simplified, nearly lockless implementation of concatenate. It still showed a performance regression. Tried further optimizations but couldn't improve performance. Spent time analyzing the profiler output to detect any anomalies, but the flame graphs revealed nothing unusual.
Switched to the main code and made concatenate a passthrough, but there was still a performance regression. At that point, I realized the problem stemmed from another commit, unrelated to the concatenate engine.
Performed git bisect and found the problematic commit: 74278f1. The changes affected how we test the output of statements by comparing fields. These were changes in the test comparison code, which is why they weren't visible in the flame graphs.
Made changes to the test code and ran the tests on release-19.0 and release-20.0, and voila, no performance regression.

Conclusion:
Going forward, we should validate performance regression with direct vtgate execution and not through the mcmp library.
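As a rough illustration of why this matters, the sketch below times the execution step and the result-checking step separately, so a slow comparison helper cannot be mistaken for a slow query path. `splitTiming`, `exec` and `check` are hypothetical stand-ins; they do not call vtgate or the mcmp library.

```go
package main

import (
	"fmt"
	"time"
)

// splitTiming runs exec n times, passing each result to check, and returns
// the total time spent in exec and in check as separate durations.
func splitTiming(n int, exec func() int, check func(int)) (execTime, checkTime time.Duration) {
	for i := 0; i < n; i++ {
		start := time.Now()
		result := exec()
		mid := time.Now()
		check(result)
		end := time.Now()

		execTime += mid.Sub(start)
		checkTime += end.Sub(mid)
	}
	return execTime, checkTime
}

func main() {
	// Stand-in for running the query directly.
	exec := func() int { return 42 }
	// Stand-in for an expensive comparison of the results in test code.
	check := func(int) { time.Sleep(time.Millisecond) }

	execTime, checkTime := splitTiming(20, exec, check)
	fmt.Println("time in execution: ", execTime)
	fmt.Println("time in comparison:", checkTime)
}
```

If the comparison side dominates, as it did here after commit 74278f1, the end-to-end test time says little about the engine itself.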

systay added Component: Query Serving Type: Performance Type: Regression labels Mar 13, 2024

frouioui mentioned this issue Mar 13, 2024: Re-plan UNION plans to get type information #15476 (Closed)

GuptaManan100 assigned systay Mar 20, 2024

systay mentioned this issue Mar 21, 2024: Performance improvements for UNION #15497 (Closed)

systay removed their assignment Jun 19, 2024

harshit-gangal self-assigned this Jun 24, 2024

harshit-gangal closed this as completed Jun 26, 2024

harshit-gangal reopened this Jun 26, 2024

harshit-gangal closed this as completed Jun 26, 2024
