Query 18 spill to disk and take a lot of space #90

djouallah · 2023-03-23T12:50:47Z

I am trying to run TPCH-SF100 on my laptop, the good news, I can build the hyper file easily, the test run very well except Query 18 which timeout as I don't have enough empty space on my laptop , I saw some people sort lineitem but I can't do it on my laptop as it time out too

defining FK and PK did not help

SELECT
    --Query18
    c_name,
    c_custkey,
    o_orderkey,
    o_orderdate,
    o_totalprice,
    SUM(l_quantity)
FROM
    customer,
    orders,
    lineitem
WHERE
    o_orderkey IN (
        SELECT
            l_orderkey
        FROM
            lineitem
        GROUP BY
            l_orderkey
        HAVING
            SUM(l_quantity) > 300
    )
    AND c_custkey = o_custkey
    AND o_orderkey = l_orderkey
GROUP BY
    c_name,
    c_custkey,
    o_orderkey,
    o_orderdate,
    o_totalprice
ORDER BY
    o_totalprice DESC,
    o_orderdate
LIMIT
    100;

The text was updated successfully, but these errors were encountered:

vogelsgesang · 2023-03-27T16:08:46Z

Thank you for this report! And thanks for benchmarking Hyper! Benchmarks results are very valuable for us to understand where Hyper should still be improved. In particular, if those benchmarks are done by people outside the Hyper team, that also helps to confirm the reproduceability of Hyper's performance 🙂

In this particular case, it's probably good that Hyper isn't able to successfully finish the query - finishing it with spooling would be horribly slow. In general, I would recommend to just disable spooling altogether... I prefer a quick error message over a very long-running query.

The problem for this query is that the subquery

SELECT
    l_orderkey
FROM
    lineitem
GROUP BY
    l_orderkey
HAVING
    SUM(l_quantity) > 300

runs out of memory. The GROUP BY l_orderkey needs to keep too much state in memory.

We will have to see when we get to this. Fixing TPC-H queries is currently not our primary business priority, but as performance enthusiasts, we are of course personally interested in unblocking TPC-H Q18

djouallah changed the title ~~Hyper struggle with Query 18~~ Query 18 spill to disk and take a lot of space Mar 23, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Query 18 spill to disk and take a lot of space #90

Query 18 spill to disk and take a lot of space #90

djouallah commented Mar 23, 2023 •

edited

Loading

vogelsgesang commented Mar 27, 2023

Query 18 spill to disk and take a lot of space #90

Query 18 spill to disk and take a lot of space #90

Comments

djouallah commented Mar 23, 2023 • edited Loading

vogelsgesang commented Mar 27, 2023

djouallah commented Mar 23, 2023 •

edited

Loading