Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistency in the FDW execution order #65

Open
2 tasks done
kysshsy opened this issue Aug 10, 2024 · 2 comments
Open
2 tasks done

Inconsistency in the FDW execution order #65

kysshsy opened this issue Aug 10, 2024 · 2 comments
Labels
bug Something isn't working good first issue Good for newcomers priority-high High priority issue

Comments

@kysshsy
Copy link
Contributor

kysshsy commented Aug 10, 2024

data.csv

What happens?

using FDW on join tables. As you can see, the result is different with the same query.

pg_analytics=# set enable_nestloop = on;
SET
pg_analytics=# select count(*) from t1, t2 left join t3 on true  where t1.value = t2.value ;
 count
-------
     6
(1 row)

pg_analytics=# explain select count(*) from t1, t2 left join t3 on true  where t1.value = t2.value ;
                               QUERY PLAN
------------------------------------------------------------------------
 Aggregate  (cost=69.39..69.40 rows=1 width=8)
   ->  Nested Loop Left Join  (cost=0.00..63.01 rows=2550 width=0)
         ->  Nested Loop  (cost=0.00..2.01 rows=1 width=0)
               Join Filter: (t1.value = t2.value)
               ->  Foreign Scan on t1  (cost=0.00..1.00 rows=1 width=0)
                     DuckDB Scan: SELECT value FROM public.t1
               ->  Foreign Scan on t2  (cost=0.00..1.00 rows=1 width=0)
                     DuckDB Scan: SELECT value FROM public.t2
         ->  Seq Scan on t3  (cost=0.00..35.50 rows=2550 width=0)
(9 rows)
pg_analytics=# set enable_nestloop = off;
SET
pg_analytics=# explain select count(*) from t1, t2 left join t3 on true  where t1.value = t2.value ;
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Aggregate  (cost=10000000069.40..10000000069.41 rows=1 width=8)
   ->  Nested Loop Left Join  (cost=10000000001.01..10000000063.03 rows=2550 width=0)
         ->  Hash Join  (cost=1.01..2.03 rows=1 width=0)
               Hash Cond: (t1.value = t2.value)
               ->  Foreign Scan on t1  (cost=0.00..1.00 rows=1 width=0)
                     DuckDB Scan: SELECT value FROM public.t1
               ->  Hash  (cost=1.00..1.00 rows=1 width=0)
                     ->  Foreign Scan on t2  (cost=0.00..1.00 rows=1 width=0)
                           DuckDB Scan: SELECT value FROM public.t2
         ->  Seq Scan on t3  (cost=0.00..35.50 rows=2550 width=0)
(10 rows)

pg_analytics=# select count(*) from t1, t2 left join t3 on true  where t1.value = t2.value ;
 count
-------
  2077
(1 row)

To Reproduce

CREATE FOREIGN TABLE t1()
SERVER csv_server
OPTIONS (files '/home/kyss/labs/paradedb/data.csv');

CREATE FOREIGN TABLE t2()
SERVER csv_server
OPTIONS (files '/home/kyss/labs/paradedb/data.csv');

CREATE TABLE t3(a int);

see above

OS:

x86

ParadeDB Version:

0.9.0

Are you using ParadeDB Docker, Helm, or the extension(s) standalone?

ParadeDB Docker Image

Full Name:

kysshsy

Affiliation:

NA

Did you include all relevant data sets for reproducing the issue?

Yes

Did you include the code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configurations (e.g., CPU architecture, PostgreSQL version, Linux distribution) to reproduce the issue?

  • Yes, I have
@kysshsy kysshsy added the bug Something isn't working label Aug 10, 2024
@kysshsy
Copy link
Contributor Author

kysshsy commented Aug 10, 2024

I think this is because now we have a global variable Arrow. This indicates the execution of a FDW scan fetch all tuples at a time. But this doesn't seem to be the case for Postgres. Different FDW scans might alternate execution.
(Please ensure that the join of foreign tables uses a nested loop join, as this will help guarantee alternate execution.)
Maybe we should change the global variable pattern of Statement and Arrow too. Store them in FDW state.

@philippemnoel philippemnoel changed the title FDW execution order Inconsistency in the FDW execution order Aug 23, 2024
@philippemnoel philippemnoel added good first issue Good for newcomers priority-high High priority issue labels Aug 23, 2024
@philippemnoel
Copy link
Collaborator

I think this is because now we have a global variable Arrow. This indicates the execution of a FDW scan fetch all tuples at a time. But this doesn't seem to be the case for Postgres. Different FDW scans might alternate execution. (Please ensure that the join of foreign tables uses a nested loop join, as this will help guarantee alternate execution.) Maybe we should change the global variable pattern of Statement and Arrow too. Store them in FDW state.

Hmmm. This is a good idea. We've had another user report this issue. I've bumped up the priority level

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working good first issue Good for newcomers priority-high High priority issue
Projects
None yet
Development

No branches or pull requests

2 participants