Merge pull request #3331 from EnterpriseDB/content/docs/mongo_data_adapter/new_release

FDW - Release branch
drothery-edb authored Jan 6, 2023
2 parents 669f9a5 + 4ecca37 commit f89ac24
Showing 21 changed files with 631 additions and 22 deletions.
This table lists the latest Hadoop Foreign Data Wrapper versions and their supported EPAS versions:

| Hadoop Foreign Data Wrapper | EPAS 14 | EPAS 13 | EPAS 12 | EPAS 11 |
| --------------------------- | ------- | ------- | ------- | ------- |
| 2.3.0 | Y | Y | Y | Y |
| 2.2.0 | Y | Y | Y | Y |
| 2.1.0 | Y | Y | Y | Y |
| 2.0.8 | N | Y | Y | Y |
| 2.0.7 | N | Y | Y | N |
Hadoop Foreign Data Wrapper supports column pushdown. As a result, the query brings back only the columns that are part of the select target list.

## Join pushdown

Hadoop Foreign Data Wrapper supports join pushdown. It pushes down joins between foreign tables of the same remote Hive or Spark server to that server, enhancing performance.

From version 2.3.0, you can enable or disable join pushdown at the session and query level using the `enable_join_pushdown` GUC variable.

For more information, see [Example: Join pushdown](10a_example_join_pushdown).
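A minimal sketch of session-level control with the `hdfs_fdw.enable_join_pushdown` GUC:

```sql
-- Enable join pushdown for the current session.
SET hdfs_fdw.enable_join_pushdown TO true;

-- Disable it again; joins between foreign tables are then executed locally.
SET hdfs_fdw.enable_join_pushdown TO false;
```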

## Aggregate pushdown

Hadoop Foreign Data Wrapper supports aggregate pushdown. It pushes aggregates to the remote Hive or Spark server instead of fetching all of the rows and aggregating them locally, which gives a significant performance boost where aggregates can be pushed down. The pushdown is currently limited to the aggregate functions min, max, sum, avg, and count, to avoid pushing down functions that aren't present on the Hadoop server. Aggregate filters and orders aren't pushed down.

For more information, see [Example: Aggregate pushdown](10b_example_aggregate_pushdown).
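As a sketch, assuming the `emp` foreign table defined in the examples, all of the aggregates below fall within the supported min/max/sum/avg/count set and are therefore candidates for pushdown:

```sql
-- Each aggregate is in the pushable set, so the whole aggregation
-- can be evaluated on the remote Hive or Spark server.
SELECT deptno, COUNT(*), SUM(sal), MAX(sal), MIN(sal), AVG(sal)
FROM emp
GROUP BY deptno
ORDER BY deptno;
```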

## ORDER BY pushdown

Hadoop Foreign Data Wrapper supports ORDER BY pushdown. When possible, it pushes the `ORDER BY` clause to the remote server, so the foreign server returns an ordered result set, which can enable an efficient merge join.

For more information, see [Example: ORDER BY pushdown](10c_example_order_by_pushdown).
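A minimal sketch, using the `hdfs_fdw.enable_order_by_pushdown` GUC and the `emp` foreign table from the examples:

```sql
SET hdfs_fdw.enable_order_by_pushdown TO ON;

-- With the GUC on, the sort runs on the remote server, so the plan
-- is a single Foreign Scan with no local Sort node on top.
EXPLAIN (COSTS OFF) SELECT * FROM emp ORDER BY deptno;
```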

## LIMIT OFFSET pushdown

Hadoop Foreign Data Wrapper supports LIMIT OFFSET pushdown. When possible, it performs LIMIT and OFFSET operations on the remote server, which reduces network traffic between the local Postgres server and the remote HDFS/Hive server. The ALL/NULL options aren't supported on the Hive server, so they aren't pushed down. Also, OFFSET without LIMIT isn't supported on the remote server, so queries with that construct aren't pushed down.

For more information, see [Example: LIMIT OFFSET pushdown](10d_example_limit_offset_pushdown).
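As a hedged sketch against the `emp` foreign table used in the examples, a query like the following is a candidate for LIMIT OFFSET pushdown (both clauses present, no ALL/NULL options):

```sql
-- Both LIMIT and OFFSET are specified, so the clauses can be
-- evaluated on the remote Hive/Spark server instead of locally.
SELECT empno, ename
FROM emp
ORDER BY empno
LIMIT 10 OFFSET 5;
```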

## Automated cleanup

Hadoop Foreign Data Wrapper allows the cleanup of foreign tables in a single operation using the `DROP EXTENSION` command. This feature is especially useful when a foreign table is set up for a temporary purpose. The syntax is:
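A minimal sketch (the `CASCADE` option is assumed here; it drops dependent objects such as the server, user mappings, and foreign tables along with the extension):

```sql
-- Remove the extension and, with CASCADE, everything that depends on it.
DROP EXTENSION hdfs_fdw CASCADE;
```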
product_docs/docs/hadoop_data_adapter/2/10a_example_join_pushdown.mdx

This example shows join pushdown between the foreign tables of the same remote HIVE/SPARK server:

Tables on HIVE/SPARK server:

```sql
0: jdbc:hive2://localhost:10000> describe emp;
+-----------+------------+----------+--+
| col_name | data_type | comment |
+-----------+------------+----------+--+
| empno | int | NULL |
| ename | string | NULL |
| job | string | NULL |
| mgr | int | NULL |
| hiredate | date | NULL |
| sal | int | NULL |
| comm | int | NULL |
| deptno | int | NULL |
+-----------+------------+----------+--+
8 rows selected (0.747 seconds)
0: jdbc:hive2://localhost:10000> describe dept;
+-----------+------------+----------+--+
| col_name | data_type | comment |
+-----------+------------+----------+--+
| deptno | int | NULL |
| dname | string | NULL |
| loc | string | NULL |
+-----------+------------+----------+--+
3 rows selected (0.067 seconds)
```

Tables on Postgres server:

```sql
-- load extension first time after install
CREATE EXTENSION hdfs_fdw;

-- create server object
CREATE SERVER hdfs_server FOREIGN DATA WRAPPER hdfs_fdw OPTIONS(host 'localhost', port '10000', client_type 'spark', auth_type 'LDAP');

-- create user mapping
CREATE USER MAPPING FOR public SERVER hdfs_server OPTIONS (username 'user1', password 'pwd123');

-- create foreign table
CREATE FOREIGN TABLE dept (
deptno INTEGER,
dname VARCHAR(14),
loc VARCHAR(13)
)
SERVER hdfs_server OPTIONS (dbname 'fdw_db', table_name 'dept');

-- create foreign table
CREATE FOREIGN TABLE emp (
empno INTEGER,
ename VARCHAR(10),
job VARCHAR(9),
mgr INTEGER,
hiredate DATE,
sal INTEGER,
comm INTEGER,
deptno INTEGER
)
SERVER hdfs_server OPTIONS (dbname 'fdw_db', table_name 'emp');
```

Queries with join pushdown:

```sql
--inner join
edb=# EXPLAIN VERBOSE SELECT t1.ename, t2.dname FROM emp t1 INNER JOIN dept t2 ON ( t1.deptno = t2.deptno );
__OUTPUT__
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Foreign Scan (cost=15.00..35.00 rows=5000 width=84)
Output: t1.ename, t2.dname
Relations: (fdw_db.emp t1) INNER JOIN (fdw_db.dept t2)
Remote SQL: SELECT r1.`ename`, r2.`dname` FROM (`fdw_db`.`emp` r1 INNER JOIN `fdw_db`.`dept` r2 ON (((r1.`deptno` = r2.`deptno`))))
(4 rows)
```

```sql
--left join
edb=# EXPLAIN VERBOSE SELECT t1.ename, t2.dname FROM emp t1 LEFT JOIN dept t2 ON ( t1.deptno = t2.deptno );
__OUTPUT__
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Foreign Scan (cost=15.00..35.00 rows=5000 width=84)
Output: t1.ename, t2.dname
Relations: (fdw_db.emp t1) LEFT JOIN (fdw_db.dept t2)
Remote SQL: SELECT r1.`ename`, r2.`dname` FROM (`fdw_db`.`emp` r1 LEFT JOIN `fdw_db`.`dept` r2 ON (((r1.`deptno` = r2.`deptno`))))
(4 rows)
```

```sql
--right join
edb=# EXPLAIN VERBOSE SELECT t1.ename, t2.dname FROM emp t1 RIGHT JOIN dept t2 ON ( t1.deptno = t2.deptno );
__OUTPUT__
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Foreign Scan (cost=15.00..35.00 rows=5000 width=84)
Output: t1.ename, t2.dname
Relations: (fdw_db.dept t2) LEFT JOIN (fdw_db.emp t1)
Remote SQL: SELECT r1.`ename`, r2.`dname` FROM (`fdw_db`.`dept` r2 LEFT JOIN `fdw_db`.`emp` r1 ON (((r1.`deptno` = r2.`deptno`))))
(4 rows)
```

```sql
--full join
edb=# EXPLAIN VERBOSE SELECT t1.ename, t2.dname FROM emp t1 FULL JOIN dept t2 ON ( t1.deptno = t2.deptno );
__OUTPUT__
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Foreign Scan (cost=15.00..35.00 rows=5000 width=84)
Output: t1.ename, t2.dname
Relations: (fdw_db.emp t1) FULL JOIN (fdw_db.dept t2)
Remote SQL: SELECT r1.`ename`, r2.`dname` FROM (`fdw_db`.`emp` r1 FULL JOIN `fdw_db`.`dept` r2 ON (((r1.`deptno` = r2.`deptno`))))
(4 rows)
```

```sql
--cross join
edb=# EXPLAIN VERBOSE SELECT t1.ename, t2.dname FROM emp t1 CROSS JOIN dept t2;
__OUTPUT__
QUERY PLAN
--------------------------------------------------------------------------------------------------------------
Foreign Scan (cost=15.00..35.00 rows=1000000 width=84)
   Output: t1.ename, t2.dname
   Relations: (fdw_db.emp t1) INNER JOIN (fdw_db.dept t2)
Remote SQL: SELECT r1.`ename`, r2.`dname` FROM (`fdw_db`.`emp` r1 INNER JOIN `fdw_db`.`dept` r2 ON (TRUE))
(4 rows)
```

Enable/disable join pushdown at the table level using the table-level `enable_join_pushdown` option:

```sql
-- enable join pushdown at the table level
ALTER FOREIGN TABLE emp OPTIONS (SET enable_join_pushdown 'true');
EXPLAIN (VERBOSE, COSTS OFF)
SELECT e.empno, e.ename, d.dname
FROM emp e JOIN dept d ON (e.deptno = d.deptno)
ORDER BY e.empno;
__OUTPUT__
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
Sort
Output: e.empno, e.ename, d.dname
Sort Key: e.empno
-> Foreign Scan
Output: e.empno, e.ename, d.dname
Relations: (fdw_db.emp e) INNER JOIN (fdw_db.dept d)
Remote SQL: SELECT r1.`empno`, r1.`ename`, r2.`dname` FROM (`fdw_db`.`emp` r1 INNER JOIN `fdw_db`.`dept` r2 ON (((r1.`deptno` = r2.`deptno`))))
(7 rows)
```

```sql
--Disable the GUC enable_join_pushdown.
SET hdfs_fdw.enable_join_pushdown to false;
-- Pushdown shouldn't happen as enable_join_pushdown is false.
EXPLAIN (VERBOSE, COSTS OFF)
SELECT e.empno, e.ename, d.dname
FROM emp e JOIN dept d ON (e.deptno = d.deptno)
ORDER BY e.empno;
__OUTPUT__
QUERY PLAN
-------------------------------------------------------------------------------------------
Sort
Output: e.empno, e.ename, d.dname
Sort Key: e.empno
-> Nested Loop
Output: e.empno, e.ename, d.dname
Join Filter: (e.deptno = d.deptno)
-> Foreign Scan on public.emp e
Output: e.empno, e.ename, e.job, e.mgr, e.hiredate, e.sal, e.comm, e.deptno
Remote SQL: SELECT `empno`, `ename`, `deptno` FROM `fdw_db`.`emp`
-> Materialize
Output: d.dname, d.deptno
-> Foreign Scan on public.dept d
Output: d.dname, d.deptno
Remote SQL: SELECT `deptno`, `dname` FROM `fdw_db`.`dept`
```

Enable/disable join pushdown at the session level using the `hdfs_fdw.enable_join_pushdown` GUC:

```sql
SET hdfs_fdw.enable_join_pushdown to true;
EXPLAIN (VERBOSE, COSTS OFF)
SELECT e.empno, e.ename, d.dname
FROM emp e JOIN dept d ON (e.deptno = d.deptno)
ORDER BY e.empno;
__OUTPUT__
QUERY PLAN
-------------------------------------------------------------------------------------------
Sort
Output: e.empno, e.ename, d.dname
Sort Key: e.empno
-> Nested Loop
Output: e.empno, e.ename, d.dname
Join Filter: (e.deptno = d.deptno)
-> Foreign Scan on public.emp e
Output: e.empno, e.ename, e.job, e.mgr, e.hiredate, e.sal, e.comm, e.deptno
Remote SQL: SELECT `empno`, `ename`, `deptno` FROM `fdw_db`.`emp`
-> Materialize
Output: d.dname, d.deptno
-> Foreign Scan on public.dept d
Output: d.dname, d.deptno
Remote SQL: SELECT `deptno`, `dname` FROM `fdw_db`.`dept`
(14 rows)
```
This example shows aggregate pushdown between the foreign tables of the same remote HIVE/SPARK server:

Tables on HIVE/SPARK server:

```sql
0: jdbc:hive2://localhost:10000> describe emp;
+-----------+------------+----------+--+
| col_name | data_type | comment |
+-----------+------------+----------+--+
| empno | int | NULL |
| ename | string | NULL |
| job | string | NULL |
| mgr | int | NULL |
| hiredate | date | NULL |
| sal | int | NULL |
| comm | int | NULL |
| deptno | int | NULL |
+-----------+------------+----------+--+
8 rows selected (0.747 seconds)
0: jdbc:hive2://localhost:10000> describe dept;
+-----------+------------+----------+--+
| col_name | data_type | comment |
+-----------+------------+----------+--+
| deptno | int | NULL |
| dname | string | NULL |
| loc | string | NULL |
+-----------+------------+----------+--+
3 rows selected (0.067 seconds)
```

Query with aggregate pushdown (the server, user mapping, and foreign table `emp` are created as in the join pushdown example):

```sql
EXPLAIN VERBOSE
SELECT deptno, COUNT(*),SUM(sal),MAX(sal),MIN(sal),AVG(sal) FROM emp
GROUP BY deptno
HAVING deptno IN (10,20)
ORDER BY deptno;
__OUTPUT__
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort
```
---
title: "Example: ORDER BY pushdown"
---

This example shows ORDER BY pushdown on a foreign table of the remote HIVE/SPARK server:

Tables on HIVE/SPARK server:

```sql
0: jdbc:hive2://localhost:10000> describe emp;
+-----------+------------+----------+--+
| col_name | data_type | comment |
+-----------+------------+----------+--+
| empno | int | NULL |
| ename | string | NULL |
| job | string | NULL |
| mgr | int | NULL |
| hiredate | date | NULL |
| sal | int | NULL |
| comm | int | NULL |
| deptno | int | NULL |
+-----------+------------+----------+--+
8 rows selected (0.747 seconds)
0: jdbc:hive2://localhost:10000> describe dept;
+-----------+------------+----------+--+
| col_name | data_type | comment |
+-----------+------------+----------+--+
| deptno | int | NULL |
| dname | string | NULL |
| loc | string | NULL |
+-----------+------------+----------+--+
3 rows selected (0.067 seconds)
```

Tables on Postgres server:

```sql
-- load extension first time after install
CREATE EXTENSION hdfs_fdw;

-- create server object
CREATE SERVER hdfs_server FOREIGN DATA WRAPPER hdfs_fdw OPTIONS(host 'localhost', port '10000', client_type 'spark', auth_type 'LDAP');

-- create user mapping
CREATE USER MAPPING FOR public SERVER hdfs_server OPTIONS (username 'user1', password 'pwd123');

-- create foreign table
CREATE FOREIGN TABLE emp (
empno INTEGER,
ename VARCHAR(10),
job VARCHAR(9),
mgr INTEGER,
hiredate DATE,
sal INTEGER,
comm INTEGER,
deptno INTEGER
)
SERVER hdfs_server OPTIONS (dbname 'fdw_db', table_name 'emp');
```

Query with ORDER BY pushdown:

```sql
edb=# SET hdfs_fdw.enable_order_by_pushdown TO ON;
SET
edb=# EXPLAIN (COSTS OFF) SELECT * FROM emp order by deptno;
__OUTPUT__
QUERY PLAN
---------------------
Foreign Scan on emp
(1 row)

edb=# SET hdfs_fdw.enable_order_by_pushdown TO OFF;
SET
edb=# EXPLAIN (COSTS OFF) SELECT * FROM emp order by deptno;
__OUTPUT__
QUERY PLAN
---------------------------
Sort
Sort Key: deptno
-> Foreign Scan on emp
(3 rows)
```