Merge pull request #3331 from EnterpriseDB/content/docs/mongo_data_adapter/new_release

FDW - Release branch
drothery-edb authored Jan 6, 2023
2 parents 669f9a5 + 4ecca37 commit f89ac24
Showing 21 changed files with 631 additions and 22 deletions.
This table lists the latest Hadoop Foreign Data Wrapper versions and their supported EPAS versions:

| Hadoop Foreign Data Wrapper | EPAS 14 | EPAS 13 | EPAS 12 | EPAS 11 |
| --------------------------- | ------- | ------- | ------- | ------- |
| 2.3.0 | Y | Y | Y | Y |
| 2.2.0 | Y | Y | Y | Y |
| 2.1.0 | Y | Y | Y | Y |
| 2.0.8 | N | Y | Y | Y |
| 2.0.7 | N | Y | Y | N |
Hadoop Foreign Data Wrapper supports column pushdown. As a result, the query brings back only the columns that are part of the select target list.

## Join pushdown

Hadoop Foreign Data Wrapper supports join pushdown. It pushes down joins between foreign tables of the same remote Hive or Spark server to that server, enhancing performance.

From version 2.3.0, you can enable or disable join pushdown at the session and query level using the `enable_join_pushdown` GUC variable.

For more information, see [Example: Join pushdown](10a_example_join_pushdown).
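A minimal sketch of session-level control with the `hdfs_fdw.enable_join_pushdown` GUC:

```sql
-- Enable join pushdown for the current session.
SET hdfs_fdw.enable_join_pushdown TO true;

-- Disable it again; joins between foreign tables are then executed locally.
SET hdfs_fdw.enable_join_pushdown TO false;
```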

## Aggregate pushdown

Hadoop Foreign Data Wrapper supports aggregate pushdown. It pushes aggregates to the remote Hive or Spark server instead of fetching all of the rows and aggregating them locally, which gives a significant performance boost where aggregates can be pushed down. The pushdown is currently limited to the aggregate functions min, max, sum, avg, and count, to avoid pushing down functions that aren't present on the Hadoop server. Aggregate filters and orders aren't pushed down.

For more information, see [Example: Aggregate pushdown](10b_example_aggregate_pushdown).
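As a sketch, assuming the `emp` foreign table defined in the examples, all of the aggregates below fall within the supported min/max/sum/avg/count set and are therefore candidates for pushdown:

```sql
-- Each aggregate is in the pushable set, so the whole aggregation
-- can be evaluated on the remote Hive or Spark server.
SELECT deptno, COUNT(*), SUM(sal), MAX(sal), MIN(sal), AVG(sal)
FROM emp
GROUP BY deptno
ORDER BY deptno;
```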

## ORDER BY pushdown

Hadoop Foreign Data Wrapper supports ORDER BY pushdown. When possible, it pushes the `ORDER BY` clause to the remote server, so the foreign server returns an ordered result set, which can enable an efficient merge join.

For more information, see [Example: ORDER BY pushdown](10c_example_order_by_pushdown).
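A minimal sketch, using the `hdfs_fdw.enable_order_by_pushdown` GUC and the `emp` foreign table from the examples:

```sql
SET hdfs_fdw.enable_order_by_pushdown TO ON;

-- With the GUC on, the sort runs on the remote server, so the plan
-- is a single Foreign Scan with no local Sort node on top.
EXPLAIN (COSTS OFF) SELECT * FROM emp ORDER BY deptno;
```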

## LIMIT OFFSET pushdown

Hadoop Foreign Data Wrapper supports LIMIT OFFSET pushdown. When possible, it performs LIMIT and OFFSET operations on the remote server, which reduces network traffic between the local Postgres server and the remote HDFS/Hive server. The ALL/NULL options aren't supported on the Hive server, so they aren't pushed down. Also, OFFSET without LIMIT isn't supported on the remote server, so queries with that construct aren't pushed down.

For more information, see [Example: LIMIT OFFSET pushdown](10d_example_limit_offset_pushdown).
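As a hedged sketch against the `emp` foreign table used in the examples, a query like the following is a candidate for LIMIT OFFSET pushdown (both clauses present, no ALL/NULL options):

```sql
-- Both LIMIT and OFFSET are specified, so the clauses can be
-- evaluated on the remote Hive/Spark server instead of locally.
SELECT empno, ename
FROM emp
ORDER BY empno
LIMIT 10 OFFSET 5;
```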

## Automated cleanup

Hadoop Foreign Data Wrapper allows the cleanup of foreign tables in a single operation using the `DROP EXTENSION` command. This feature is especially useful when a foreign table is set up for a temporary purpose. The syntax is:
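A minimal sketch (the `CASCADE` option is assumed here; it drops dependent objects such as the server, user mappings, and foreign tables along with the extension):

```sql
-- Remove the extension and, with CASCADE, everything that depends on it.
DROP EXTENSION hdfs_fdw CASCADE;
```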
product_docs/docs/hadoop_data_adapter/2/10a_example_join_pushdown.mdx

This example shows join pushdown between the foreign tables of the same remote HIVE/SPARK server:

Tables on HIVE/SPARK server:

```sql
0: jdbc:hive2://localhost:10000> describe emp;
+-----------+------------+----------+--+
| col_name | data_type | comment |
+-----------+------------+----------+--+
| empno | int | NULL |
| ename | string | NULL |
| job | string | NULL |
| mgr | int | NULL |
| hiredate | date | NULL |
| sal | int | NULL |
| comm | int | NULL |
| deptno | int | NULL |
+-----------+------------+----------+--+
8 rows selected (0.747 seconds)
0: jdbc:hive2://localhost:10000> describe dept;
+-----------+------------+----------+--+
| col_name | data_type | comment |
+-----------+------------+----------+--+
| deptno | int | NULL |
| dname | string | NULL |
| loc | string | NULL |
+-----------+------------+----------+--+
3 rows selected (0.067 seconds)
```

Tables on Postgres server:

```sql
-- load extension first time after install
CREATE EXTENSION hdfs_fdw;

-- create server object
CREATE SERVER hdfs_server FOREIGN DATA WRAPPER hdfs_fdw OPTIONS(host 'localhost', port '10000', client_type 'spark', auth_type 'LDAP');

-- create user mapping
CREATE USER MAPPING FOR public SERVER hdfs_server OPTIONS (username 'user1', password 'pwd123');

-- create foreign table
CREATE FOREIGN TABLE dept (
deptno INTEGER,
dname VARCHAR(14),
loc VARCHAR(13)
)
SERVER hdfs_server OPTIONS (dbname 'fdw_db', table_name 'dept');

-- create foreign table
CREATE FOREIGN TABLE emp (
empno INTEGER,
ename VARCHAR(10),
job VARCHAR(9),
mgr INTEGER,
hiredate DATE,
sal INTEGER,
comm INTEGER,
deptno INTEGER
)
SERVER hdfs_server OPTIONS (dbname 'fdw_db', table_name 'emp');
```

Queries with join pushdown:

```sql
--inner join
edb=# EXPLAIN VERBOSE SELECT t1.ename, t2.dname FROM emp t1 INNER JOIN dept t2 ON ( t1.deptno = t2.deptno );
__OUTPUT__
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Foreign Scan (cost=15.00..35.00 rows=5000 width=84)
Output: t1.ename, t2.dname
Relations: (fdw_db.emp t1) INNER JOIN (fdw_db.dept t2)
Remote SQL: SELECT r1.`ename`, r2.`dname` FROM (`fdw_db`.`emp` r1 INNER JOIN `fdw_db`.`dept` r2 ON (((r1.`deptno` = r2.`deptno`))))
(4 rows)
```

```sql
--left join
edb=# EXPLAIN VERBOSE SELECT t1.ename, t2.dname FROM emp t1 LEFT JOIN dept t2 ON ( t1.deptno = t2.deptno );
__OUTPUT__
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Foreign Scan (cost=15.00..35.00 rows=5000 width=84)
Output: t1.ename, t2.dname
Relations: (fdw_db.emp t1) LEFT JOIN (fdw_db.dept t2)
Remote SQL: SELECT r1.`ename`, r2.`dname` FROM (`fdw_db`.`emp` r1 LEFT JOIN `fdw_db`.`dept` r2 ON (((r1.`deptno` = r2.`deptno`))))
(4 rows)
```

```sql
--right join
edb=# EXPLAIN VERBOSE SELECT t1.ename, t2.dname FROM emp t1 RIGHT JOIN dept t2 ON ( t1.deptno = t2.deptno );
__OUTPUT__
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Foreign Scan (cost=15.00..35.00 rows=5000 width=84)
Output: t1.ename, t2.dname
Relations: (fdw_db.dept t2) LEFT JOIN (fdw_db.emp t1)
Remote SQL: SELECT r1.`ename`, r2.`dname` FROM (`fdw_db`.`dept` r2 LEFT JOIN `fdw_db`.`emp` r1 ON (((r1.`deptno` = r2.`deptno`))))
(4 rows)
```

```sql
--full join
edb=# EXPLAIN VERBOSE SELECT t1.ename, t2.dname FROM emp t1 FULL JOIN dept t2 ON ( t1.deptno = t2.deptno );
__OUTPUT__
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Foreign Scan (cost=15.00..35.00 rows=5000 width=84)
Output: t1.ename, t2.dname
Relations: (fdw_db.emp t1) FULL JOIN (fdw_db.dept t2)
Remote SQL: SELECT r1.`ename`, r2.`dname` FROM (`fdw_db`.`emp` r1 FULL JOIN `fdw_db`.`dept` r2 ON (((r1.`deptno` = r2.`deptno`))))
(4 rows)
```

```sql
--cross join
edb=# EXPLAIN VERBOSE SELECT t1.ename, t2.dname FROM emp t1 CROSS JOIN dept t2;
__OUTPUT__
QUERY PLAN
--------------------------------------------------------------------------------------------------------------
Foreign Scan (cost=15.00..35.00 rows=1000000 width=84)
   Output: t1.ename, t2.dname
   Relations: (fdw_db.emp t1) INNER JOIN (fdw_db.dept t2)
Remote SQL: SELECT r1.`ename`, r2.`dname` FROM (`fdw_db`.`emp` r1 INNER JOIN `fdw_db`.`dept` r2 ON (TRUE))
(4 rows)
```

Enable/disable join pushdown at the table level using the table-level `enable_join_pushdown` option:

```sql
-- enable join pushdown at the table level
ALTER FOREIGN TABLE emp OPTIONS (SET enable_join_pushdown 'true');
EXPLAIN (VERBOSE, COSTS OFF)
SELECT e.empno, e.ename, d.dname
FROM emp e JOIN dept d ON (e.deptno = d.deptno)
ORDER BY e.empno;
__OUTPUT__
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
Sort
Output: e.empno, e.ename, d.dname
Sort Key: e.empno
-> Foreign Scan
Output: e.empno, e.ename, d.dname
Relations: (fdw_db.emp e) INNER JOIN (fdw_db.dept d)
Remote SQL: SELECT r1.`empno`, r1.`ename`, r2.`dname` FROM (`fdw_db`.`emp` r1 INNER JOIN `fdw_db`.`dept` r2 ON (((r1.`deptno` = r2.`deptno`))))
(7 rows)
```

```sql
--Disable the GUC enable_join_pushdown.
SET hdfs_fdw.enable_join_pushdown to false;
-- Pushdown shouldn't happen as enable_join_pushdown is false.
EXPLAIN (VERBOSE, COSTS OFF)
SELECT e.empno, e.ename, d.dname
FROM emp e JOIN dept d ON (e.deptno = d.deptno)
ORDER BY e.empno;
__OUTPUT__
QUERY PLAN
-------------------------------------------------------------------------------------------
Sort
Output: e.empno, e.ename, d.dname
Sort Key: e.empno
-> Nested Loop
Output: e.empno, e.ename, d.dname
Join Filter: (e.deptno = d.deptno)
-> Foreign Scan on public.emp e
Output: e.empno, e.ename, e.job, e.mgr, e.hiredate, e.sal, e.comm, e.deptno
Remote SQL: SELECT `empno`, `ename`, `deptno` FROM `fdw_db`.`emp`
-> Materialize
Output: d.dname, d.deptno
-> Foreign Scan on public.dept d
Output: d.dname, d.deptno
Remote SQL: SELECT `deptno`, `dname` FROM `fdw_db`.`dept`
```

Enable/disable join pushdown at the session level using the `hdfs_fdw.enable_join_pushdown` GUC:

```sql
SET hdfs_fdw.enable_join_pushdown to true;
EXPLAIN (VERBOSE, COSTS OFF)
SELECT e.empno, e.ename, d.dname
FROM emp e JOIN dept d ON (e.deptno = d.deptno)
ORDER BY e.empno;
__OUTPUT__
QUERY PLAN
-------------------------------------------------------------------------------------------
Sort
Output: e.empno, e.ename, d.dname
Sort Key: e.empno
-> Nested Loop
Output: e.empno, e.ename, d.dname
Join Filter: (e.deptno = d.deptno)
-> Foreign Scan on public.emp e
Output: e.empno, e.ename, e.job, e.mgr, e.hiredate, e.sal, e.comm, e.deptno
Remote SQL: SELECT `empno`, `ename`, `deptno` FROM `fdw_db`.`emp`
-> Materialize
Output: d.dname, d.deptno
-> Foreign Scan on public.dept d
Output: d.dname, d.deptno
Remote SQL: SELECT `deptno`, `dname` FROM `fdw_db`.`dept`
(14 rows)
```
This example shows aggregate pushdown between the foreign tables of the same remote HIVE/SPARK server:

Tables on HIVE/SPARK server:

```sql
0: jdbc:hive2://localhost:10000> describe emp;
+-----------+------------+----------+--+
| col_name | data_type | comment |
+-----------+------------+----------+--+
| empno | int | NULL |
| ename | string | NULL |
| job | string | NULL |
| mgr | int | NULL |
| hiredate | date | NULL |
| sal | int | NULL |
| comm | int | NULL |
| deptno | int | NULL |
+-----------+------------+----------+--+
8 rows selected (0.747 seconds)
0: jdbc:hive2://localhost:10000> describe dept;
+-----------+------------+----------+--+
| col_name | data_type | comment |
+-----------+------------+----------+--+
| deptno | int | NULL |
| dname | string | NULL |
| loc | string | NULL |
+-----------+------------+----------+--+
3 rows selected (0.067 seconds)
```

Query with aggregate pushdown (the server, user mapping, and foreign table `emp` are created as in the join pushdown example):

```sql
EXPLAIN VERBOSE
SELECT deptno, COUNT(*),SUM(sal),MAX(sal),MIN(sal),AVG(sal) FROM emp
GROUP BY deptno
HAVING deptno IN (10,20)
ORDER BY deptno;
__OUTPUT__
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort
```
---
title: "Example: ORDER BY pushdown"
---

This example shows ORDER BY pushdown on a foreign table of the remote HIVE/SPARK server:

Tables on HIVE/SPARK server:

```sql
0: jdbc:hive2://localhost:10000> describe emp;
+-----------+------------+----------+--+
| col_name | data_type | comment |
+-----------+------------+----------+--+
| empno | int | NULL |
| ename | string | NULL |
| job | string | NULL |
| mgr | int | NULL |
| hiredate | date | NULL |
| sal | int | NULL |
| comm | int | NULL |
| deptno | int | NULL |
+-----------+------------+----------+--+
8 rows selected (0.747 seconds)
0: jdbc:hive2://localhost:10000> describe dept;
+-----------+------------+----------+--+
| col_name | data_type | comment |
+-----------+------------+----------+--+
| deptno | int | NULL |
| dname | string | NULL |
| loc | string | NULL |
+-----------+------------+----------+--+
3 rows selected (0.067 seconds)
```

Tables on Postgres server:

```sql
-- load extension first time after install
CREATE EXTENSION hdfs_fdw;

-- create server object
CREATE SERVER hdfs_server FOREIGN DATA WRAPPER hdfs_fdw OPTIONS(host 'localhost', port '10000', client_type 'spark', auth_type 'LDAP');

-- create user mapping
CREATE USER MAPPING FOR public SERVER hdfs_server OPTIONS (username 'user1', password 'pwd123');

-- create foreign table
CREATE FOREIGN TABLE emp (
empno INTEGER,
ename VARCHAR(10),
job VARCHAR(9),
mgr INTEGER,
hiredate DATE,
sal INTEGER,
comm INTEGER,
deptno INTEGER
)
SERVER hdfs_server OPTIONS (dbname 'fdw_db', table_name 'emp');
```

Query with ORDER BY pushdown:

```sql
edb=# SET hdfs_fdw.enable_order_by_pushdown TO ON;
SET
edb=# EXPLAIN (COSTS OFF) SELECT * FROM emp order by deptno;
__OUTPUT__
QUERY PLAN
---------------------
Foreign Scan on emp
(1 row)

edb=# SET hdfs_fdw.enable_order_by_pushdown TO OFF;
SET
edb=# EXPLAIN (COSTS OFF) SELECT * FROM emp order by deptno;
__OUTPUT__
QUERY PLAN
---------------------------
Sort
Sort Key: deptno
-> Foreign Scan on emp
(3 rows)
```