This repository has been archived by the owner on Mar 27, 2024. It is now read-only.
v1.3.1
Overview
OAP 1.3.1 is a maintenance release and contains two major components: Gazelle and OAP MLlib. In this release, 51 issues/improvements were committed.
Here are the major features/improvements in OAP 1.3.1.
Gazelle (Native SQL Engine)
- Reach 1.5X the overall performance of vanilla Spark on the 103 TPC-DS queries with a 5TB dataset on ICX clusters
- Reach 1.7X the overall performance of vanilla Spark on the 22 TPC-H queries with a 3TB dataset on ICX clusters
- Support Spark-3.1.1, Spark-3.1.2, Spark-3.1.3, Spark-3.2.0 and Spark-3.2.1
- Support rand expression and complex types for ColumnarSortExec
- Refactor shuffled hash join and hash aggregation
- Bug fixes for sort-merge join (SMJ) and for memory allocation in row-to-columnar conversion, among others
OAP MLlib
- Reach over 12X performance vs. vanilla Spark using PCA, Linear and Ridge Regression on ICX clusters
- Support Spark-3.1.1, Spark-3.1.2, Spark-3.1.3, Spark-3.2.0, Spark-3.2.1 and CDH Spark
- Bump Intel oneAPI Base Toolkit to 2022.1.2
Changelog
Gazelle Plugin
Features
#710 | Add rand expression support |
#745 | improve codegen check |
#761 | Update the document to reflect the changes in build and deployment |
#635 | Document the incompatibility with Spark on Expressions |
#702 | Print output datatype for columnar shuffle on WebUI |
#712 | [Nested type] Optimize Array split and support nested Array |
#732 | [Nested type] Support Struct and Map nested types in Shuffle |
#759 | Add spark 3.1.2 & 3.1.3 as supported versions for 3.1.1 shim layer |
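The last item above (#759) means that several Spark patch versions share one shim layer. A minimal sketch of that idea follows, assuming a simple lookup table. The shim identifier `shim-3.1.1` and the function `resolve_shim` are hypothetical names used only for illustration, not Gazelle's actual module or API names. Spark 3.2.x is left out because these notes do not say how those versions map to shim layers.

```python
# Hypothetical shim identifier; the real module name is not given in these notes.
SHIM_FOR_VERSION = {
    "3.1.1": "shim-3.1.1",
    "3.1.2": "shim-3.1.1",  # reuses the 3.1.1 shim layer (per #759)
    "3.1.3": "shim-3.1.1",  # reuses the 3.1.1 shim layer (per #759)
}


def resolve_shim(spark_version: str) -> str:
    """Return the shim identifier registered for an exact Spark version string."""
    try:
        return SHIM_FOR_VERSION[spark_version]
    except KeyError:
        # Fail loudly instead of guessing a shim for an unlisted version.
        raise ValueError(
            f"no shim mapping known in this sketch for Spark {spark_version}"
        ) from None
```

Keeping the mapping explicit, rather than matching on a version prefix, means a new patch release is only accepted once someone has confirmed that an existing shim layer works with it.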
Performance
#610 | refactor on shuffled hash join/hash agg |
Bugs Fixed
#755 | GetAttrFromExpr unsupported issue when run TPCDS Q57 |
#764 | add java.version to clarify jdk version |
#774 | Fix runtime issues on spark 3.2 |
#778 | Failed to find include file while running code gen |
#725 | gazelle failed to run with spark local |
#746 | Improve memory allocation on native row to column operator |
#770 | Cast exception and null pointer exception on spark-3.2 |
#772 | ColumnarBatchScan name missing in UI for Spark321 |
#740 | Handle exceptions like std::out_of_range in casting string to numeric types in WSCG |
#727 | Create table failed with TPCH partition dataset |
#719 | Wrong result on TPC-DS Q38, Q87 |
#705 | Two unit tests failed on master branch |
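Issue #740 above concerns string-to-numeric casts whose input does not fit the target type. The sketch below illustrates the general pattern in Python rather than in the native code itself: parse, then treat malformed or out-of-range input as a recoverable case instead of letting an exception escape. What the fixed code returns in that case is not stated in these notes. Returning `None` as a stand-in for SQL NULL is an assumption here, and `cast_string_to_int32` is a hypothetical name.

```python
from typing import Optional

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def cast_string_to_int32(text: str) -> Optional[int]:
    """Parse text as a 32-bit signed integer; return None if it cannot be represented."""
    try:
        value = int(text.strip())
    except ValueError:
        return None  # malformed input, e.g. "abc"
    if not INT32_MIN <= value <= INT32_MAX:
        # Plays the role of catching std::out_of_range in native code.
        return None
    return value
```

For example, `cast_string_to_int32("42")` yields `42`, while `cast_string_to_int32("2147483648")` yields `None` because the value is one past the 32-bit maximum.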
PRs
#834 | [NSE-746]Fix memory allocation in row to columnar |
#809 | [NSE-746]Fix memory allocation in row to columnar |
#817 | [NSE-761] Update document to reflect spark 3.2.x support |
#805 | [NSE-772] Code refactor for ColumnarBatchScan |
#802 | [NSE-794] Fix count() with decimal value |
#779 | [NSE-778] Failed to find include file while running code gen |
#798 | [NSE-795] Fix a consecutive SMJ issue in wscg |
#799 | [NSE-791] fix xchg reuse in Spark321 |
#773 | [NSE-770] [NSE-774] Fix runtime issues on spark 3.2 |
#787 | [NSE-774] Fallback broadcast exchange for DPP to reuse |
#763 | [NSE-762] Add complex types support for ColumnarSortExec |
#783 | [NSE-782] prepare 1.3.1 release |
#777 | [NSE-732]Adding new config to enable/disable complex data type support |
#776 | [NSE-770] [NSE-774] Fix runtime issues on spark 3.2 |
#765 | [NSE-764] declare java.version for maven |
#767 | [NSE-610] fix unit tests on SHJ |
#760 | [NSE-759] Add spark 3.1.2 & 3.1.3 as supported versions for 3.1.1 shim layer |
#757 | [NSE-746]Fix memory allocation in row to columnar |
#724 | [NSE-725] change the code style for ExecutorManger |
#751 | [NSE-745] Improve codegen check for expression |
#742 | [NSE-359] [NSE-273] Introduce shim layer to fix compatibility issues for gazelle on spark 3.1 & 3.2 |
#754 | [NSE-755] Quick fix for ConverterUtils.getAttrFromExpr for TPCDS queries |
#749 | [NSE-732] Support Map complex type in Shuffle |
#738 | [NSE-610] hashjoin opt1 |
#733 | [NSE-732] Support Struct complex type in Shuffle |
#744 | [NSE-740] fix codegen with out_of_range check |
#743 | [NSE-740] Catch out_of_range exception in casting string to numeric types in wscg |
#735 | [NSE-610] hashagg opt#2 |
#707 | [NSE-710] Add rand expression support |
#734 | [NSE-727] Create table failed with TPCH partition dataset, patch 2 |
#715 | [NSE-610] hashagg opt#1 |
#731 | [NSE-727] Create table failed with TPCH partition dataset |
#713 | [NSE-712] Optimize Array split and support nested Array |
#721 | [NSE-719][backport]fix null check in SMJ |
#720 | [NSE-719] fix null check in SMJ |
#718 | Following NSE-702, fix for AQE enabled case |
#691 | [NSE-687]Try to upgrade log4j |
#703 | [NSE-702] Print output datatype for columnar shuffle on WebUI |
#706 | [NSE-705] Fallback R2C on unsupported cases |
#657 | [NSE-635] Add document to clarify incompatibility issues in expressions |
#623 | [NSE-602] Fix Array type shuffle split segmentation fault |
#693 | [NSE-692] JoinBenchmark is broken |
OAP MLlib
Features
#189 | Intel-MLlib does not support spark-3.2.1 |
#186 | [Core] Support CDH versions |
#187 | Intel-MLlib does not support spark-3.1.3 |
#180 | [CI] Refactor CI and add code checks |
Bugs Fixed
#202 | [SDLe] Update oneAPI version to solve vulnerabilities |
#171 | [Core] detect if spark.dynamicAllocation.enabled is set true and exit gracefully |
#185 | [Naive Bayes] Big datasets cause out-of-memory errors |
#184 | [Core] Fix code style issues |
#179 | [GPU][PCA] use distributed covariance as the first step for PCA |
#178 | [ALS] Fix error when converting buffer to CSRNumericTable |
#177 | [Naive Bayes] Fix error when converting Vector to CSRNumericTable |
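Issue #171 above describes detecting `spark.dynamicAllocation.enabled` and exiting gracefully. A minimal sketch of such a pre-flight check is shown below, assuming the Spark configuration is available as a plain dictionary. The function name `check_dynamic_allocation` and the error message are hypothetical; this is not OAP MLlib's actual implementation.

```python
def check_dynamic_allocation(conf: dict) -> None:
    """Raise a clear error if dynamic allocation is enabled in the given config."""
    # Spark treats this property as false when it is not set.
    raw = conf.get("spark.dynamicAllocation.enabled", "false")
    if str(raw).strip().lower() == "true":
        raise RuntimeError(
            "spark.dynamicAllocation.enabled must be false when OAP MLlib is enabled"
        )
```

Running the check before any job is submitted lets the application stop with a readable message rather than fail partway through training.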
PRs
#203 | [ML-202] Update oneAPI Base Toolkit version and prepare for OAP 1.3.1 release |
#197 | [ML-187]Support spark 3.1.3 and 3.2.0 and support CDH |
#201 | [ML-171] When OAP MLlib is enabled, spark.dynamicAllocation.enabled should be set to false |
#196 | [ML-185]Select label and features columns and cache data |
#195 | [ML-184]Fix code style issues |
#183 | [ML-180][CI] Refactor CI and add code checks |
#175 | [ML-179][GPU] use distributed covariance as the first step for PCA |
#182 | [ML-178]fix als convert buffer to NumericTable |
#176 | [ML-177][Naive Bayes] Fix error when converting Vector to CSRNumericTable |