This repository has been archived by the owner on Mar 27, 2024. It is now read-only.

v1.3.1

HongW2019 released this 11 Apr 10:10
· 2 commits to branch-1.3.1 since this release
e472131

Overview

OAP 1.3.1 is a maintenance release and contains two major components: Gazelle and OAP MLlib. In this release, 51 issues/improvements were committed.
Here are the major features/improvements in OAP 1.3.1.

Gazelle (Native SQL Engine)

  • Reach 1.5X overall performance vs. vanilla Spark on the 103 TPC-DS queries with a 5TB dataset on ICX clusters
  • Reach 1.7X overall performance vs. vanilla Spark on the 22 TPC-H queries with a 3TB dataset on ICX clusters
  • Support Spark-3.1.1, Spark-3.1.2, Spark-3.1.3, Spark-3.2.0 and Spark-3.2.1
  • Support rand expression and complex types for ColumnarSortExec
  • Refactor shuffled hash join and hash aggregation
  • Bug fixes for sort-merge join (SMJ), memory allocation in row-to-columnar conversion, etc.

OAP MLlib

  • Reach over 12X performance vs. vanilla Spark using PCA, Linear and Ridge Regression on ICX clusters
  • Support Spark-3.1.1, Spark-3.1.2, Spark-3.1.3, Spark-3.2.0, Spark-3.2.1 and CDH Spark
  • Bump Intel oneAPI Base Toolkit to 2022.1.2

Changelog

Gazelle Plugin

Features

#710 Add rand expression support
#745 improve codegen check
#761 Update the document to reflect the changes in build and deployment
#635 Document the incompatibility with Spark on Expressions
#702 Print output datatype for columnar shuffle on WebUI
#712 [Nested type] Optimize Array split and support nested Array
#732 [Nested type] Support Struct and Map nested types in Shuffle
#759 Add spark 3.1.2 & 3.1.3 as supported versions for 3.1.1 shim layer

Performance

#610 refactor on shuffled hash join/hash agg

Bugs Fixed

#755 GetAttrFromExpr unsupported issue when run TPCDS Q57
#764 add java.version to clarify jdk version
#774 Fix runtime issues on spark 3.2
#778 Failed to find include file while running code gen
#725 gazelle failed to run with spark local
#746 Improve memory allocation on native row to column operator
#770 Cast exception and null pointer exception in spark-3.2
#772 ColumnarBatchScan name missing in UI for Spark321
#740 Handle exceptions like std::out_of_range in casting string to numeric types in WSCG
#727 Create table failed with TPCH partition dataset
#719 Wrong result on TPC-DS Q38, Q87
#705 Two unit tests failed on master branch
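
Issue #740 above concerns std::out_of_range escaping from string-to-numeric casts in whole-stage codegen (WSCG). As a rough illustration only, and not Gazelle's actual generated code, the sketch below shows the general pattern in C++: std::stoi throws std::out_of_range when the parsed value does not fit in int, and std::invalid_argument when no conversion can be performed, so a cast helper can catch both and report failure instead of aborting. The helper name TryParseInt and the choice to report failure through a bool are assumptions made for this example.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of Gazelle): parse `input` as an int.
// Returns true and writes *out on success; returns false on any failure
// instead of letting an exception propagate to the caller.
bool TryParseInt(const std::string& input, int* out) {
  try {
    std::size_t consumed = 0;
    int value = std::stoi(input, &consumed);
    // Reject trailing characters such as "12abc".
    if (consumed != input.size()) {
      return false;
    }
    *out = value;
    return true;
  } catch (const std::out_of_range&) {
    // The value does not fit in int, e.g. "99999999999".
    return false;
  } catch (const std::invalid_argument&) {
    // No conversion could be performed, e.g. "" or "abc".
    return false;
  }
}
```

With this shape, a caller can map a false return to whatever the engine treats as a failed cast (for example a null value), which is a design choice assumed here rather than taken from the fix itself.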

PRs

#834 [NSE-746]Fix memory allocation in row to columnar
#809 [NSE-746]Fix memory allocation in row to columnar
#817 [NSE-761] Update document to reflect spark 3.2.x support
#805 [NSE-772] Code refactor for ColumnarBatchScan
#802 [NSE-794] Fix count() with decimal value
#779 [NSE-778] Failed to find include file while running code gen
#798 [NSE-795] Fix a consecutive SMJ issue in wscg
#799 [NSE-791] fix xchg reuse in Spark321
#773 [NSE-770] [NSE-774] Fix runtime issues on spark 3.2
#787 [NSE-774] Fallback broadcast exchange for DPP to reuse
#763 [NSE-762] Add complex types support for ColumnarSortExec
#783 [NSE-782] prepare 1.3.1 release
#777 [NSE-732]Adding new config to enable/disable complex data type support
#776 [NSE-770] [NSE-774] Fix runtime issues on spark 3.2
#765 [NSE-764] declare java.version for maven
#767 [NSE-610] fix unit tests on SHJ
#760 [NSE-759] Add spark 3.1.2 & 3.1.3 as supported versions for 3.1.1 shim layer
#757 [NSE-746]Fix memory allocation in row to columnar
#724 [NSE-725] change the code style for ExecutorManger
#751 [NSE-745] Improve codegen check for expression
#742 [NSE-359] [NSE-273] Introduce shim layer to fix compatibility issues for gazelle on spark 3.1 & 3.2
#754 [NSE-755] Quick fix for ConverterUtils.getAttrFromExpr for TPCDS queries
#749 [NSE-732] Support Map complex type in Shuffle
#738 [NSE-610] hashjoin opt1
#733 [NSE-732] Support Struct complex type in Shuffle
#744 [NSE-740] fix codegen with out_of_range check
#743 [NSE-740] Catch out_of_range exception in casting string to numeric types in wscg
#735 [NSE-610] hashagg opt#2
#707 [NSE-710] Add rand expression support
#734 [NSE-727] Create table failed with TPCH partition dataset, patch 2
#715 [NSE-610] hashagg opt#1
#731 [NSE-727] Create table failed with TPCH partition dataset
#713 [NSE-712] Optimize Array split and support nested Array
#721 [NSE-719][backport]fix null check in SMJ
#720 [NSE-719] fix null check in SMJ
#718 Following NSE-702, fix for AQE enabled case
#691 [NSE-687]Try to upgrade log4j
#703 [NSE-702] Print output datatype for columnar shuffle on WebUI
#706 [NSE-705] Fallback R2C on unsupported cases
#657 [NSE-635] Add document to clarify incompatibility issues in expressions
#623 [NSE-602] Fix Array type shuffle split segmentation fault
#693 [NSE-692] JoinBenchmark is broken

OAP MLlib

Features

#189 Intel-MLlib does not support spark-3.2.1 version
#186 [Core] Support CDH versions
#187 Intel-MLlib does not support spark-3.1.3 version
#180 [CI] Refactor CI and add code checks

Bugs Fixed

#202 [SDLe] Update oneAPI version to solve vulnerabilities
#171 [Core] detect if spark.dynamicAllocation.enabled is set true and exit gracefully
#185 [Naive Bayes] Big dataset causes out-of-memory errors
#184 [Core] Fix code style issues
#179 [GPU][PCA] use distributed covariance as the first step for PCA
#178 [ALS] Fix error when converting buffer to CSRNumericTable
#177 [Naive Bayes] Fix error when converting Vector to CSRNumericTable

PRs

#203 [ML-202] Update oneAPI Base Toolkit version and prepare for OAP 1.3.1 release
#197 [ML-187]Support spark 3.1.3 and 3.2.0 and support CDH
#201 [ML-171] When OAP MLlib is enabled, spark.dynamicAllocation.enabled should be set to false
#196 [ML-185]Select label and features columns and cache data
#195 [ML-184]Fix code style issues
#183 [ML-180][CI] Refactor CI and add code checks
#175 [ML-179][GPU] use distributed covariance as the first step for PCA
#182 [ML-178]fix als convert buffer to NumericTable
#176 [ML-177][Naive Bayes] Fix error when converting Vector to CSRNumericTable