Overview

OAP 1.3.1 is a maintenance release that contains two major components: Gazelle and OAP MLlib. This release includes 51 committed issues and improvements.
The major features and improvements in OAP 1.3.1 are listed below.

Gazelle (Native SQL Engine)

Reach 1.5X overall performance vs. vanilla Spark on TPC-DS 103 queries with 5TB dataset on ICX clusters
Reach 1.7X overall performance vs. vanilla Spark on TPC-H 22 queries with 3TB dataset on ICX clusters
Support Spark-3.1.1, Spark-3.1.2, Spark-3.1.3, Spark-3.2.0 and Spark-3.2.1
Support rand expression and complex types for ColumnarSortExec
Refactor shuffled hash join and hash aggregation
Bug fixes for SMJ, memory allocation in row-to-columnar conversion, and other issues

OAP MLlib

Reach over 12X performance vs. vanilla Spark using PCA, Linear and Ridge Regression on ICX clusters
Support Spark-3.1.1, Spark-3.1.2, Spark-3.1.3, Spark-3.2.0, Spark-3.2.1 and CDH Spark
Bump Intel oneAPI Base Toolkit to 2022.1.2

Changelog

Gazelle Plugin

Features


#710	Add rand expression support
#745	improve codegen check
#761	Update the document to reflect the changes in build and deployment
#635	Document the incompatibility with Spark on Expressions
#702	Print output datatype for columnar shuffle on WebUI
#712	[Nested type] Optimize Array split and support nested Array
#732	[Nested type] Support Struct and Map nested types in Shuffle
#759	Add spark 3.1.2 & 3.1.3 as supported versions for 3.1.1 shim layer

Performance


#610	refactor on shuffled hash join/hash agg

Bugs Fixed


#755	GetAttrFromExpr unsupported issue when run TPCDS Q57
#764	add java.version to clarify jdk version
#774	Fix runtime issues on spark 3.2
#778	Failed to find include file while running code gen
#725	gazelle failed to run with spark local
#746	Improve memory allocation on native row to column operator
#770	Cast exception and null pointer exception in spark-3.2
#772	ColumnarBatchScan name missing in UI for Spark321
#740	Handle exceptions like std::out_of_range in casting string to numeric types in WSCG
#727	Create table failed with TPCH partition dataset
#719	Wrong result on TPC-DS Q38, Q87
#705	Two unit tests failed on master branch

PRs


#834	[NSE-746]Fix memory allocation in row to columnar
#809	[NSE-746]Fix memory allocation in row to columnar
#817	[NSE-761] Update document to reflect spark 3.2.x support
#805	[NSE-772] Code refactor for ColumnarBatchScan
#802	[NSE-794] Fix count() with decimal value
#779	[NSE-778] Failed to find include file while running code gen
#798	[NSE-795] Fix a consecutive SMJ issue in wscg
#799	[NSE-791] fix xchg reuse in Spark321
#773	[NSE-770] [NSE-774] Fix runtime issues on spark 3.2
#787	[NSE-774] Fallback broadcast exchange for DPP to reuse
#763	[NSE-762] Add complex types support for ColumnarSortExec
#783	[NSE-782] prepare 1.3.1 release
#777	[NSE-732]Adding new config to enable/disable complex data type support
#776	[NSE-770] [NSE-774] Fix runtime issues on spark 3.2
#765	[NSE-764] declare java.version for maven
#767	[NSE-610] fix unit tests on SHJ
#760	[NSE-759] Add spark 3.1.2 & 3.1.3 as supported versions for 3.1.1 shim layer
#757	[NSE-746]Fix memory allocation in row to columnar
#724	[NSE-725] change the code style for ExecutorManger
#751	[NSE-745] Improve codegen check for expression
#742	[NSE-359] [NSE-273] Introduce shim layer to fix compatibility issues for gazelle on spark 3.1 & 3.2
#754	[NSE-755] Quick fix for ConverterUtils.getAttrFromExpr for TPCDS queries
#749	[NSE-732] Support Map complex type in Shuffle
#738	[NSE-610] hashjoin opt1
#733	[NSE-732] Support Struct complex type in Shuffle
#744	[NSE-740] fix codegen with out_of_range check
#743	[NSE-740] Catch out_of_range exception in casting string to numeric types in wscg
#735	[NSE-610] hashagg opt#2
#707	[NSE-710] Add rand expression support
#734	[NSE-727] Create table failed with TPCH partition dataset, patch 2
#715	[NSE-610] hashagg opt#1
#731	[NSE-727] Create table failed with TPCH partition dataset
#713	[NSE-712] Optimize Array split and support nested Array
#721	[NSE-719][backport]fix null check in SMJ
#720	[NSE-719] fix null check in SMJ
#718	Following NSE-702, fix for AQE enabled case
#691	[NSE-687]Try to upgrade log4j
#703	[NSE-702] Print output datatype for columnar shuffle on WebUI
#706	[NSE-705] Fallback R2C on unsupported cases
#657	[NSE-635] Add document to clarify incompatibility issues in expressions
#623	[NSE-602] Fix Array type shuffle split segmentation fault
#693	[NSE-692] JoinBenchmark is broken

OAP MLlib

Features


#189	Intel-MLlib does not support spark-3.2.1 version
#186	[Core] Support CDH versions
#187	Intel-MLlib does not support spark-3.1.3 version
#180	[CI] Refactor CI and add code checks

Bugs Fixed


#202	[SDLe] Update oneAPI version to solve vulnerabilities
#171	[Core] detect if spark.dynamicAllocation.enabled is set true and exit gracefully
#185	[Naive Bayes] Big datasets cause out-of-memory errors
#184	[Core] Fix code style issues
#179	[GPU][PCA] use distributed covariance as the first step for PCA
#178	[ALS] Fix error when converting buffer to CSRNumericTable
#177	[Naive Bayes] Fix error when converting Vector to CSRNumericTable

PRs


#203	[ML-202] Update oneAPI Base Toolkit version and prepare for OAP 1.3.1 release
#197	[ML-187]Support spark 3.1.3 and 3.2.0 and support CDH
#201	[ML-171] When OAP MLlib is enabled, spark.dynamicAllocation.enabled should be set to false
#196	[ML-185]Select label and features columns and cache data
#195	[ML-184]Fix code style issues
#183	[ML-180][CI] Refactor CI and add code checks
#175	[ML-179][GPU] use distributed covariance as the first step for PCA
#182	[ML-178]fix als convert buffer to NumericTable
#176	[ML-177][Naive Bayes] Fix error when converting Vector to CSRNumericTable
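
Issue #171 and PR #201 above concern the Spark property spark.dynamicAllocation.enabled, which should be false when OAP MLlib is enabled. The following is a minimal sketch of a pre-flight check a job launcher could run before submitting a job. It is not the OAP MLlib implementation: the function names are hypothetical, and the settings are assumed to be available as a plain dict of property names to values.

```python
# Illustrative pre-flight check; NOT the OAP MLlib implementation.
# Function names below are hypothetical and exist only for this sketch.

DYN_ALLOC_KEY = "spark.dynamicAllocation.enabled"  # real Spark property name


def dynamic_allocation_enabled(conf):
    """Return True if the given settings turn on Spark dynamic allocation.

    `conf` is a plain dict of Spark property names to values. A missing key
    is treated as disabled, which matches Spark's default of false.
    """
    value = conf.get(DYN_ALLOC_KEY, "false")
    return str(value).strip().lower() == "true"


def check_conf_for_oap_mllib(conf):
    """Raise ValueError with a readable message if dynamic allocation is on."""
    if dynamic_allocation_enabled(conf):
        raise ValueError(
            f"{DYN_ALLOC_KEY} must be false when OAP MLlib is enabled"
        )


if __name__ == "__main__":
    check_conf_for_oap_mllib({DYN_ALLOC_KEY: "false"})  # passes silently
    try:
        check_conf_for_oap_mllib({DYN_ALLOC_KEY: "true"})
    except ValueError as err:
        print(err)
```

Checking the setting up front gives a clear message at launch time rather than a failure partway through a job.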
