Change log

Generated on 2022-12-21

Release 1.5.0

Gazelle Plugin

Features

#931 Reuse partition vectors for arrow scan
#955 implement missing expressions
#1120 Support aggregation window functions with order by
#1135 Supports Spark 3.2.2 shims
#1114 Remove tmp directory after application exits
#862 implement row_number window function
#1007 Document how to test columnar UDF
#942 Use hash aggregate for string type input

Performance

#1144 Optimize cast WSCG performance

Bugs Fixed

#1170 Segfault on data source v2
#1164 Limit the column num in WSCG
#1166 Peers' values should be considered in window function for CURRENT ROW in range mode
#1149 Vulnerability issues
#1112 Validate Error: “Invalid: Length spanned by binary offsets (21) larger than values array (size 20)”
#1103 wrong hashagg results
#929 Failed to add user extension while using gazelle
#1100 Wildcard in json path is not supported
#1079 Like function gets wrong result when default escape char is contained
#1046 Fall back to use row-based operators, error is makeStructField is unable to parse from conv
#1053 Exception when there is function expression in pos or len of substring
#1024 ShortType is not supported in ColumnarLiteral
#1034 Exception when there is unix_timestamp in CaseWhen
#1032 Missing WSCG check for ExistenceJoin
#1027 partition by literal in window function
#1019 Support more date formats for from_unixtime & unix_timestamp
#999 The performance of using ColumnarSort operator to sort string type is significantly lower than that of native Spark SortExec
#984 concat_ws
#958 JVM/Native R2C and CoalesceBatch process time inaccuracy
#979 Failed to find column while reading parquet in case-insensitive mode

PRs

#1192 [NSE-1191] fix AQE exchange reuse in Spark3.2
#1180 [NSE-1193] fix jni unload
#1175 [NSE-1171] Support merge parquet schema and read missing schema
#1178 [NSE-1161][FOLLOWUP] Remove extra compression type check
#1162 [NSE-1161] Support read-write parquet conversion to read-write arrow
#1014 [NSE-956] allow to write parquet with compression
#1176 bump h2/pgsql version
#1173 [NSE-1171] Throw RuntimeException when reading duplicate fields in case-insensitive mode
#1172 [NSE-1170] Setting correct row number in batch scan w/ partition columns
#1169 [NSE-1161] Format sql config string key
#1167 [NSE-1166] Cover peers' values in sum window function in range mode
#1165 [NSE-1164] Limit the max column num in WSCG
#1160 [NSE-1149] upgrade guava to 30.1.1
#1158 [NSE-1149] upgrade guava to 30.1.1
#1152 [NSE-1149] upgrade guava to 24.1.1
#1153 [NSE-1149] upgrade pgsql to 42.3.3
#1150 [NSE-1149] Remove log4j in shims module
#1146 [NSE-1135] Introduce shim layer for supporting spark 3.2.2
#1145 [NSE-1144] Optimize cast wscg performance
#1136 Remove project from wscg when it's the child of window
#1122 [NSE-1120] Support sum window function with order by statement
#1131 [NSE-1114] Remove temp directory without FileUtils.forceDeleteOnExit
#1129 [NSE-1127] Use larger buffer for hash agg
#1130 [NSE-610] fix hashjoin build time metric
#1126 [NSE-1125] Add status check for hashing GetOrInsert
#1056 [NSE-955] Support window function lag
#1123 [NSE-1118] fix codegen on TPCDS Q88
#1119 [NSE-1118] adding more checks for SMJ codegen
#1058 [NSE-981] Add a test suite for projection codegen
#1117 [NSE-1116] Disable columnar url_decoder
#1113 [NSE-1112] Fix Arrow array meta data validating issue when writing parquet files
#1039 [NSE-1019] fix codegen for all expressions
#1115 [NSE-1114] Remove tmp directory after application exits
#1111 remove debug log
#1098 [NSE-1108] allow to use different cases in column names
#1082 [NSE-1071] Refactor vector resizing in hash aggregate
#1036 [NSE-987] fix string date
#948 [NSE-947] Add a whole stage fallback strategy
#1099 [NSE-1104] fix hashagg w/ empty string
#1102 [NSE-400] Fix memory leak for native C2R and R2C.
#1101 [NSE-1100] Fall back get_json_object when wildcard is contained in json path
#1090 [NSE-1065] fix on count distinct w/ keys
#1097 Ignore two unit tests
#1081 [NSE-1075] Support dynamic merge file partition
#1080 [NSE-1079] Set the default escape char for like function
#1078 [NSE-610] support big keys in hashagg
#1072 [NSE-1071] Add tiny optimizations for hash aggregation functions
#1069 [NSE-800] Remove spark-arrow-datasource-parquet in assembly
#1066 [NSE-1065] Adding hashagg w/ filter support
#1067 [NSE-958] Fix JVM R2C operator metrics
#935 [NSE-931] Reuse partition vectors for arrow scan
#1064 [NSE-955] Implement parse_url
#1063 [NSE-955] Support more date format in unix timestamp
#930 [NSE-929] Support user defined spark extensions
#1038 [NSE-928] allow to sort with big partitions
#1057 [NSE-1019] fix codegen for unixtimestamp
#1055 [NSE-955] Support md5/sha1/sha2 functions
#903 [NSE-610] hashagg opt#3
#1044 [NSE-400] fix memory leakage in native columnartorow
#1041 [NSE-1023] [NSE-1046] Cover more supported expressions in getting AttributeReference
#1054 [NSE-1053] Support function in substring's pos and len
#1049 [NSE-955] Support bin function
#1048 [NSE-955] Support power function
#1042 [NSE-955] Support find_in_set function
#1025 [NSE-1024] Support ShortType in ColumnarLiteral
#1037 [NSE-955] Turn on the support for get_json_object
#1033 [NSE-1032] Adding WSCG check for keys in Join
#1035 [NSE-1034] Add timeZoneId in ColumnarUnixTimestamp
#1028 [NSE-1027] Problem with Literal in window function
#1017 [NSE-999] use TimSort for STRING/DECIMAL onekey based sorting
#1022 [NSE-955] Support remainder function
#1021 [NSE-1019] [NSE-1020] Support more date formats and be aware of local time zone in handling unix timestamp
#1009 [NSE-999] s/string/string_view in sort
#990 [NSE-943] Improve rowtocolumn operator
#1000 [NSE-862] improve row_number()
#1013 [NSE-955] Add Murmur3Hash expression support
#995 [NSE-981] Add more codegen checking in BHJ & SHJ
#1006 [NSE-1007] Add a test guide for columnar UDF
#969 [NSE-943] Optimize data conversion for String/Binary type in Row2Columnar
#973 [NSE-928] Add ARROW_CHECK for batch_size check
#992 [NSE-984] fix concat_ws
#991 [NSE-981] check all expressions in HashAgg
#993 [NSE-979] fix data source
#980 [NSE-979] Support reading parquet with case sensitive
#985 [NSE-981] Implement supportColumnarCodegen to reflect the actual support state
#964 [NSE-955] implement lpad/rpad
#963 [NSE-955] implement concat_ws
#971 [NSE-955] Support hex expression
#968 [NSE-955] implement lower function
#965 [NSE-955] Support expression conv
#949 [NSE-862] implement row_number function
#960 [NSE-955] doc: Add columnar expression development guide
#941 [NSE-942] Force to use hash aggregate for string type input
#959 [NSE-958] Fix SQLMetrics inaccuracy in JVM/Native R2C and CoalesceBatch

Release 1.4.0

Gazelle Plugin

Features

#781 Add spark eventlog analyzer for advanced analyzing
#927 Column2Row further enhancement
#913 Add Hadoop 3.3 profile to pom.xml
#869 implement first agg function
#926 Support UDF URLDecoder
#856 [SHUFFLE] manually split of Variable length buffer (String likely)
#886 Add pmod function support
#855 [SHUFFLE] HugePage support in shuffle
#872 implement replace function
#867 Add substring_index function support
#818 Support length, char_length, locate, regexp_extract
#864 Enable native parquet write by default
#828 CoalesceBatches native implementation
#800 Combine datasource and columnar core jar

Performance

#848 Optimize Columnar2Row performance
#943 Optimize Row2Columnar performance
#854 Enable skipping columnarWSCG for queries with small shuffle size
#857 [SHUFFLE] split by reducer by column

Bugs Fixed

#827 Github action is broken
#987 TPC-H q7, q8, q9 run failed when using String for Date
#892 q47 and q57 failed on Ubuntu 20.04 without OpenJDK
#784 Improve Sort Spill
#788 Spark UT of "randomSplit on reordered partitions" encountered "Invalid: Map array child array should have no nulls" issue
#821 Improve Wholestage Codegen check
#831 Support more expression types in getting attribute
#876 Write arrow hang with OutputWriter.path
#891 Spark executor lost while DatasetFileWriter failed with speculation
#909 "INSERT OVERWRITE x SELECT /*+ REPARTITION(2) */ * FROM y LIMIT 2" drains 4 rows into table x using Arrow write extension
#889 Failed to write with ParquetFileFormat while using ArrowWriteExtension
#910 TPCDS failed, segfault caused by PR903
#852 Unit test fix for NSE-843
#843 ArrowDataSource: Arrow dataset inspect() is called every time a file is read

PRs

#1005 [NSE-800] Fix an assembly warning
#1002 [NSE-800] Pack the classes into one single jar
#988 [NSE-987] fix string date
#977 [NSE-126] set default codegen opt to O1
#975 [NSE-927] Add macro AVX512BW check for different CPU architecture
#962 [NSE-359] disable unit tests on spark32 package
#966 [NSE-913] Add support for Hadoop 3.3.1 when packaging
#936 [NSE-943] Optimize IsNULL() function for Row2Columnar
#937 [NSE-927] Implement AVX512 optimization selection in Runtime and merge two C2R code files into one.
#951 [DNM] update sparklog
#938 [NSE-581] implement rlike/regexp_like
#946 [DNM] update on sparklog script
#939 [NSE-581] adding ShortType/FloatType in ColumnarLiteral
#934 [NSE-927] Extract and inline functions for native ColumnartoRow
#933 [NSE-581] Improve GetArrayItem(Split()) performance
#922 [NSE-912] Remove extra handleSafe costs
#925 [NSE-926] Support a UDF: URLDecoder
#924 [NSE-927] Enable AVX512 in Binary length calculation for native ColumnartoRow
#918 [NSE-856] Optimize of string/binary split
#908 [NSE-848] Optimize performance for Column2Row
#900 [NSE-869] Add 'first' agg function support
#917 [NSE-886] Add pmod expression support
#916 [NSE-909] fix slow test
#915 [NSE-857] Further optimizations of validity buffer split
#912 [NSE-909] "INSERT OVERWRITE x SELECT /*+ REPARTITION(2) */ * FROM y L…
#896 [NSE-889] Failed to write with ParquetFileFormat while using ArrowWriteExtension
#911 [NSE-910] fix bug of PR903
#901 [NSE-891] Spark executor lost while DatasetFileWriter failed with speculation
#907 [NSE-857] split validity buffer by reducer
#902 [NSE-892] Allow to use jar cmd not in PATH
#898 [NSE-867][FOLLOWUP] Add substring_index function support
#894 [NSE-855] allocate large block of memory for all reducer #881
#880 [NSE-857] Fill destination buffer by reducer
#839 [DNM] some optimizations to shuffle's split function
#879 [NSE-878]Wip get phyplan bugfix
#877 [NSE-876] Fix writing arrow hang with OutputWriter.path
#873 [NSE-872] implement replace function
#850 [NSE-854] Small Shuffle Size disable wholestagecodegen
#868 [NSE-867] Add substring_index function support
#847 [NSE-818] Support length, char_length, locate & regexp_extract
#865 [NSE-864] Enable native parquet write by default
#811 [NSE-810] disable codegen for SMJ with local limit
#860 remove sensitive info from physical plan
#853 [NSE-852] Unit test fix for NSE-843
#844 [NSE-843] ArrowDataSource: Arrow dataset inspect() is called every tim…
#842 fix in eventlog script
#841 fix bug of script
#829 [NSE-828] Add native CoalesceBatches implementation
#830 [NSE-831] Support more expression types in getting attribute
#815 [NSE-610] Shrink hashmap to use less memory
#822 [NSE-821] Fix Wholestage Codegen on unsupported pattern
#824 [NSE-823] Use SPARK_VERSION_SHORT instead of SPARK_VERSION to find SparkShims
#826 [NSE-827] fix GHA
#819 [DNM] complete sparklog script
#802 [NSE-794] Fix count() with decimal value
#801 [NSE-786] Adding docs for shim layers
#790 [NSE-781]Add eventlog analyzer tool
#789 [NSE-788] Quick fix for randomSplit on reordered partitions
#780 [NSE-784] fallback Sort after SortHashAgg

OAP MLlib

Performance

#204 Intel-MLlib requires more memory to run Bayes algorithm.

PRs

#208 [ML-204][NaiveBayes] Remove cache from NaiveBayes

Release 1.3.1

Gazelle Plugin

Features

#710 Add rand expression support
#745 improve codegen check
#761 Update the document to reflect the changes in build and deployment
#635 Document the incompatibility with Spark on Expressions
#702 Print output datatype for columnar shuffle on WebUI
#712 [Nested type] Optimize Array split and support nested Array
#732 [Nested type] Support Struct and Map nested types in Shuffle
#759 Add spark 3.1.2 & 3.1.3 as supported versions for 3.1.1 shim layer

Performance

#610 refactor on shuffled hash join/hash agg

Bugs Fixed

#755 GetAttrFromExpr unsupported issue when run TPCDS Q57
#764 add java.version to clarify jdk version
#774 Fix runtime issues on spark 3.2
#778 Failed to find include file while running code gen
#725 gazelle failed to run with spark local
#746 Improve memory allocation on native row to column operator
#770 Cast exception and null pointer exception in spark-3.2
#772 ColumnarBatchScan name missing in UI for Spark321
#740 Handle exceptions like std::out_of_range in casting string to numeric types in WSCG
#727 Create table failed with TPCH partition dataset
#719 Wrong result on TPC-DS Q38, Q87
#705 Two unit tests failed on master branch

PRs

#834 [NSE-746]Fix memory allocation in row to columnar
#809 [NSE-746]Fix memory allocation in row to columnar
#817 [NSE-761] Update document to reflect spark 3.2.x support
#805 [NSE-772] Code refactor for ColumnarBatchScan
#802 [NSE-794] Fix count() with decimal value
#779 [NSE-778] Failed to find include file while running code gen
#798 [NSE-795] Fix a consecutive SMJ issue in wscg
#799 [NSE-791] fix xchg reuse in Spark321
#773 [NSE-770] [NSE-774] Fix runtime issues on spark 3.2
#787 [NSE-774] Fallback broadcast exchange for DPP to reuse
#763 [NSE-762] Add complex types support for ColumnarSortExec
#783 [NSE-782] prepare 1.3.1 release
#777 [NSE-732]Adding new config to enable/disable complex data type support
#776 [NSE-770] [NSE-774] Fix runtime issues on spark 3.2
#765 [NSE-764] declare java.version for maven
#767 [NSE-610] fix unit tests on SHJ
#760 [NSE-759] Add spark 3.1.2 & 3.1.3 as supported versions for 3.1.1 shim layer
#757 [NSE-746]Fix memory allocation in row to columnar
#724 [NSE-725] change the code style for ExecutorManager
#751 [NSE-745] Improve codegen check for expression
#742 [NSE-359] [NSE-273] Introduce shim layer to fix compatibility issues for gazelle on spark 3.1 & 3.2
#754 [NSE-755] Quick fix for ConverterUtils.getAttrFromExpr for TPCDS queries
#749 [NSE-732] Support Map complex type in Shuffle
#738 [NSE-610] hashjoin opt1
#733 [NSE-732] Support Struct complex type in Shuffle
#744 [NSE-740] fix codegen with out_of_range check
#743 [NSE-740] Catch out_of_range exception in casting string to numeric types in wscg
#735 [NSE-610] hashagg opt#2
#707 [NSE-710] Add rand expression support
#734 [NSE-727] Create table failed with TPCH partition dataset, patch 2
#715 [NSE-610] hashagg opt#1
#731 [NSE-727] Create table failed with TPCH partition dataset
#713 [NSE-712] Optimize Array split and support nested Array
#721 [NSE-719][backport]fix null check in SMJ
#720 [NSE-719] fix null check in SMJ
#718 Following NSE-702, fix for AQE enabled case
#691 [NSE-687]Try to upgrade log4j
#703 [NSE-702] Print output datatype for columnar shuffle on WebUI
#706 [NSE-705] Fallback R2C on unsupported cases
#657 [NSE-635] Add document to clarify incompatibility issues in expressions
#623 [NSE-602] Fix Array type shuffle split segmentation fault
#693 [NSE-692] JoinBenchmark is broken

OAP MLlib

Features

#189 Intel-MLlib does not support spark-3.2.1 version
#186 [Core] Support CDH versions
#187 Intel-MLlib does not support spark-3.1.3 version.
#180 [CI] Refactor CI and add code checks

Bugs Fixed

#202 [SDLe] Update oneAPI version to solve vulnerabilities
#171 [Core] detect if spark.dynamicAllocation.enabled is set true and exit gracefully
#185 [Naive Bayes] Big dataset causes out-of-memory errors.
#184 [Core] Fix code style issues
#179 [GPU][PCA] use distributed covariance as the first step for PCA
#178 [ALS] Fix error when converting buffer to CSRNumericTable
#177 [Naive Bayes] Fix error when converting Vector to CSRNumericTable

PRs

#203 [ML-202] Update oneAPI Base Toolkit version and prepare for OAP 1.3.1 release
#197 [ML-187]Support spark 3.1.3 and 3.2.0 and support CDH
#201 [ML-171]When enabled oap mllib, spark.dynamicAllocation.enabled should be set false.
#196 [ML-185]Select label and features columns and cache data
#195 [ML-184]Fix code style issues
#183 [ML-180][CI] Refactor CI and add code checks
#175 [ML-179][GPU] use distributed covariance as the first step for PCA
#182 [ML-178]fix als convert buffer to NumericTable
#176 [ML-177][Naive Bayes] Fix error when converting Vector to CSRNumericTable

Release 1.3.0

Gazelle Plugin

Features

#550 [ORC] Support ORC Format Reading
#188 Support Dockerfile
#574 implement native LocalLimit/GlobalLimit
#684 BufferedOutputStream causes massive futex system calls
#465 Provide option to rely on JVM GC to release Arrow buffers in Java
#681 Enable gazelle to support two math expressions: ceil & floor
#651 Set Hadoop 3.2 as default in pom.xml
#126 speed up codegen
#596 [ORC] Verify whether ORC file format supported complex data types in gazelle
#581 implement regex/trim/split expr
#473 Optimize the ArrowColumnarToRow performance
#647 Leverage buffered write in shuffle
#674 Add translate expression support
#675 Add instr expression support
#645 Add support to cast data in bool type to bigint type or string type
#463 version bump on 1.3
#583 implement get_json_object
#640 Disable compression for tiny payloads in shuffle
#631 Do not write schema in shuffle writing
#609 Implement date related expression like to_date, date_sub
#629 Improve codegen failure handling
#612 Add metric "prepare time" for shuffle writer
#576 columnar FROM_UNIXTIME
#589 [ORC] Add TPCDS and TPCH UTs for ORC Format Reading
#537 Increase partition number adaptively for large SHJ stages
#580 document how to create metadata for data source V1 based testing
#555 support batch size > 32k
#561 document the code generation behavior on driver, suggest to deploy driver on powerful server
#523 Support ArrayType in ArrowColumnarToRow operator
#542 Add rule to propagate local window for rank + filter pattern
#21 JNI: Unexpected behavior when executing codes after calling JNIEnv::ThrowNew
#512 Add strategy to force use of SHJ
#518 Arrow buffer cleanup: Support both manual release and auto release as a hybrid mode
#525 Support AQE in columnWriter
#516 Support External Sort in sort kernel
#503 Could you provide the detailed configuration for the performance tests on the official website?
#501 Remove ArrowRecordBatchBuilder and its usages
#461 Support ArrayType in Gazelle
#479 Optimize sort materialization
#449 Refactor sort codegen kernel
#667 1.3 RC release
#352 Map/Array/Struct type support for Parquet reading in Arrow Data Source

Bugs Fixed

#660 support string builder in window output
#636 Remove log4j 1.2 Support for security issue
#540 reuse subquery in TPC-DS Q14a
#687 log4j 1.2.17 in spark-core
#617 Exceptions handling for stoi, stol, stof, stod in whole stage code gen
#653 Handle special cases for get_json_object in WSCG
#650 Scala test ArrowColumnarBatchSerializerSuite is failing
#642 Fail to cast unresolved reference to attribute reference
#599 data source unit tests are broken
#604 Sort with special projection key broken
#627 adding security instructions
#615 An exception in trying to cast attribute in getResultAttrFromExpr of ConverterUtils
#588 preallocated memory for shuffle split
#606 NullpointerException getting map values from ArrowWritableColumnVector
#569 CPU overhead on fine grain / concurrent off-heap acquire operations
#553 Support casting string type to types like int, bigint, float, double
#514 Fix the core dump issue in Q93 when enable columnar2row
#532 Fix the failed UTs in ArrowEvalPythonExecSuite when enable ArrowColumnarToRow
#534 Columnar SHJ: Error if probing with empty record batch
#529 Wrong build side may be chosen for SemiJoin when forcing use of SHJ
#504 Fix non-decimal window function unit test failures
#493 Three unit tests newly failed on master branch

PRs

#690 [NSE-667] backport patches to 1.3 branch
#688 [NSE-687]remove exclude log4j when running ut
#686 [NSE-400] Fix the bug for negative decimal data
#685 [NSE-684] BufferedOutputStream causes massive futex system calls
#680 [NSE-667] backport patches to 1.3 branch
#683 [NSE-400] fix leakage in row to column operator
#637 [NSE-400] Native Arrow Row to columnar support
#648 [NSE-647] Leverage buffered write in shuffle
#682 [NSE-681] Add floor & ceil expression support
#672 [NSE-674] Add translate expression support
#676 [NSE-675] Add instr expression support
#652 [NSE-651]Use Hadoop 3.2 as default hadoop.version
#666 [NSE-667] backport patches to 1.3 branch
#644 [NSE-645] Add support to cast bool type to bigint type & string type
#659 [NSE-650] Scala test ArrowColumnarBatchSerializerSuite is failing
#649 [NSE-660] fix window builder with string
#655 [NSE-617] Handle exception in cast expression from string to numeric types in WSCG
#654 [NSE-653] Add validity checking for get_json_object in WSCG
#641 [NSE-640] Disable compression for tiny payloads in shuffle
#646 [NSE-636]Remove log4j1 related unit tests
#488 [NSE-463] version bump to 1.3.0-SNAPSHOT
#639 [NSE-126] improve codegen with pre-compiled header
#638 [NSE-642] Correctly get ResultAttrFromExpr for sql with 'case when IN/AND/OR'
#632 [NSE-631] Do not write schema in shuffle writing
#633 [NSE-601] Fix an issue in the case of group by coalesce
#630 [NSE-629] improve codegen failure handling
#622 [NSE-609] Complete to_date expression support
#628 [NSE-627] Doc: adding security readme
#624 [NSE-609] Add support for date_sub expression
#619 [NSE-583] impl get_json_object in wscg
#614 [NSE-576] Support from_unixtime expression in the case that 'yyyyMMdd' format is required
#616 [NSE-615] Add tackling for ColumnarEqualTo type in getResultAttrFromExpr of ConverterUtils
#613 [NSE-612] Add metric "prepare time" for shuffle writer
#608 [NSE-602] don't enable columnar shuffle on unsupported data types
#601 [NSE-604] fix sort w/ proj keys
#607 [NSE-606] NullpointerException getting map values from ArrowWritableC…
#584 [NSE-583] implement get_json_object
#595 [NSE-576] fix from_unixtime
#582 [NSE-581]impl regexp_replace
#594 [NSE-588] config the pre-allocated memory for shuffle's splitter
#600 [NSE-599] fix datasource unit tests
#597 [NSE-596] Add complex data types validation for ORC file format in gazelle
#590 [NSE-569] CPU overhead on fine grain / concurrent off-heap acquire operations
#586 [NSE-581] Add trim, left trim, right trim support in expression
#578 [NSE-589] Add TPCDS and TPCH suite for Orc fileformat
#538 [NSE-537] Increase partition number adaptively for large SHJ stages
#587 [NSE-580] update doc on data source(DS V1/V2 usage)
#575 [NSE-574]implement columnar limit
#556 [NSE-555] using 32bit selection vector
#577 [NSE-576] implement columnar from_unixtime
#572 [NSE-561] refine docs on sample configurations and code generation behavior
#552 [NSE-553] Complete the support to cast string type to types like int, bigint, float, double
#543 [NSE-540] enable reuse subquery
#554 [NSE-207] change the fallback condition for Columnar Like
#559 [NSE-352] Map/Array/Struct type support for Parquet reading in ArrowData Source
#551 [NSE-550] Support ORC Format Reading in Gazelle
#545 [NSE-542] Add rule to propagate local window for rank + filter pattern
#541 [NSE-207] improve the fix for join optimization
#495 [NSE-207] Fix NaN in Max and Min
#533 [NSE-532] Fix the failed UTs in ArrowEvalPythonExecSuite when enable ArrowColumnarToRow
#536 [NSE-207] Ignore tests causing test stop
#535 [NSE-534] Columnar SHJ: Error if probing with empty record batch
#531 [NSE-21] JNI: Unexpected behavior when executing codes after calling JNIEnv::ThrowNew
#530 [NSE-529] Wrong build side may be chosen for SemiJoin when forcing use of SHJ
#524 [NSE-523] Support ArrayType in ArrowColumnarToRow optimization
#513 [NSE-512] Add strategy to force use of SHJ
#519 [NSE-518] Arrow buffer cleanup: Support both manual release and auto …
#526 [NSE-525]Support AQE for ColumnarWriter
#517 [NSE-516]Support ExternalSorter to control memory footprint
#515 [NSE-514] Fix the core dump issue in Q93 with V2 test
#509 Update README.md for performance result.
#511 [NSE-207] fix full UT test
#502 [NSE-501] Remove ArrowRecordBatchBuilder and its usages
#507 Previous PR removed this UT, fix here
#496 [NSE-461]columnar shuffle support for ArrayType
#480 [NSE-479] optimize sort materialization
#474 [NSE-473]Optimize ArrowColumnarToRow performance
#505 [NSE-504] Fix non-decimal window function unit test
#497 [NSE-493] Three unit tests newly failed on master branch (Python UDF Unit Tests)
#466 [NSE-465] POC release memory using GC
#462 [NSE-461][WIP] Support ArrayType in ArrowWritableColumnVector and ColumnarPandasUDF
#450 [NSE-449] Refactor codegen sort kernel
#471 [NSE-207] Enabling UT report
#445 [NSE-444]Support ArrowColumnarToRowExec when the root plan is ColumnarToRowExec
#447 [NSE-207] Fix date and timestamp functions

OAP MLlib

Features

#158 [GPU] Add convertToSyclHomogen for row merged table for kmeans and pca
#149 [GPU] Add check-gpu utility
#140 [Core] Refactor and support multiple Spark versions in single JAR
#137 [Core] Multiple improvements for build & deploy and integrate oneAPI 2021.4
#133 [Correlation] Add Correlation algorithm
#125 [GPU] Update for Kmeans and PCA

Bugs Fixed

#161 [SDLe][Snyk] Log4j 1.2.* issues brought from Spark when scanning 3rd-party components for vulnerabilities
#155 [POM] Update scala version to 2.12.15
#135 [Core] Fix ccl::gather and Add ccl::gatherv

PRs

#162 [ML-161] Excluding log4j 1.x dependency from Spark core to avoid log4…
#159 [GPU] Add convertToSyclHomogen for row merged table for kmeans and pca
#157 [ML-155] [POM] Update scala version to 2.12.15
#150 [ML-149][GPU] Add check-gpu utility
#144 [ML-151] enable Summarizer with OAP
#141 [Core] Refactor and support multiple Spark versions in single JAR
#139 [ML-137] [Core] Multiple improvements for build & deploy and integrate oneAPI 2021.4
#127 [ML-133][Correlation] Add Correlation algorithm
#126 [ML-125][GPU] Update for Kmeans and PCA

Release 1.2.0

Gazelle Plugin

Features

#394 Support ColumnarArrowEvalPython operator
#368 Encountered Hadoop version (3.2.1) conflict issue on AWS EMR-6.3.0
#375 Implement a series of datetime functions
#183 Add Date/Timestamp type support
#362 make arrow-unsafe allocator as the default
#343 configurable codegen opt level
#333 Arrow Data Source: CSV format support fix
#223 Add Parquet write support to Arrow data source
#320 Add build option to enable unsafe Arrow allocator
#337 UDF: Add test case for validating basic row-based udf
#326 Update Scala unit test to spark-3.1.1

Performance

#400 Optimize ColumnarToRow Operator in NSE.
#411 enable ccache on C++ code compiling

Bugs Fixed

#358 Running TPC DS all queries with native-sql-engine for 10 rounds will have performance degradation problems in the last few rounds
#481 JVM heap memory leak on memory leak tracker facilities
#436 Fix for Arrow Data Source test suite
#317 persistent memory cache issue
#382 Hadoop version conflict when supporting to use gazelle_plugin on Google Cloud Dataproc
#384 ColumnarBatchScanExec reading parquet failed on java.lang.IllegalArgumentException: not all nodes and buffers were consumed
#370 Failed to get time zone: NoSuchElementException: None.get
#360 Cannot compile master branch.
#341 build failed on v2 with -Phadoop-3.2

PRs

#489 [NSE-481] JVM heap memory leak on memory leak tracker facilities (Arrow Allocator)
#486 [NSE-475] restore coalescebatches operator before window
#482 [NSE-481] JVM heap memory leak on memory leak tracker facilities
#470 [NSE-469] Lazy Read: Iterator objects are not correctly released
#464 [NSE-460] fix decimal partial sum in 1.2 branch
#439 [NSE-433]Support pre-built Jemalloc
#453 [NSE-254] remove arrow-data-source-common from jar with dependency
#452 [NSE-254]Fix redundant arrow library issue.
#432 [NSE-429] TPC-DS Q14a/b get slowed down within setting spark.oap.sql.columnar.sortmergejoin.lazyread=true
#426 [NSE-207] Fix aggregate and refresh UT test script
#442 [NSE-254]Issue0410 jar size
#441 [NSE-254]Issue0410 jar size
#440 [NSE-254]Solve the redundant arrow library issue
#437 [NSE-436] Fix for Arrow Data Source test suite
#387 [NSE-383] Release SMJ input data immediately after being used
#423 [NSE-417] fix sort spill on inplace sort
#416 [NSE-207] fix left/right outer join in SMJ
#422 [NSE-421]Disable the wholestagecodegen feature for the ArrowColumnarToRow operator
#369 [NSE-417] Sort spill support framework
#401 [NSE-400] Optimize ColumnarToRow Operator in NSE.
#413 [NSE-411] adding ccache support
#393 [NSE-207] fix scala unit tests
#407 [NSE-403]Add Dataproc integration section to README
#406 [NSE-404]Modify repo name in documents
#402 [NSE-368]Update emr-6.3.0 support
#395 [NSE-394]Support ColumnarArrowEvalPython operator
#346 [NSE-317]fix columnar cache
#392 [NSE-382]Support GCP Dataproc 2.0
#388 [NSE-382]Fix Hadoop version issue
#385 [NSE-384] "Select count(*)" without group by results in error: java.lang.IllegalArgumentException: not all nodes and buffers were consumed
#374 [NSE-207] fix left anti join and support filter wo/ project
#376 [NSE-375] Implement a series of datetime functions
#373 [NSE-183] fix timestamp in native side
#356 [NSE-207] fix issues found in scala unit tests
#371 [NSE-370] Failed to get time zone: NoSuchElementException: None.get
#347 [NSE-183] Add Date/Timestamp type support
#363 [NSE-362] use arrow-unsafe allocator by default
#361 [NSE-273] Spark shim layer infrastructure
#364 [NSE-360] fix ut compile and travis test
#264 [NSE-207] fix issues found from join unit tests
#344 [NSE-343]allow to config codegen opt level
#342 [NSE-341] fix maven build failure
#324 [NSE-223] Add Parquet write support to Arrow data source
#321 [NSE-320] Add build option to enable unsafe Arrow allocator
#299 [NSE-207] fix unsupported types in aggregate
#338 [NSE-337] UDF: Add test case for validating basic row-based udf
#336 [NSE-333] Arrow Data Source: CSV format support fix
#327 [NSE-326] update scala unit tests to spark-3.1.1

OAP MLlib

Features

#110 Update isOAPEnabled for Kmeans, PCA & ALS
#108 Update PCA GPU, LiR CPU and Improve JAR packaging and libs loading
#93 [GPU] Add GPU support for PCA
#101 [Release] Add version update scripts and improve scripts for examples
#76 Reorganize Spark version specific code structure
#82 [Tests] Add NaiveBayes test and refactors

Bugs Fixed

#119 [SDLe][Klocwork] Security vulnerabilities found by static code scan
#121 Memory-freeing issue after the training stage when using Intel-MLlib to run PCA and K-means algorithms.
#122 Cannot run K-means and PCA algorithm with oap-mllib on Google Dataproc
#123 [Core] Improve locality handling for native lib loading
#116 Cannot run ALS algorithm with oap-mllib thanks to the commit "2883d3447d07feb55bf5d4fee8225d74b0b1e2b1"
#114 [Core] Improve native lib loading
#94 Failed to run KMeans workload with oap-mllib in JLSE
#95 Some shared libs are missing in 1.1.1 release
#105 [Core] crash when libfabric version conflict
#98 [SDLe][Klocwork] Security vulnerabilities found by static code scan
#88 [Test] Fix ALS Suite "ALS shuffle cleanup standalone"
#86 [NaiveBayes] Fix isOAPEnabled and add multi-version support

PRs

#124 [ML-123][Core] Improve locality handling for native lib loading
#118 [ML-116] use getOneCCLIPPort and fix lib loading
#115 [ML-114] [Core] Improve native lib loading
#113 [ML-110] Update isOAPEnabled for Kmeans, PCA & ALS
#112 [ML-105][Core] Fix crash when libfabric version conflict
#111 [ML-108] Update PCA GPU, LiR CPU and Improve JAR packaging and libs loading
#104 [ML-93][GPU] Add GPU support for PCA
#103 [ML-98] [Release] Clean Service.java code
#102 [ML-101] [Release] Add version update scripts and improve scripts for examples
#90 [ML-88][Test] Fix ALS Suite "ALS shuffle cleanup standalone"
#87 [ML-86][NaiveBayes] Fix isOAPEnabled and add multi-version support
#83 [ML-82] [Tests] Add NaiveBayes test and refactors
#75 [ML-53] [CPU] Add Linear & Ridge Regression
#77 [ML-76] Reorganize multiple Spark version support code structure
#68 [ML-55] [CPU] Add Naive Bayes
#64 [ML-42] [PIP] Misc improvements and refactor code
#62 [ML-30][Coding Style] Add code style rules & scripts for Scala, Java and C++

SQL DS Cache

Features

#155 reorg to support profile based multi spark version

Bugs Fixed

#190 The function of vmem-cache and guava-cache should not be associated with arrow.
#181 [SDLe]Vulnerabilities scanned by Snyk

PRs

#182 [SQL-DS-CACHE-181][SDLe]Fix Snyk code scan issues
#191 [SQL-DS-CACHE-190]put plasma detector in separate object to avoid unnecessary dependency of arrow
#189 [SQL-DS-CACHE-188][POAE7-1253] improvement of fallback from plasma cache to simple cache
#157 [SQL-DS-CACHE-155][POAE7-1187]reorg to support profile based multi spark version

PMem Shuffle

Bugs Fixed

#46 Cannot run Terasort with pmem-shuffle of branch-1.2
#43 Rpmp cannot be compiled due to the lack of boost header file.

PRs

#51 [PMEM-SHUFFLE-50] Remove description about download submodules manually since they can be downloaded automatically.
#49 [PMEM-SHUFFLE-48] Fix the bug about mapstatus tracking and add more connections for metastore.
#47 [PMEM-SHUFFLE-46] Fix the bug that off-heap memory is over used in shuffle reduce stage.
#40 [PMEM-SHUFFLE-39] Fix the bug that pmem-shuffle without RPMP fails to pass Terasort benchmark due to latest patch.
#38 [PMEM-SHUFFLE-37] Add start-rpmp.sh and stop-rpmp.sh
#33 [PMEM-SHUFFLE-28]Add RPMP with HA support and integrate it with Spark3.1.1
#27 [PMEM-SHUFFLE] Change artifact name to make it compatible with naming…

Remote Shuffle

Bugs Fixed

#24 Enhance executor memory release

PRs

#25 [REMOTE-SHUFFLE-24] Enhance executor memory release

Release 1.1.1

Native SQL Engine

Features

#304 Upgrade to Arrow 4.0.0
#285 ColumnarWindow: Support Date/Timestamp input in MAX/MIN
#297 Disable incremental compiler in CI
#245 Support columnar rdd cache
#276 Add option to switch Hadoop version
#274 Comment to trigger tpc-h RAM test
#256 CI: do not run ram report for each PR

Bugs Fixed

#325 java.util.ConcurrentModificationException: mutation occurred during iteration
#329 numPartitions are not the same
#318 fix Spark 311 on data source v2
#311 Build reports errors
#302 test on v2 failed due to an exception
#257 different version of slf4j-log4j
#293 Fix BHJ loss if key = 0
#248 arrow dependency must put after arrow installation

PRs

#332 [NSE-325] fix incremental compile issue with 4.5.x scala-maven-plugin
#335 [NSE-329] fix out partitioning in BHJ and SHJ
#328 [NSE-318]check schema before reuse exchange
#307 [NSE-304] Upgrade to Arrow 4.0.0
#312 [NSE-311] Build reports errors
#272 [NSE-273] support spark311
#303 [NSE-302] fix v2 test
#306 [NSE-304] Upgrade to Arrow 4.0.0: Change basic GHA TPC-H test target …
#286 [NSE-285] ColumnarWindow: Support Date input in MAX/MIN
#298 [NSE-297] Disable incremental compiler in GHA CI
#291 [NSE-257] fix multiple slf4j bindings
#294 [NSE-293] fix unsafemap with key = '0'
#233 [NSE-207] fix issues found from aggregate unit tests
#246 [NSE-245]Adding columnar RDD cache support
#289 [NSE-206]Update installation guide and configuration guide.
#277 [NSE-276] Add option to switch Hadoop version
#275 [NSE-274] Comment to trigger tpc-h RAM test
#271 [NSE-196] clean up configs in unit tests
#258 [NSE-257] fix different versions of slf4j-log4j12
#259 [NSE-248] fix arrow dependency order
#249 [NSE-241] fix hashagg result length
#255 [NSE-256] do not run ram report test on each PR

SQL DS Cache

Features

#118 port to Spark 3.1.1

Bugs Fixed

#121 OAP Index creation stuck issue

PRs

#132 Fix SampleBasedStatisticsSuite UnitTest case
#122 [ sql-ds-cache-121] Fix Index stuck issues
#119 [SQL-DS-CACHE-118][POAE7-1130] port sql-ds-cache to Spark3.1.1

OAP MLlib

Features

#26 [PIP] Support Spark 3.0.1 / 3.0.2 and upcoming 3.1.1

PRs

#39 [ML-26] Build for different spark version by -Pprofile

PMem Spill

Features

#34 Support vanilla spark 3.1.1

PRs

#41 [PMEM-SPILL-34][POAE7-1119]Port RDD cache to Spark 3.1.1 as separate module

PMem Common

Features

#10 add -mclflushopt flag to enable clflushopt for gcc
#8 use clflushopt instead of clflush

PRs

#11 [PMEM-COMMON-10][POAE7-1010]Add -mclflushopt flag to enable clflushop…
#9 [PMEM-COMMON-8][POAE7-896]use clflush optimize version for clflush

PMem Shuffle

Features

#15 Doesn't work with Spark3.1.1

PRs

#16 [pmem-shuffle-15] Make pmem-shuffle support Spark3.1.1

Remote Shuffle

Features

#18 upgrade to Spark-3.1.1
#11 Support DAOS Object Async API

PRs

#19 [REMOTE-SHUFFLE-18] upgrade to Spark-3.1.1
#14 [REMOTE-SHUFFLE-11] Support DAOS Object Async API

Release 1.1.0

Native SQL Engine

Features

#261 ArrowDataSource: Add S3 Support
#239 Adopt ARROW-7011
#62 Support Arrow's Build from Source and Package dependency library in the jar
#145 Support decimal in columnar window
#31 Decimal data type support
#128 Support Decimal in Aggregate
#130 Support decimal in project
#134 Update input metrics during reading
#120 Columnar window: Reduce peak memory usage and fix performance issues
#108 Add end-to-end test suite against TPC-DS
#68 Adaptive compression select in Shuffle.
#97 optimize null check in codegen sort
#29 Support multiple-key sort without codegen
#75 Support HashAggregate in ColumnarWSCG
#73 improve columnar SMJ
#51 Decimal fallback
#38 Supporting expression as join keys in columnar SMJ
#27 Support REUSE exchange when DPP enabled
#17 ColumnarWSCG further optimization

Performance

#194 Arrow Parameters Update when compiling Arrow
#136 upgrade to arrow 3.0
#103 reduce codegen in multiple-key sort
#90 Refine HashAggregate to do everything in CPP

Bugs Fixed

#278 fix arrow dep in 1.1 branch
#265 TPC-DS Q67 failed with memmove exception in native split code.
#280 CMake version check
#241 TPC-DS q67 failed for XXH3_hashLong_64b_withSecret.constprop.0+0x180
#262 q18 has different digits compared with vanilla spark
#196 clean up options for native sql engine
#224 update 3rd party libs
#227 fix vulnerabilities from Klocwork
#237 Add ARROW_CSV=ON to default C++ build commands
#229 Fix the deprecated code warning in shuffle_split_test
#119 consolidate batch size
#217 TPC-H query20 result not correct when use decimal dataset
#211 IndexOutOfBoundsException during running TPC-DS Q2
#167 Cannot successfully run q14a.sql and q14b.sql when using double format for TPC-DS workload.
#191 libarrow.so and libgandiva.so not copy into the tmp directory
#179 Unable to find Arrow headers during build
#153 Fix incorrect queries after enabled Decimal
#173 fix the incorrect result of q69
#48 unit tests for c++ are broken
#101 ColumnarWindow: Remove obsolete debug code
#100 Incorrect result in Q45 w/ v2 bhj threshold is 10MB sf500
#81 Some ArrowVectorWriter implementations don't implement setNulls method
#82 Incorrect result in TPCDS Q72 SF1536
#70 Duplicate IsNull check in codegen sort
#64 Memleak in sort when SMJ is disabled
#58 Issues when running tpcds with DPP enabled and AQE disabled
#52 memory leakage in columnar SMJ
#53 Q24a/Q24b SHJ tail task took about 50 secs in SF1500
#42 reduce columnar sort memory footprint
#40 columnar sort codegen fallback to executor side
#1 columnar whole stage codegen failed due to empty results
#23 TPC-DS Q8 failed due to unsupported operation in columnar sortmergejoin
#22 TPC-DS Q95 failed in columnar wscg
#4 columnar BHJ failed on new memory pool
#5 columnar BHJ failed on partitioned table with prefercolumnar=false

PRs

#288 [NSE-119] clean up on comments
#282 [NSE-280]fix cmake version check
#281 [NSE-280] bump cmake to 3.16
#279 [NSE-278]fix arrow dep in 1.1 branch
#268 [NSE-186] backport to 1.1 branch
#266 [NSE-265] Reserve enough memory before UnsafeAppend in builder
#270 [NSE-261] ArrowDataSource: Add S3 Support
#263 [NSE-262] fix remainder loss in decimal divide
#215 [NSE-196] clean up native sql options
#231 [NSE-176]Arrow install order issue
#242 [NSE-224] update third party code
#240 [NSE-239] Adopt ARROW-7011
#238 [NSE-237] Add ARROW_CSV=ON to default C++ build commands
#230 [NSE-229] Fix the deprecated code warning in shuffle_split_test
#225 [NSE-227]fix issues from codescan
#219 [NSE-217] fix missing decimal check
#212 [NSE-211] IndexOutOfBoundsException during running TPC-DS Q2
#187 [NSE-185] Avoid unnecessary copying when simply projecting on fields
#195 [NSE-194]Turn on several Arrow parameters
#189 [NSE-153] Following NSE-153, optimize fallback conditions for columnar window
#192 [NSE-191]Fix issue0191 for .so file copy to tmp.
#181 [NSE-179]Fix arrow include directory not include when using ARROW_ROOT
#175 [NSE-153] Fix window results
#174 [NSE-173] fix incorrect result of q69
#172 [NSE-62]Fixing issue0062 for package arrow dependencies in jar with refresh2
#171 [NSE-170]improve sort shuffle code
#165 [NSE-161] adding format check
#166 [NSE-130] support decimal round and abs
#164 [NSE-130] fix precision loss in divide w/ decimal type
#159 [NSE-31] fix SMJ divide with decimal
#156 [NSE-130] fix overflow and precision loss
#152 [NSE-86] Merge Arrow Data Source
#154 [NSE-153] Fix incorrect queries after enabled Decimal
#151 [NSE-145] Support decimal in columnar window
#129 [NSE-128]Support Decimal in Aggregate/HashJoin
#131 [NSE-130] support decimal in project
#107 [NSE-136]upgrade to arrow 3.0.0
#135 [NSE-134] Update input metrics during reading
#121 [NSE-120] Columnar window: Reduce peak memory usage and fix performance issues
#112 [NSE-97] optimize null check and refactor sort kernels
#109 [NSE-108] Add end-to-end test suite against TPC-DS
#69 [NSE-68][Shuffle] Adaptive compression select in Shuffle.
#98 [NSE-97] remove isnull when null count is zero
#102 [NSE-101] ColumnarWindow: Remove obsolete debug code
#105 [NSE-100]Fix an incorrect result error when using SHJ in Q45
#91 [NSE-90]Refactor HashAggregateExec and CPP kernels
#79 [NSE-81] add missing setNulls methods in ArrowWritableColumnVector
#44 [NSE-29]adding non-codegen framework for multiple-key sort
#76 [NSE-75]Support ColumnarHashAggregate in ColumnarWSCG
#83 [NSE-82] Fix Q72 SF1536 incorrect result
#72 [NSE-51] add more datatype fallback logic in columnar operators
#60 [NSE-48] fix c++ unit tests
#50 [NSE-45] BHJ memory leak
#74 [NSE-73]using data ref in multiple keys based SMJ
#71 [NSE-70] remove duplicate IsNull check in sort
#65 [NSE-64] fix memleak in sort when SMJ is disabled
#59 [NSE-58]Fix empty input issue when DPP enabled
#7 [OAP-1846][oap-native-sql] add more fallback logic
#57 [NSE-56]ColumnarSMJ: fallback on full outer join
#55 [NSE-52]Columnar SMJ: fix memory leak by closing stream batches properly
#54 [NSE-53]Partial fix Q24a/Q24b tail SHJ task materialization performance issue
#47 [NSE-17]TPCDS Q72 optimization
#39 [NSE-38]ColumnarSMJ: support expression as join keys
#43 [NSE-42] early release sort input
#33 [NSE-32] Use Spark managed spill in columnar shuffle
#41 [NSE-40] fixes driver failing to do sort codegen
#28 [NSE-27]Reuse exchange to optimize DPP performance
#36 [NSE-1]fix columnar wscg on empty recordbatch
#24 [NSE-23]fix columnar SMJ fallback
#26 [NSE-22]Fix w/DPP issue when inside wscg smj both sides are smj
#18 [NSE-17] smjwscg optimization:
#3 [NSE-4]fix columnar BHJ on new memory pool
#6 [NSE-5][SCALA] Fix ColumnarBroadcastExchange didn't fallback issue w/ DPP

SQL DS Cache

Features

#36 HCFS doc for Spark
#38 update Plasma dependency for Plasma-based-cache module
#14 Add HCFS module
#17 replace arrow-plasma dependency for hcfs module

Bugs Fixed

#62 Upgrade hadoop dependencies in HCFS

PRs

#83 [SQL-DS-CACHE-82][SDLe]Upgrade Jetty version
#77 [SQL-DS-CACHE-62][POAE7-984] upgrade hadoop version to 3.3.0
#56 [SQL-DS-CACHE-47]Add plasma native get timeout
#37 [SQL-DS-CACHE-36][POAE7-898]HCFS docs for OAP 1.1
#39 [SQL-DS-CACHE-38][POAE7-892]update Plasma dependency
#18 [SQL-DS-CACHE-17][POAE7-905]replace intel-arrow with apache-arrow v3.0.0
#13 [SQL-DS-CACHE-14][POAE7-847] Port HCFS to OAP
#16 [SQL-DS-CACHE-15][POAE7-869]Refactor original code to make it a sub-module

OAP MLlib

Features

#35 Restrict printNumericTable to first 10 eigenvalues with first 20 dimensions
#33 Optimize oneCCL port detecting
#28 Use getifaddrs to get host ips for oneCCL kvs
#12 Improve CI and add pseudo cluster testing
#31 Print time duration for each PCA step
#13 Add ALS with new oneCCL APIs
#18 Auto detect KVS port for oneCCL to avoid port conflict
#10 Porting Kmeans and PCA to new oneCCL API

Bugs Fixed

#43 [Release] Error when installing intel-oneapi-dal-devel-2021.1.1 intel-oneapi-tbb-devel-2021.1.1
#46 [Release] Meet hang issue when running PCA algorithm.
#48 [Release] No performance benefit when using Intel-MLlib to run ALS algorithm.
#25 Fix oneCCL KVS port auto detect and improve logging

PRs

#51 [ML-50] Merge #47 and prepare for OAP 1.1
#49 Revert "[ML-41] Revert to old oneCCL and Prepare for OAP 1.1"
#47 [ML-44] [PIP] Update to oneAPI 2021.2 and Rework examples for validation
#40 [ML-41] Revert to old oneCCL and Prepare for OAP 1.1
#36 [ML-35] Restrict printNumericTable to first 10 eigenvalues with first 20 dimensions
#34 [ML-33] Optimize oneCCL port detecting
#20 [ML-12] Improve CI and add pseudo cluster testing
#32 [ML-31] Print time duration for each PCA step
#14 [ML-13] Add ALS with new oneCCL APIs
#24 [ML-25] Fix oneCCL KVS port auto detect and improve logging
#19 [ML-18] Auto detect KVS port for oneCCL to avoid port conflict

PMem Spill

Bugs Fixed

#22 [SDLe][Snyk]Upgrade Jetty version to fix vulnerability scanned by Snyk
#13 The compiled code failed because the variable name was not changed

PRs

#27 [PMEM-SPILL-22][SDLe]Upgrade Jetty version
#21 [POAE7-961] fix null pointer issue when offheap enabled.
#18 [POAE7-858] disable RDD cache related PMem initialization as default and add PMem related logic in SparkEnv
#19 [PMEM-SPILL-20][POAE7-912]add vanilla SparkEnv.scala for future update
#15 [POAE7-858] port memory extension options to OAP 1.1
#12 Change the variable name so that the passed parameters are correct
#10 Fixing one pmem path on AppDirect mode may cause the pmem initialization path to be empty Path

PMem Shuffle

Features

#7 Enable running in fsdax mode

Bugs Fixed

#10 [pmem-shuffle] There are potential issues reported by Klocwork.

PRs

#13 [PMEM-SHUFFLE-10] Fix potential issues reported by Klocwork for branch 1.1.
#6 [PMEM-SHUFFLE-7] enable fsdax mode in pmem-shuffle

Remote Shuffle

Features

#6 refactor shuffle-daos by abstracting shuffle IO for supporting both synchronous and asynchronous DAOS Object API
#4 check-in remote shuffle based on DAOS Object API

Bugs Fixed

#12 [SDLe][Snyk]Upgrade org.mock-server:mockserver-netty to fix vulnerability scanned by Snyk

PRs

#13 [REMOTE-SHUFFLE-12][SDle][Snyk]Upgrade org.mock-server:mockserver-net…
#5 check-in remote shuffle based on DAOS Object API

Release 1.0.0

Features

#1823 [oap-native-sql][doc] Spark Native SQL Engine installation guide is obsolete and thus broken.
#1545 [oap-data-source][arrow] Add metric: output_batches
#1588 [OAP-CACHE] Make Parquet file splitable
#1337 [oap-cache] Discard OAP data format
#1679 [OAP-CACHE]Remove the code related to reading and writing OAP data format
#1680 [OAP-CACHE]Decouple spark code includes FileFormatDataWriter, FileFormatWriter and OutputWriter
#1846 [oap-native-sql] spark sql unit test
#1811 [OAP-cache]provide one-step starting scripts like plasma-server redis-server
#1519 [oap-native-sql] upgrade cmake
#1873 [oap-native-sql] Columnar shuffle split variable length use UnsafeAppend
#1835 [oap-native-sql] Support ColumnarBHJ to Build and Broadcast HashRelation in driver side
#1848 [OAP-CACHE]Decouple spark code include OneApplicationResource.scala
#1824 [OAP-CACHE]Decouple spark code includes DataSourceScanExec.scala.
#1838 [OAP-CACHE]Decouple spark code includes VectorizedColumnReader.java, VectorizedPlainValuesReader.java, VectorizedRleValuesReader.java and OnHeapColumnVector.java
#1839 [oap-native-sql] Add prefetch to columnar shuffle split
#1756 [Intel MLlib] Add Kmeans "tolerance" support and test cases
#1818 [OAP-Cache]Make Spark webUI OAP Tab more user friendly
#1831 [oap-native-sql] ColumnarWindow: Support reusing same window spec in multiple functions
#1653 [SQL Data Source Cache]Consistency issue on "enable" and "enabled" configuration
#1765 [oap-native-sql] Support WSCG in nativesql
#1517 [oap-native-sql] implement SortMergeJoin
#1535 [oap-native-sql] Add ColumnarWindowExec
#1654 [oap-native-sql] Columnar shuffle TPCDS enabling
#1700 [oap-native-sql] Support inside join condition project
#1717 [oap-native-sql] support null in columnar literal and subquery
#1704 [oap-native-sql] Add ColumnarUnion and ColumnarExpand
#1647 [oap-native-sql] row to columnar for decimal
#1638 [oap-native-sql] adding full TPC-DS support
#1498 [oap-native-sql] stddev_samp support
#1547 [oap-native-sql] adding metrics for input/output batches

Performance

#1956 [OAP-MLlib]Cannot get 5x performance benefit comparing with vanilla spark.
#1955 [OAP-CACHE] Plasma shows lower performance comparing with vanilla spark.
#2023 [OAP-MLlib] Use oneAPI official release instead of beta versions
#1829 [oap-native-sql] Optimize columnar shuffle and option to use AVX512
#1734 [oap-native-sql] use non-codegen for sort with one key
#1706 [oap-native-sql] Optimize columnar shuffle write

Bugs Fixed

#2054 [OAP-MLlib] Failed to run Intel MLlib after updating the version of oneAPI.
#2012 [SQL Data Source Cache] The task will be suspended when using plasma cache.
#1640 [SQL Data Source Cache] The task will be suspended when using plasma cache and starting 2 executors per worker.
#2028 [OAP-Cache]When using Plasma Spark webUI OAP Tab cache metrics are not right 
#1979 [SDLe][native-sql-engine] Issues from Static Code Analysis with Klocwork need to be fixed
#1938 [oap-native-sql] Stability test failed when running TPCH for 10 rounds.
#1924 [OAP-CACHE] Decouple heartbeat message and use conf to determine whether to report locality information
#1937 [rpmem-shuffle] Cannot pass q64.sql of TPC-DS when enable RPmem shuffle.
#1951 [SDLe][PMem-Shuffle]Specify Scala version above 2.12.4 in pom.xml
#1921 [SDLe][rpmem-shuffle] The master branch and branch-1.0-spark-3.0 can't pass BDBA analysis with libsqlitejdbc dependency.
#1743 [oap-native-sql] Error not reported when creating CodeGenerator instance
#1864 [oap-native-sql] hash conflict in hashagg
#1934 [oap-native-sql] backport to 1.0
#1929 [oap-native-sql] memleak in non-codegen aggregate
#1907 [OAP-cache]Cannot find the class of redis-client
#1888 [oap-native-sql] Add hash collision check for all HashJoins and hashAggr
#1903 [oap-native-sql] BHJ related UT fix
#1881 [oap-native-sql] Fix split use avx512
#1742 [oap-native-sql] SortArraysToIndicesKernel: incorrect null ordering with multiple sort keys
#1553 [oap-native-sql] TPCH-Q7 fails in throughput tests
#1854 [oap-native-sql] Fix columnar shuffle file not deleted
#1844 [oap-native-sql] Fix columnar shuffle spilled file not deleted
#1580 [oap-native-sql] Hash Collision in multiple keys scenario
#1754 [Intel MLlib] Improve LibLoader creating temp dir name with UUID
#1815 [oap-native-sql] Memory management: Error on task end if there are unclosed child allocators
#1808 [oap-native-sql] ColumnarWindow: Memory leak on converting input/output batches
#1806 [oap-native-sql] Fix Columnar Shuffle Memory Leak
#1783 [oap-native-sql] ColumnarWindow: Rank() returns wrong result when input row number >= 65536
#1776 [oap-native-sql] memory leakage in native code
#1760 [oap-native-sql] fix columnar sorting on string
#1733 [oap-native-sql]TPCH Q18 memory leakage
#1694 [oap-native-sql] TPC-H q15 failed for ConditionedProbeArraysVisitorImpl MakeResultIterator does not support dependency type other than Batch
#1682 [oap-native-sql] fix aggregate without codegen
#1707 [oap-native-sql] Fix collect batch metric
#1642 [oap-native-sql] Support expression key in Join
#1669 [oap-native-sql] TPCH Q1 results is not correct w/ hashagg codegen off
#1629 [oap-native-sql] clean up building steps
#1602 [oap-native-sql] rework copyfromjar function
#1599 [oap-native-sql] Columnar BHJ fail on TPCH-Q15
#1567 [oap-native-sql] Spark thrift-server does not honor LIBARROW_DIR env
#1541 [oap-native-sql] TreeNode children not replaced by columnar operators

PRs

#2056 [OAP-2054][OAP-MLlib] Fix oneDAL libJavaAPI.so packaging for oneAPI 2021.1 production release
#2039 [OAP-2023][OAP-MLlib] Switch to oneAPI 2021.1.1 official release for OAP 1.0
#2043 [OAP-1981][OAP-CACHE][POAE7-617]fix binary cache core dump issue
#2002 [OAP-2001][oap-native-sql]fix coding style
#2035 [OAP-2028][OAP-cache][POAE7-635] Fix set concurrent access bug
#2037 [OAP-1640][OAP-CACHE][POAE7-593]Fix plasma hang due to threshold
#2036 [OAP-1955][OAP-CACHE][POAE7-660]preferLocation low hit rate fix master branch
#2013 [OAP-CACHE][POAE7-628]port missing commits from branch 0.8/0.9
#2015 [OAP-2016] fix klocwork issues in oap-common/oap-spark
#2022 [OAP-1980][rpmem-shuffle] Fix Klocwork issues for spark3.x version
#2011 [OAP-2010][oap-native-sql] Add abs support in wscg
#1996 [OAP-1998][oap-native-sql] Add support to do numa binding for Columnar Operations
#2004 [OAP-2012][OAP-CACHE][POAE7-635]bug fix: plasma hang - use java thread-safe set
#1988 [OAP-1983][oap-native-sql] Fix Q38 and Q87 when unsafeRow contains null
#1976 [OAP-1983][oap-native-sql] Fix hashCheck performance issue
#1970 [OAP-1947][oap-native-sql][C++] reduce sort kernel memory footprint
#1961 [OAP-1924][OAP-CACHE]Decouple heartbeat message and use conf to determine whether to report locality information for branch branch-1.0-spark-3.x
#1982 [OAP-1981][OAP-CACHE][POAE7-617]Bug fix binary docache
#1952 [OAP-1951][PMem-Shuffle][SDLe]Specify Scala version in pom.xml
#1919 [OAP-1918][OAP-CACHE][POAE7-563]bug fix: plasma get an invalid value
#1589 [OAP-1588][OAP-CACHE][POAE7-363] Make Parquet splitable
#1954 [OAP-1884][OAP-dev]Small fix for arrow build in prepare_oap_env.sh.
#1933 [OAP-1934][oap-native-sql]Backport NativeSQL code to 1.0
#1889 [OAP-1888][oap-native-sql]Add hash collision check for all HashJoins and hashAggr
#1904 [OAP-1903][oap-native-sql] Fix Local Mode BHJ related UT fail issue
#1916 [OAP-1846][oap-native-sql] clean up travis test
#1923 [OAP-1921][rpmem-shuffle] For BDBA analysis to exclude unused library
#1890 [OAP-1846][oap-native-sql] add script for running unit test
#1905 [OAP-1813][POAE7-555] [OAP-CACHE] package redis related dependency
#1908 [OAP-1884][OAP-dev]Add cxx-compiler in oap conda recipes for native-sql.
#1901 [OAP-1884][OAP-dev]Add c-compiler in oap conda recipes for native-sql.
#1895 [OAP-1884][OAP-dev] Checkout arrow branch in case arrow in other branch
#1876 [OAP-1875]Generating changelog automatically for new releases
#1812 [OAP-1811][OAP-cache][POAE7-486]add sbin folder
#1882 [OAP-1881][oap-native-sql] Fix split use avx512
#1847 [OAP-1846][oap-native-sql] add unit tests from spark to native sql
#1836 [OAP-1835][oap-native-sql] Support ColumnarBHJ to build and broadcast hashrelation
#1885 [OAP-1884][OAP-dev]Add oap-mllib to parent pom and fix error when git clone oneccl.
#1868 [OAP-1653][OAP-Cache]Modify enabled and enable compatibility check
#1853 [OAP-1852][oap-native-sql] Memory Management: Use Arrow C++ memory po…
#1859 [OAP-1858][OAP-cache][POAE7-518] Decouple FilePartition.scala
#1857 [OAP-1833][oap-native-sql] Fix HashAggr hasNext won't stop issue
#1855 [OAP-1854][oap-native-sql] Fix columnar shuffle file not deleted
#1840 [OAP-1839][oap-native-sql] Add prefetch to columnar shuffle split
#1843 [OAP-1842][OAP-dev]Add arrow conda build action job.
#1849 [OAP-1848][SQL Data Source Cache] Decouple OneApplicationResource.scala
#1837 [OAP-1838][SQL Data Source Cache] Decouple VectorizedColumnReader.java, VectorizedPlainValuesReader.java, VectorizedRleValuesReader.java and OnHeapColumnVector.java.
#1757 [OAP-1756][Intel MLlib] Add Kmeans "tolerance" support and test cases
#1845 [OAP-1844][oap-native-sql] Fix columnar shuffle spilled file not deleted
#1827 [OAP-1818][SQL-Data-Source-Cache]Modify Spark webUI OAP Tab expressio…
#1832 [OAP-1831][oap-native-sql] ColumnarWindow: Support reusing same windo…
#1834 [OAP-1833][oap-native-sql][Scala] fix CoalesceBatchs after HashAgg
#1830 [OAP-1829][oap-native-sql] Optimize columnar shuffle and option to use AVX-512
#1803 [OAP-1751][oap-native-sql]fix sort on TPC-DS
#1755 [OAP-1754][Intel MLlib] Improve LibLoader creating temp dir name with UUID
#1826 [OAP-1825] disable pmemblk test
#1802 [OAP-1653][OAP-Cache]Keep consistency on 'enabled' of OapConf configu…
#1810 [OAP-1771]Fix README for Arrow Data Source
#1816 [OAP-1815][oap-native-sql] Memory management: Error on task end if th…
#1809 [OAP-1808][oap-native-sql] ColumnarWindow: Memory leak on converting input/output batches
#1467 [OAP-1457][oap-native-sql] Reserve Spark off-heap execution memory after buffer allocation
#1807 [OAP-1806][oap-native-sql] Fix Columnar Shuffle Memory Leak
#1788 [OAP-1765][oap-native-sql] Fix for dropped CoalesceBatches before ColumnarBroadcastExchange
#1799 [OAP-CACHE][OAP-1690][POAE7-430] Cache backend fall back detect bug fix branch master
#1744 [OAP-CACHE][OAP-1748][POAE7-462] Enable externalDB to store CacheMetaInfo branch master
#1787 [OAP-1786][oap-native-sql] ColumnarWindow: Avoid unnecessary mem copies
#1773 [POAE7-471]Handle oap-common build issue about PMemKV
#1782 [OAP-1631]Update compile scripts from 0.9
#1785 [OAP-1765][oap-native-sql] Support WSCG for nativesql(PART 2)
#1781 [OAP-1765][oap-native-sql] fix codegen for SMJ and HashAgg
#1775 [OAP-1776][oap-native-sql]fix sort memleak
#1766 [OAP-1765][oap-native-sql] Support WSCG for nativesql and use non-codegen join for remainings
#1774 [OAP-1631]Add prepare_oap_env.sh.
#1769 [OAP-1768][POAE7-163][OAP-SPARK] Integrate block manager with chunk api
#1763 [OAP-1759][oap-native-sql] ColumnarWindow: Add execution metrics
#1656 [OAP-1517][oap-native-sql] Improve SortMergeJoin Part2
#1761 [oap-native-sql] quick fix sort on string by fallback to row
#1536 [OAP-1535][oap-native-sql] Add ColumnarWindowExec
#1735 [OAP-1734][oap-native-sql]use non-codegen for sort with single key
#1747 [OAP-1741][rpmem-shuffle]To make java side load native library from jar directly
#1725 [OAP-1727][POAE7-358] Spark integration: Memory Spill to PMem
#1738 [OAP-1733][oap-native-sql][Scala] fix mem leak
#1701 [OAP-1700][oap-native-sql] support join-inside condition project
#1736 [oap-1727][POAE7-358] Add native spark files for memory spill module
#1719 [oap-common][POAE7-347]Stream API for PMem storage store
#1723 [OAP-1679][OAP-CACHE] Remove the code related to reading and writing OAP data format
#1716 [OAP-1717][oap-native-sql] support null in columnar literal and subquery
#1713 [OAP-1712] [OAP-SPARK] Remove file change list from dev directory
#1711 [OAP-1694][oap-native-sql][Scala] fix hash join w/ empty batch
#1710 [OAP-1706][oap-native-sql] Optimize shuffle write
#1705 [OAP-1704][oap-native-sql] Support ColumnarUnion and ColumnarExpand
#1683 [OAP-1682][oap-native-sql] fix aggregate without codegen
#1708 [OAP-1707][oap-native-sql] Fix collect batch metric
#1675 [OAP-1651][oap-native-sql] Adding fallback rules for join/shuffle
#1674 [OAP-1673][oap-native-sql] Adding native double round function
#1632 [OAP-1631][Doc] Add Commit Message Requirements
#1672 [OAP-1610][Intel-MLlib]Upgrade the mahout-hdfs to version 14.1
#1641 [OAP-1651][OAP-1642][oap-native-sql] support TPCDS w/ AQE
#1670 [OAP-1669][oap-native-sql] use distinct ordinal list
#1655 [OAP-1654][oap-native-sql]Columnar shuffle tpcds enabling
#1630 [OAP-1629][oap-native-sql] clean up building scripts
#1601 [OAP-1602][oap-native-sql][Java] fix extract resource from jar
#1639 [OAP-1638][oap-native-sql] tpcds enabling (part2)
#1586 [OAP-1587][oap-native-sql] tpcds enabling (part1)
#1600 [oap-1599][oap-native-sql][Scala] fix broadcasthashjoin
#1555 [OAP-1541][oap-native-sql] TreeNode children not replaced by columnar…
#1546 [OAP-1547][oap-native-sql][Scala] Adding metrics for input/output batches
#1472 [OAP-1466] [RDD Cache] [POAE-354] Initialize pmem with AppDirect and KMemDax mode in block manager

Release 0.8.4

Features

#1865 [OAP-CACHE]Decouple spark code include DataSourceScanExec.scala, OneApplicationResource.scala, Decouple VectorizedColumnReader.java, VectorizedPlainValuesReader.java, VectorizedRleValuesReader.java and OnHeapColumnVector.java for OAP-0.8.4.
#1813 [OAP-cache] package redis client jar into oap-cache

Bugs Fixed

#2044 [OAP-CACHE] Build error due to synchronizedSet on branch 0.8
#2027 [oap-shuffle] Should load native library from jar directly
#1981 [OAP-CACHE] Error running q32 binary cache
#1980 [SDLe][RPMem-Shuffle]Issues from Static Code Analysis with Klocwork need to be fixed
#1918 [OAP-CACHE] Plasma throws exception: get an invalid value - branch 0.8

PRs

#2045 [OAP-2044][OAP-CACHE]bug fix: build error due to synchronizedSet
#2031 [OAP-1955][OAP-CACHE][POAE7-667]preferLocation low hit rate fix branch 0.8
#2029 [OAP-2027][rpmem-shuffle] Load native libraries from jar
#2018 [OAP-1980][SDLe][rpmem-shuffle] Fix potential risk issues reported by Klocwork
#1920 [OAP-1924][OAP-CACHE]Decouple heartbeat message and use conf to determine whether to report locality information
#1949 [OAP-1948][rpmem-shuffle] Fix several vulnerabilities reported by BDBA
#1900 [OAP-1680][OAP-CACHE] Decouple FileFormatDataWriter, FileFormatWriter and OutputWriter
#1899 [OAP-1679][OAP-CACHE] Remove the code related to reading and writing OAP data format (#1723)
#1897 [OAP-1884][OAP-dev] Update memkind version and copy arrow plasma jar to conda package build path
#1883 [OAP-1568][OAP-CACHE] Cleanup Oap data format read/write related test cases
#1863 [OAP-1865][SQL Data Source Cache]Decouple spark code include DataSourceScanExec.scala, OneApplicationResource.scala, Decouple VectorizedColumnReader.java, VectorizedPlainValuesReader.java, VectorizedRleValuesReader.java and OnHeapColumnVector.java for OAP-0.8.4.
#1841 [OAP-1579][OAP-cache]Fix web UI to show cache size
#1814 [OAP-cache][OAP-1813][POAE7-481]package redis client related dependency
#1790 [OAP-CACHE][OAP-1690][POAE7-430] Cache backend fallback bugfix
#1740 [OAP-CACHE][OAP-1748][POAE7-453]Enable externalDB to store CacheMetaInfo branch 0.8
#1731 [OAP-CACHE] [OAP-1730] [POAE-428] Add OAP cache runtime enable