diff --git a/CHANGELOG.md b/CHANGELOG.md index 60fd14471..7feb16f7d 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,197 @@ # Change log -Generated on 2021-06-02 +Generated on 2021-09-02 + +## Release 1.2.0 + +### Gazelle Plugin + +#### Features +||| +|:---|:---| +|[#394](https://github.com/oap-project/gazelle_plugin/issues/394)|Support ColumnarArrowEvalPython operator | +|[#368](https://github.com/oap-project/gazelle_plugin/issues/368)|Encountered Hadoop version (3.2.1) conflict issue on AWS EMR-6.3.0| +|[#375](https://github.com/oap-project/gazelle_plugin/issues/375)|Implement a series of datetime functions| +|[#183](https://github.com/oap-project/gazelle_plugin/issues/183)|Add Date/Timestamp type support| +|[#362](https://github.com/oap-project/gazelle_plugin/issues/362)|make arrow-unsafe allocator as the default| +|[#343](https://github.com/oap-project/gazelle_plugin/issues/343)|configurable codegen opt level| +|[#333](https://github.com/oap-project/gazelle_plugin/issues/333)|Arrow Data Source: CSV format support fix| +|[#223](https://github.com/oap-project/gazelle_plugin/issues/223)|Add Parquet write support to Arrow data source| +|[#320](https://github.com/oap-project/gazelle_plugin/issues/320)|Add build option to enable unsafe Arrow allocator| +|[#337](https://github.com/oap-project/gazelle_plugin/issues/337)|UDF: Add test case for validating basic row-based udf| +|[#326](https://github.com/oap-project/gazelle_plugin/issues/326)|Update Scala unit test to spark-3.1.1| + +#### Performance +||| +|:---|:---| +|[#400](https://github.com/oap-project/gazelle_plugin/issues/400)|Optimize ColumnarToRow Operator in NSE.| +|[#411](https://github.com/oap-project/gazelle_plugin/issues/411)|enable ccache on C++ code compiling| + +#### Bugs Fixed +||| +|:---|:---| +|[#358](https://github.com/oap-project/gazelle_plugin/issues/358)|Running TPC DS all queries with native-sql-engine for 10 rounds will have performance degradation problems in the last few rounds| 
+|[#481](https://github.com/oap-project/gazelle_plugin/issues/481)|JVM heap memory leak on memory leak tracker facilities| +|[#436](https://github.com/oap-project/gazelle_plugin/issues/436)|Fix for Arrow Data Source test suite| +|[#317](https://github.com/oap-project/gazelle_plugin/issues/317)|persistent memory cache issue| +|[#382](https://github.com/oap-project/gazelle_plugin/issues/382)|Hadoop version conflict when supporting to use gazelle_plugin on Google Cloud Dataproc| +|[#384](https://github.com/oap-project/gazelle_plugin/issues/384)|ColumnarBatchScanExec reading parquet failed on java.lang.IllegalArgumentException: not all nodes and buffers were consumed| +|[#370](https://github.com/oap-project/gazelle_plugin/issues/370)|Failed to get time zone: NoSuchElementException: None.get| +|[#360](https://github.com/oap-project/gazelle_plugin/issues/360)|Cannot compile master branch.| +|[#341](https://github.com/oap-project/gazelle_plugin/issues/341)|build failed on v2 with -Phadoop-3.2| + +#### PRs +||| +|:---|:---| +|[#489](https://github.com/oap-project/gazelle_plugin/pull/489)|[NSE-481] JVM heap memory leak on memory leak tracker facilities (Arrow Allocator)| +|[#486](https://github.com/oap-project/gazelle_plugin/pull/486)|[NSE-475] restore coalescebatches operator before window| +|[#482](https://github.com/oap-project/gazelle_plugin/pull/482)|[NSE-481] JVM heap memory leak on memory leak tracker facilities| +|[#470](https://github.com/oap-project/gazelle_plugin/pull/470)|[NSE-469] Lazy Read: Iterator objects are not correctly released| +|[#464](https://github.com/oap-project/gazelle_plugin/pull/464)|[NSE-460] fix decimal partial sum in 1.2 branch| +|[#439](https://github.com/oap-project/gazelle_plugin/pull/439)|[NSE-433]Support pre-built Jemalloc| +|[#453](https://github.com/oap-project/gazelle_plugin/pull/453)|[NSE-254] remove arrow-data-source-common from jar with dependency| +|[#452](https://github.com/oap-project/gazelle_plugin/pull/452)|[NSE-254]Fix 
redundant arrow library issue.| +|[#432](https://github.com/oap-project/gazelle_plugin/pull/432)|[NSE-429] TPC-DS Q14a/b get slowed down within setting spark.oap.sql.columnar.sortmergejoin.lazyread=true| +|[#426](https://github.com/oap-project/gazelle_plugin/pull/426)|[NSE-207] Fix aggregate and refresh UT test script| +|[#442](https://github.com/oap-project/gazelle_plugin/pull/442)|[NSE-254]Issue0410 jar size| +|[#441](https://github.com/oap-project/gazelle_plugin/pull/441)|[NSE-254]Issue0410 jar size| +|[#440](https://github.com/oap-project/gazelle_plugin/pull/440)|[NSE-254]Solve the redundant arrow library issue| +|[#437](https://github.com/oap-project/gazelle_plugin/pull/437)|[NSE-436] Fix for Arrow Data Source test suite| +|[#387](https://github.com/oap-project/gazelle_plugin/pull/387)|[NSE-383] Release SMJ input data immediately after being used| +|[#423](https://github.com/oap-project/gazelle_plugin/pull/423)|[NSE-417] fix sort spill on inplsace sort| +|[#416](https://github.com/oap-project/gazelle_plugin/pull/416)|[NSE-207] fix left/right outer join in SMJ| +|[#422](https://github.com/oap-project/gazelle_plugin/pull/422)|[NSE-421]Disable the wholestagecodegen feature for the ArrowColumnarToRow operator| +|[#369](https://github.com/oap-project/gazelle_plugin/pull/369)|[NSE-417] Sort spill support framework| +|[#401](https://github.com/oap-project/gazelle_plugin/pull/401)|[NSE-400] Optimize ColumnarToRow Operator in NSE.| +|[#413](https://github.com/oap-project/gazelle_plugin/pull/413)|[NSE-411] adding ccache support| +|[#393](https://github.com/oap-project/gazelle_plugin/pull/393)|[NSE-207] fix scala unit tests| +|[#407](https://github.com/oap-project/gazelle_plugin/pull/407)|[NSE-403]Add Dataproc integration section to README| +|[#406](https://github.com/oap-project/gazelle_plugin/pull/406)|[NSE-404]Modify repo name in documents| +|[#402](https://github.com/oap-project/gazelle_plugin/pull/402)|[NSE-368]Update emr-6.3.0 support| 
+|[#395](https://github.com/oap-project/gazelle_plugin/pull/395)|[NSE-394]Support ColumnarArrowEvalPython operator| +|[#346](https://github.com/oap-project/gazelle_plugin/pull/346)|[NSE-317]fix columnar cache| +|[#392](https://github.com/oap-project/gazelle_plugin/pull/392)|[NSE-382]Support GCP Dataproc 2.0| +|[#388](https://github.com/oap-project/gazelle_plugin/pull/388)|[NSE-382]Fix Hadoop version issue| +|[#385](https://github.com/oap-project/gazelle_plugin/pull/385)|[NSE-384] "Select count(*)" without group by results in error: java.lang.IllegalArgumentException: not all nodes and buffers were consumed| +|[#374](https://github.com/oap-project/gazelle_plugin/pull/374)|[NSE-207] fix left anti join and support filter wo/ project| +|[#376](https://github.com/oap-project/gazelle_plugin/pull/376)|[NSE-375] Implement a series of datetime functions| +|[#373](https://github.com/oap-project/gazelle_plugin/pull/373)|[NSE-183] fix timestamp in native side| +|[#356](https://github.com/oap-project/gazelle_plugin/pull/356)|[NSE-207] fix issues found in scala unit tests| +|[#371](https://github.com/oap-project/gazelle_plugin/pull/371)|[NSE-370] Failed to get time zone: NoSuchElementException: None.get| +|[#347](https://github.com/oap-project/gazelle_plugin/pull/347)|[NSE-183] Add Date/Timestamp type support| +|[#363](https://github.com/oap-project/gazelle_plugin/pull/363)|[NSE-362] use arrow-unsafe allocator by default| +|[#361](https://github.com/oap-project/gazelle_plugin/pull/361)|[NSE-273] Spark shim layer infrastructure| +|[#364](https://github.com/oap-project/gazelle_plugin/pull/364)|[NSE-360] fix ut compile and travis test| +|[#264](https://github.com/oap-project/gazelle_plugin/pull/264)|[NSE-207] fix issues found from join unit tests| +|[#344](https://github.com/oap-project/gazelle_plugin/pull/344)|[NSE-343]allow to config codegen opt level| +|[#342](https://github.com/oap-project/gazelle_plugin/pull/342)|[NSE-341] fix maven build failure| 
+|[#324](https://github.com/oap-project/gazelle_plugin/pull/324)|[NSE-223] Add Parquet write support to Arrow data source| +|[#321](https://github.com/oap-project/gazelle_plugin/pull/321)|[NSE-320] Add build option to enable unsafe Arrow allocator| +|[#299](https://github.com/oap-project/gazelle_plugin/pull/299)|[NSE-207] fix unsuppored types in aggregate| +|[#338](https://github.com/oap-project/gazelle_plugin/pull/338)|[NSE-337] UDF: Add test case for validating basic row-based udf| +|[#336](https://github.com/oap-project/gazelle_plugin/pull/336)|[NSE-333] Arrow Data Source: CSV format support fix| +|[#327](https://github.com/oap-project/gazelle_plugin/pull/327)|[NSE-326] update scala unit tests to spark-3.1.1| + +### OAP MLlib + +#### Features +||| +|:---|:---| +|[#110](https://github.com/oap-project/oap-mllib/issues/110)|Update isOAPEnabled for Kmeans, PCA & ALS| +|[#108](https://github.com/oap-project/oap-mllib/issues/108)|Update PCA GPU, LiR CPU and Improve JAR packaging and libs loading| +|[#93](https://github.com/oap-project/oap-mllib/issues/93)|[GPU] Add GPU support for PCA| +|[#101](https://github.com/oap-project/oap-mllib/issues/101)|[Release] Add version update scripts and improve scripts for examples| +|[#76](https://github.com/oap-project/oap-mllib/issues/76)|Reorganize Spark version specific code structure| +|[#82](https://github.com/oap-project/oap-mllib/issues/82)|[Tests] Add NaiveBayes test and refactors| + +#### Bugs Fixed +||| +|:---|:---| +|[#119](https://github.com/oap-project/oap-mllib/issues/119)|[SDLe][Klocwork] Security vulnerabilities found by static code scan| +|[#121](https://github.com/oap-project/oap-mllib/issues/121)|Meeting freeing memory issue after the training stage when using Intel-MLlib to run PCA and K-means algorithms.| +|[#122](https://github.com/oap-project/oap-mllib/issues/122)|Cannot run K-means and PCA algorithm with oap-mllib on Google Dataproc| +|[#123](https://github.com/oap-project/oap-mllib/issues/123)|[Core] Improve 
locality handling for native lib loading| +|[#116](https://github.com/oap-project/oap-mllib/issues/116)|Cannot run ALS algorithm with oap-mllib thanks to the commit "2883d3447d07feb55bf5d4fee8225d74b0b1e2b1"| +|[#114](https://github.com/oap-project/oap-mllib/issues/114)|[Core] Improve native lib loading| +|[#94](https://github.com/oap-project/oap-mllib/issues/94)|Failed to run KMeans workload with oap-mllib in JLSE| +|[#95](https://github.com/oap-project/oap-mllib/issues/95)|Some shared libs are missing in 1.1.1 release| +|[#105](https://github.com/oap-project/oap-mllib/issues/105)|[Core] crash when libfabric version conflict| +|[#98](https://github.com/oap-project/oap-mllib/issues/98)|[SDLe][Klocwork] Security vulnerabilities found by static code scan| +|[#88](https://github.com/oap-project/oap-mllib/issues/88)|[Test] Fix ALS Suite "ALS shuffle cleanup standalone"| +|[#86](https://github.com/oap-project/oap-mllib/issues/86)|[NaiveBayes] Fix isOAPEnabled and add multi-version support| + +#### PRs +||| +|:---|:---| +|[#124](https://github.com/oap-project/oap-mllib/pull/124)|[ML-123][Core] Improve locality handling for native lib loading| +|[#118](https://github.com/oap-project/oap-mllib/pull/118)|[ML-116] use getOneCCLIPPort and fix lib loading| +|[#115](https://github.com/oap-project/oap-mllib/pull/115)|[ML-114] [Core] Improve native lib loading| +|[#113](https://github.com/oap-project/oap-mllib/pull/113)|[ML-110] Update isOAPEnabled for Kmeans, PCA & ALS| +|[#112](https://github.com/oap-project/oap-mllib/pull/112)|[ML-105][Core] Fix crash when libfabric version conflict| +|[#111](https://github.com/oap-project/oap-mllib/pull/111)|[ML-108] Update PCA GPU, LiR CPU and Improve JAR packaging and libs loading| +|[#104](https://github.com/oap-project/oap-mllib/pull/104)|[ML-93][GPU] Add GPU support for PCA| +|[#103](https://github.com/oap-project/oap-mllib/pull/103)|[ML-98] [Release] Clean Service.java code| 
+|[#102](https://github.com/oap-project/oap-mllib/pull/102)|[ML-101] [Release] Add version update scripts and improve scripts for examples| +|[#90](https://github.com/oap-project/oap-mllib/pull/90)|[ML-88][Test] Fix ALS Suite "ALS shuffle cleanup standalone"| +|[#87](https://github.com/oap-project/oap-mllib/pull/87)|[ML-86][NaiveBayes] Fix isOAPEnabled and add multi-version support| +|[#83](https://github.com/oap-project/oap-mllib/pull/83)|[ML-82] [Tests] Add NaiveBayes test and refactors| +|[#75](https://github.com/oap-project/oap-mllib/pull/75)|[ML-53] [CPU] Add Linear & Ridge Regression| +|[#77](https://github.com/oap-project/oap-mllib/pull/77)|[ML-76] Reorganize multiple Spark version support code structure| +|[#68](https://github.com/oap-project/oap-mllib/pull/68)|[ML-55] [CPU] Add Naive Bayes| +|[#64](https://github.com/oap-project/oap-mllib/pull/64)|[ML-42] [PIP] Misc improvements and refactor code| +|[#62](https://github.com/oap-project/oap-mllib/pull/62)|[ML-30][Coding Style] Add code style rules & scripts for Scala, Java and C++| + +### SQL DS Cache + +#### Features +||| +|:---|:---| +|[#155](https://github.com/oap-project/sql-ds-cache/issues/155)|reorg to support profile based multi spark version| + +#### Bugs Fixed +||| +|:---|:---| +|[#190](https://github.com/oap-project/sql-ds-cache/issues/190)|The function of vmem-cache and guava-cache should not be associated with arrow.| +|[#181](https://github.com/oap-project/sql-ds-cache/issues/181)|[SDLe]Vulnerabilities scanned by Snyk| + +#### PRs +||| +|:---|:---| +|[#182](https://github.com/oap-project/sql-ds-cache/pull/182)|[SQL-DS-CACHE-181][SDLe]Fix Snyk code scan issues| +|[#191](https://github.com/oap-project/sql-ds-cache/pull/191)|[SQL-DS-CACHE-190]put plasma detector in seperate object to avoid unnecessary dependency of arrow| +|[#189](https://github.com/oap-project/sql-ds-cache/pull/189)|[SQL-DS-CACHE-188][POAE7-1253] improvement of fallback from plasma cache to simple cache| 
+|[#157](https://github.com/oap-project/sql-ds-cache/pull/157)|[SQL-DS-CACHE-155][POAE7-1187]reorg to support profile based multi spark version| + +### PMem Shuffle + +#### Bugs Fixed +||| +|:---|:---| +|[#46](https://github.com/oap-project/pmem-shuffle/issues/46)|Cannot run Terasort with pmem-shuffle of branch-1.2| +|[#43](https://github.com/oap-project/pmem-shuffle/issues/43)|Rpmp cannot be compiled due to the lack of boost header file.| + +#### PRs +||| +|:---|:---| +|[#51](https://github.com/oap-project/pmem-shuffle/pull/51)|[PMEM-SHUFFLE-50] Remove description about download submodules manually since they can be downloaded automatically.| +|[#49](https://github.com/oap-project/pmem-shuffle/pull/49)|[PMEM-SHUFFLE-48] Fix the bug about mapstatus tracking and add more connections for metastore.| +|[#47](https://github.com/oap-project/pmem-shuffle/pull/47)|[PMEM-SHUFFLE-46] Fix the bug that off-heap memory is over used in shuffle reduce stage. | +|[#40](https://github.com/oap-project/pmem-shuffle/pull/40)|[PMEM-SHUFFLE-39] Fix the bug that pmem-shuffle without RPMP fails to pass Terasort benchmark due to latest patch.| +|[#38](https://github.com/oap-project/pmem-shuffle/pull/38)|[PMEM-SHUFFLE-37] Add start-rpmp.sh and stop-rpmp.sh| +|[#33](https://github.com/oap-project/pmem-shuffle/pull/33)|[PMEM-SHUFFLE-28]Add RPMP with HA support and integrate it with Spark3.1.1| +|[#27](https://github.com/oap-project/pmem-shuffle/pull/27)|[PMEM-SHUFFLE] Change artifact name to make it compatible with naming…| + +### Remote Shuffle + +#### Bugs Fixed +||| +|:---|:---| +|[#24](https://github.com/oap-project/remote-shuffle/issues/24)|Enhance executor memory release| + +#### PRs +||| +|:---|:---| +|[#25](https://github.com/oap-project/remote-shuffle/pull/25)|[REMOTE-SHUFFLE-24] Enhance executor memory release| + ## Release 1.1.1 @@ -88,7 +280,7 @@ Generated on 2021-06-02 |[#39](https://github.com/oap-project/oap-mllib/pull/39)|[ML-26] Build for different spark version by 
-Pprofile|

-### PMEM Spill
+### PMem Spill

#### Features
|||
@@ -101,7 +293,7 @@ Generated on 2021-06-02
|[#41](https://github.com/oap-project/pmem-spill/pull/41)|[PMEM-SPILL-34][POAE7-1119]Port RDD cache to Spark 3.1.1 as separate module|

-### PMEM Common
+### PMem Common

#### Features
|||
@@ -116,7 +308,7 @@ Generated on 2021-06-02
|[#9](https://github.com/oap-project/pmem-common/pull/9)|[PMEM-COMMON-8][POAE7-896]use clflush optimize version for clflush|

-### PMEM Shuffle
+### PMem Shuffle

#### Features
|||
@@ -363,7 +555,7 @@ Generated on 2021-06-02
|[#19](https://github.com/oap-project/oap-mllib/pull/19)|[ML-18] Auto detect KVS port for oneCCL to avoid port conflict|

-### PMEM Spill
+### PMem Spill

#### Bugs Fixed
|||
@@ -383,7 +575,7 @@ Generated on 2021-06-02
|[#10](https://github.com/oap-project/pmem-spill/pull/10)|Fixing one pmem path on AppDirect mode may cause the pmem initialization path to be empty Path|

-### PMEM Shuffle
+### PMem Shuffle

#### Features
|||
diff --git a/README.md b/README.md
index 635ece2f6..8916422cd 100644
--- a/README.md
+++ b/README.md
@@ -84,7 +84,7 @@ Please check the document [Installation Guide](./docs/Installation.md)

## Get started

-To enable OAP NativeSQL Engine, the previous built jar `spark-columnar-core--jar-with-dependencies.jar` should be added to Spark configuration. We also recommend to use `spark-arrow-datasource-standard--jar-with-dependencies.jar`. We will demonstrate an example by using both jar files.
+To enable Gazelle Plugin, the previously built jar `spark-columnar-core--jar-with-dependencies.jar` should be added to the Spark configuration. We also recommend using `spark-arrow-datasource-standard--jar-with-dependencies.jar`. We will demonstrate an example using both jar files.

SPARK related options are:
* `spark.driver.extraClassPath` : Set to load jar file to driver.
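
For illustration, the two jars and the `spark.driver.extraClassPath` option above could be set in `conf/spark-defaults.conf`, together with the matching executor-side option `spark.executor.extraClassPath`; the `/path/to` locations and the `<version>` placeholder below are hypothetical and must be replaced with the jars you actually built:

```
# Hypothetical sketch of conf/spark-defaults.conf -- adjust paths and <version>
# to match your build.
spark.driver.extraClassPath   /path/to/spark-columnar-core-<version>-jar-with-dependencies.jar:/path/to/spark-arrow-datasource-standard-<version>-jar-with-dependencies.jar
spark.executor.extraClassPath /path/to/spark-columnar-core-<version>-jar-with-dependencies.jar:/path/to/spark-arrow-datasource-standard-<version>-jar-with-dependencies.jar
```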
@@ -132,7 +132,7 @@ The result should showup on Spark console and you can check the DAG diagram with

### Google Cloud Dataproc

-Gazelle Plugin now supports to run on Dataproc 2.0, we provide a guide to help quickly install Gazelle Plugin and run TPC-DS, TPC-H with scripts provided.
+Gazelle Plugin now supports running on Dataproc 2.0. We provide a guide to help you quickly install Gazelle Plugin and run TPC-DS with notebooks or scripts.

Please refer to [Gazelle_on_Dataproc](https://github.com/oap-project/oap-tools/blob/master/docs/Gazelle_on_Dataproc.md) to find details about:

@@ -141,7 +141,7 @@ Gazelle Plugin jars compiled with `-Pdataproc-2.0` parameter will installed by C

2. Config for enabling Gazelle Plugin.

-3. Run TPC-DS and TPC-H with scripts.
+3. Run TPC-DS with notebooks or scripts.

## Performance data

diff --git a/docs/OAP-Developer-Guide.md b/docs/OAP-Developer-Guide.md
index 9b0e64c5e..30d9ed97a 100644
--- a/docs/OAP-Developer-Guide.md
+++ b/docs/OAP-Developer-Guide.md
@@ -3,13 +3,13 @@ This document contains the instructions & scripts on installing necessary dependencies and building OAP modules. You can get more detailed information from each OAP module below.
-* [SQL Index and Data Source Cache](https://github.com/oap-project/sql-ds-cache/blob/v1.1.1-spark-3.1.1/docs/Developer-Guide.md) -* [PMem Common](https://github.com/oap-project/pmem-common/tree/v1.1.1-spark-3.1.1) -* [PMem Spill](https://github.com/oap-project/pmem-spill/tree/v1.1.1-spark-3.1.1) -* [PMem Shuffle](https://github.com/oap-project/pmem-shuffle/tree/v1.1.1-spark-3.1.1#5-install-dependencies-for-pmem-shuffle) -* [Remote Shuffle](https://github.com/oap-project/remote-shuffle/tree/v1.1.1-spark-3.1.1) -* [OAP MLlib](https://github.com/oap-project/oap-mllib/tree/v1.1.1-spark-3.1.1) -* [Gazelle Plugin](https://github.com/oap-project/gazelle_plugin/tree/v1.1.1-spark-3.1.1) +* [SQL Index and Data Source Cache](https://github.com/oap-project/sql-ds-cache/blob/v1.2.0/docs/Developer-Guide.md) +* [PMem Common](https://github.com/oap-project/pmem-common/tree/v1.2.0) +* [PMem Spill](https://github.com/oap-project/pmem-spill/tree/v1.2.0) +* [PMem Shuffle](https://github.com/oap-project/pmem-shuffle/tree/v1.2.0#5-install-dependencies-for-pmem-shuffle) +* [Remote Shuffle](https://github.com/oap-project/remote-shuffle/tree/v1.2.0) +* [OAP MLlib](https://github.com/oap-project/oap-mllib/tree/v1.2.0) +* [Gazelle Plugin](https://github.com/oap-project/gazelle_plugin/tree/v1.2.0) ## Building OAP @@ -22,7 +22,7 @@ We provide scripts to help automatically install dependencies required, please c # cd oap-tools # sh dev/install-compile-time-dependencies.sh ``` -*Note*: oap-tools tag version `v1.1.1-spark-3.1.1` corresponds to all OAP modules' tag version `v1.1.1-spark-3.1.1`. +*Note*: oap-tools tag version `v1.2.0` corresponds to all OAP modules' tag version `v1.2.0`. 
Then the dependencies below will be installed:

@@ -33,34 +33,31 @@ Then the dependencies below will be installed:
* [HPNL](https://github.com/Intel-bigdata/HPNL)
* [PMDK](https://github.com/pmem/pmdk)
* [OneAPI](https://software.intel.com/content/www/us/en/develop/tools/oneapi.html)
-* [Arrow](https://github.com/oap-project/arrow/tree/arrow-4.0.0-oap)
+* [Arrow](https://github.com/oap-project/arrow/tree/v4.0.0-oap-1.2.0)
* [LLVM](https://llvm.org/)

-Run the following command to learn more.
-
-```
-# sh dev/scripts/prepare_oap_env.sh --help
-```
-
-Run the following command to automatically install specific dependency such as Maven.
-
-```
-# sh dev/scripts/prepare_oap_env.sh --prepare_maven
-```
-
-
**Requirements for Shuffle Remote PMem Extension**

If you enable the Shuffle Remote PMem extension with RDMA, you can refer to [PMem Shuffle](https://github.com/oap-project/pmem-shuffle) to configure and validate RDMA in advance.

### Building

+#### Building OAP
+
OAP is built with [Apache Maven](http://maven.apache.org/) and Oracle Java 8.

-To build OAP package, run command below then you can find a tarball named `oap-$VERSION-bin-spark-$VERSION.tar.gz` under directory `$OAP_TOOLS_HOME/dev/release-package `.
+To build the OAP package, run the command below; you can then find a tarball named `oap-$VERSION-*.tar.gz` under the directory `$OAP_TOOLS_HOME/dev/release-package`, which contains all OAP module jars.
+Change to the `root` user, then run:
+
```
-$ sh $OAP_TOOLS_HOME/dev/compile-oap.sh
+# cd oap-tools
+# sh dev/compile-oap.sh
```

-Building specified OAP Module, such as `sql-ds-cache`, run:
+#### Building a specific OAP module
+
+If you just want to build a specific OAP module, such as `sql-ds-cache`, change to the `root` user, then run:
+
```
-$ sh $OAP_TOOLS_HOME/dev/compile-oap.sh --sql-ds-cache
+# cd oap-tools
+# sh dev/compile-oap.sh --component=sql-ds-cache
```
diff --git a/docs/OAP-Installation-Guide.md b/docs/OAP-Installation-Guide.md
index a3d2a8cdb..a4d6b16f3 100644
--- a/docs/OAP-Installation-Guide.md
+++ b/docs/OAP-Installation-Guide.md
@@ -26,17 +26,16 @@ To test your installation, run the command `conda list` in your terminal window

### Installing OAP

Create a Conda environment and install OAP Conda package.
+
```bash
-$ conda create -n oapenv -y python=3.7
-$ conda activate oapenv
-$ conda install -c conda-forge -c intel -y oap=1.1.1
+$ conda create -n oapenv -c conda-forge -c intel -y oap=1.2.0
```

Once you have finished the steps above, you have completed the OAP dependency installation and build, and will find the built OAP jars under `$HOME/miniconda2/envs/oapenv/oap_jars`.

The dependencies below are required by OAP. All of them are included in the OAP Conda package and will be automatically installed in your cluster when you Conda-install OAP. Ensure you have activated the environment you created in the previous steps.
-- [Arrow](https://github.com/oap-project/arrow/tree/arrow-4.0.0-oap)
+- [Arrow](https://github.com/oap-project/arrow/tree/v4.0.0-oap-1.2.0)
- [Plasma](http://arrow.apache.org/blog/2017/08/08/plasma-in-memory-object-store/)
- [Memkind](https://github.com/memkind/memkind/tree/v1.10.1)
- [Vmemcache](https://github.com/pmem/vmemcache.git)
diff --git a/docs/index.md b/docs/index.md
index fc5788fa0..43972520a 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -76,7 +76,7 @@ Please check the document [Installation Guide](./Installation.md)

## Get started

-To enable OAP NativeSQL Engine, the previous built jar `spark-columnar-core--jar-with-dependencies.jar` should be added to Spark configuration. We also recommend to use `spark-arrow-datasource-standard--jar-with-dependencies.jar`. We will demonstrate an example by using both jar files.
+To enable Gazelle Plugin, the previously built jar `spark-columnar-core--jar-with-dependencies.jar` should be added to the Spark configuration. We also recommend using `spark-arrow-datasource-standard--jar-with-dependencies.jar`. We will demonstrate an example using both jar files.

SPARK related options are:
* `spark.driver.extraClassPath` : Set to load jar file to driver.
@@ -120,6 +120,21 @@ spark.sql("select * from orders where o_orderdate > date '1998-07-26'").show(200

The result should show up on the Spark console and you can check the DAG diagram with some Columnar Processing stages. Gazelle Plugin still lacks some features; please check out the [limitations](./limitations.md).

+## Cloud/K8s Integration
+
+### Google Cloud Dataproc
+
+Gazelle Plugin now supports running on Dataproc 2.0. We provide a guide to help you quickly install Gazelle Plugin and run TPC-DS with notebooks or scripts.
+
+Please refer to [Gazelle_on_Dataproc](https://github.com/oap-project/oap-tools/blob/master/docs/Gazelle_on_Dataproc.md) to find details about:
+
+1. Create a cluster on Dataproc 2.0 with initialization actions.
+Gazelle Plugin jars compiled with the `-Pdataproc-2.0` parameter will be installed by Conda on all cluster nodes.
+
+2. Config for enabling Gazelle Plugin.
+
+3. Run TPC-DS with notebooks or scripts.
+
## Performance data

@@ -142,6 +157,11 @@ We pick up 10 queries which can be fully supported in OAP v1.0-Gazelle Plugin an

Please note the performance data is not official TPC-H or TPC-DS results. The actual performance result may vary by individual workloads. Please try your workloads with Gazelle Plugin first and check the DAG or log file to see if all the operators can be supported in OAP-Gazelle Plugin.

+## Memory allocation
+The memory usage in Gazelle Plugin is high. The allocation goes to two parts: 1) Java-based allocation, which is widely used in the Arrow Java API, and 2) native-side memory allocation used in each native kernel. We investigated the memory allocation behavior and made further tunings [here](./memory.md); the memory footprint is stable during a TPC-DS power run.
+![Memory](./image/rssmem.png)
+
+
## Coding Style