Pre-work for S3 ranger logs hive table feature (#77)

* creating hive table for Ranger logs
* removing ranger log repair functionality
* remove references
* fixing db name
* changelog and readme
* correct date
* setting release date to TBD
* creating system db
* updating changelog
* updating readme and changelog
* small change to changelog
* removing test file that was added by mistake, change in changelog.md
* fixing problem in startup.sh
* adding line removed by mistake
* passing system db name as var
* PR suggestion
* update changelog
* modifying readme
* Update CHANGELOG.md
* Update README.md
* current date

Co-authored-by: Scott Barnhart <[email protected]>
ExpediaGroup · Jun 16, 2020 · 6c99a57
1 parent 52f0347

Showing 3 changed files with 8 additions and 0 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -3,6 +3,11 @@ All notable changes to this project will be documented in this file.
 
 The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/) and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).
 
+## [1.15.0] - 2020-06-16
+### Added
+Create Hive database `apiary_system` on startup. Data for Ranger access logs goes to bucket `<prefix>-apiary-system` in Parquet format.
+This is pre-work to prepare for Ranger access-log Hive tables in a future version of Apiary.
+
 ## [1.14.0] - 2020-05-11
 ### Added
 - Enable caller to set min and max size of the Hive metastore thread pool.  If not set, defaults to 200/1000 (Hive defaults).

diff --git a/README.md b/README.md
@@ -8,6 +8,7 @@ For more information please refer to the main [Apiary](https://github.com/Expedi
 |----|----|----|
 |APIARY_S3_INVENTORY_PREFIX|No (defaults to `EntireBucketDaily`)|Prefix used by S3 Inventory when creating data in the inventory bucket.|
 |APIARY_S3_INVENTORY_TABLE_FORMAT|No (defaults to `ORC`)|Format of S3 inventory data - `ORC`, `Parquet`, or `CSV`|
+|APIARY_SYSTEM_SCHEMA|No (defaults to `apiary_system`)|Name for internal system database.|
 |ATLAS_KAFKA_BOOTSTRAP_SERVERS|No|Atlas hive-bridge kafka bootstrap servers.|
 |AWS_REGION|Yes|AWS region to configure various AWS clients.|
 |ENABLE_GLUESYNC|No|Option to turn on GlueSync Hive Metastore listener.|
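The README documents a default of `apiary_system` for `APIARY_SYSTEM_SCHEMA`. In `startup.sh` that default comes from the `${APIARY_SYSTEM_SCHEMA:-apiary_system}` parameter expansion. The following is a minimal sketch of how that expansion behaves, assuming bash. The `system_schema` helper is hypothetical and is not part of Apiary. Note that `:-` falls back to the default both when the variable is unset and when it is set but empty.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in Apiary) that mirrors the expansion used in startup.sh.
system_schema() {
  # ":-" substitutes "apiary_system" when the variable is unset OR empty.
  echo "${APIARY_SYSTEM_SCHEMA:-apiary_system}"
}

unset APIARY_SYSTEM_SCHEMA
system_schema        # prints apiary_system

APIARY_SYSTEM_SCHEMA=""
system_schema        # prints apiary_system (empty is treated like unset by ":-")

APIARY_SYSTEM_SCHEMA="my_system"
system_schema        # prints my_system
```

If an empty value ever had to be honored literally, the `-` form (without the colon) would apply the default only when the variable is unset.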

diff --git a/files/startup.sh b/files/startup.sh
@@ -118,6 +118,8 @@ if [ -z $EXTERNAL_DATABASE ] && [ "$HIVE_METASTORE_ACCESS_MODE" = "readwrite" ];
             HIVE_APIARY_DB_NAMES="${HIVE_APIARY_DB_NAMES},${APIARY_S3_LOGS_SCHEMA}"
         fi
 
+        HIVE_APIARY_DB_NAMES="${HIVE_APIARY_DB_NAMES},${APIARY_SYSTEM_SCHEMA:-apiary_system}"
+
         AWS_ACCOUNT=`aws sts get-caller-identity|jq -r .Account`
         for HIVE_DB in `echo ${HIVE_APIARY_DB_NAMES}|tr "," "\n"`
         do
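The `startup.sh` hunk appends the system schema to the comma-separated `HIVE_APIARY_DB_NAMES` list immediately before the per-database loop. The sketch below reproduces that list handling in isolation, assuming bash. The starting database names, the `APIARY_S3_LOGS_SCHEMA` value, and the `DBS` array are hypothetical. The loop body is a placeholder, because the diff cuts off before the real one.

```shell
#!/usr/bin/env bash
# Sketch of the database-list handling in startup.sh (not the actual script).
unset APIARY_SYSTEM_SCHEMA               # exercise the default value
HIVE_APIARY_DB_NAMES="db_a,db_b"         # hypothetical starting list
APIARY_S3_LOGS_SCHEMA="s3_logs"          # hypothetical value

# In startup.sh this append happens only inside a conditional (see the "fi").
HIVE_APIARY_DB_NAMES="${HIVE_APIARY_DB_NAMES},${APIARY_S3_LOGS_SCHEMA}"

# The line added by this commit: always append the system schema.
HIVE_APIARY_DB_NAMES="${HIVE_APIARY_DB_NAMES},${APIARY_SYSTEM_SCHEMA:-apiary_system}"

# Split on commas with tr, as the script does ($(...) used instead of backticks).
DBS=()
for HIVE_DB in $(echo "${HIVE_APIARY_DB_NAMES}" | tr "," "\n"); do
  DBS+=("${HIVE_DB}")                    # placeholder for the real loop body
  echo "database: ${HIVE_DB}"
done
```

With these sample values, the loop visits `db_a`, `db_b`, `s3_logs`, and `apiary_system`, in that order. Because the split relies on unquoted word splitting, this pattern assumes that database names contain no whitespace or glob characters.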