From 6c99a578e6c401f20436fed49b4b33aa8a8acfac Mon Sep 17 00:00:00 2001
From: akravchuk1
Date: Tue, 16 Jun 2020 11:46:20 -0700
Subject: [PATCH] Pre-work for S3 ranger logs hive table feature (#77)

* creating hive table for Ranger logs
* removing ranger log repair functionality
* remove references
* fixing db name
* changelog and readme
* correct date
* setting release date to TBD
* creating system db
* updating changelog
* updating readme and changelog
* small change to changelog
* removing test file that was added by mistake, change in changelog.md
* fixing problem in startup.sh
* adding line removed by mistake
* passing system db name as var
* PR suggestion
* update changelog
* modifying readme
* Update CHANGELOG.md

Co-authored-by: Scott Barnhart

* Update README.md

Co-authored-by: Scott Barnhart

* current date

Co-authored-by: Scott Barnhart
---
 CHANGELOG.md     | 5 +++++
 README.md        | 1 +
 files/startup.sh | 2 ++
 3 files changed, 8 insertions(+)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index e1fc022..4d19b14 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -3,6 +3,11 @@ All notable changes to this project will be documented in this file.
 
 The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/) and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).
 
+## [1.15.0] - 2020-06-16
+### Added
+Create Hive database `apiary_system` on startup. Data for Ranger access logs goes to bucket `-apiary-system` in Parquet format.
+This is pre-work to prepare for Ranger access-log Hive tables in a future version of Apiary.
+
 ## [1.14.0] - 2020-05-11
 ### Added
 - Enable caller to set min and max size of the Hive metastore thread pool. If not set, defaults to 200/1000 (Hive defaults).
diff --git a/README.md b/README.md
index 43b7416..22d544d 100644
--- a/README.md
+++ b/README.md
@@ -8,6 +8,7 @@ For more information please refer to the main [Apiary](https://github.com/Expedi
 |----|----|----|
 |APIARY_S3_INVENTORY_PREFIX|No (defaults to `EntireBucketDaily`)|Prefix used by S3 Inventory when creating data in the inventory bucket.|
 |APIARY_S3_INVENTORY_TABLE_FORMAT|No (defaults to `ORC`)|Format of S3 inventory data - `ORC`, `Parquet`, or `CSV`|
+|APIARY_SYSTEM_SCHEMA|No (defaults to `apiary_system`)|Name for internal system database.|
 |ATLAS_KAFKA_BOOTSTRAP_SERVERS|No|Atlas hive-bridge kafka bootstrap servers.|
 |AWS_REGION|Yes|AWS region to configure various AWS clients.|
 |ENABLE_GLUESYNC|No|Option to turn on GlueSync Hive Metastore listener.|
diff --git a/files/startup.sh b/files/startup.sh
index c5a580e..fd9c215 100755
--- a/files/startup.sh
+++ b/files/startup.sh
@@ -118,6 +118,8 @@ if [ -z $EXTERNAL_DATABASE ] && [ "$HIVE_METASTORE_ACCESS_MODE" = "readwrite" ];
         HIVE_APIARY_DB_NAMES="${HIVE_APIARY_DB_NAMES},${APIARY_S3_LOGS_SCHEMA}"
     fi
 
+    HIVE_APIARY_DB_NAMES="${HIVE_APIARY_DB_NAMES},${APIARY_SYSTEM_SCHEMA:-apiary_system}"
+
     AWS_ACCOUNT=`aws sts get-caller-identity|jq -r .Account`
     for HIVE_DB in `echo ${HIVE_APIARY_DB_NAMES}|tr "," "\n"`
     do
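
Note on the `files/startup.sh` hunk: the added line relies on the shell's `${VAR:-default}` expansion, so the system database name falls back to `apiary_system` when `APIARY_SYSTEM_SCHEMA` is unset or empty. The sketch below is only an illustration of that behaviour, not code from the patch. The `build_db_names` helper and the `apiary_db1,apiary_db2` starting list are hypothetical names made up for the example; only the expansion and the `tr`-based split are taken from the diff.

```shell
#!/bin/sh
# Illustrative sketch only. build_db_names and the sample database list
# are hypothetical; the expansion itself matches the line added in the patch.
build_db_names() {
    # ${VAR:-default} yields the default when VAR is unset or empty.
    echo "${1},${APIARY_SYSTEM_SCHEMA:-apiary_system}"
}

# Case 1: variable not set, so the default name is appended.
unset APIARY_SYSTEM_SCHEMA
DEFAULT_NAMES=$(build_db_names "apiary_db1,apiary_db2")

# Case 2: variable set by the caller, so its value is appended instead.
APIARY_SYSTEM_SCHEMA=custom_system
CUSTOM_NAMES=$(build_db_names "apiary_db1,apiary_db2")

echo "default: ${DEFAULT_NAMES}"
echo "custom:  ${CUSTOM_NAMES}"

# Split the comma-separated list the same way the loop in startup.sh does.
for HIVE_DB in $(echo "${CUSTOM_NAMES}" | tr "," "\n"); do
    echo "database in list: ${HIVE_DB}"
done
```

Because the system schema is appended to `HIVE_APIARY_DB_NAMES` before the loop, it is handled by the same per-database logic as every other entry in the list.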