Make HMS thread pool size configurable (#75)
* Make HMS thread pool size configurable

* update README with new vars and alphabetize

Co-authored-by: Scott Barnhart <[email protected]>
barnharts4 and Scott Barnhart authored May 11, 2020
1 parent d91d04f commit 52f0347
Showing 4 changed files with 36 additions and 11 deletions.
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -3,6 +3,10 @@ All notable changes to this project will be documented in this file.

 The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/) and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).

+## [1.14.0] - 2020-05-11
+### Added
+- Enable caller to set min and max size of the Hive metastore thread pool. If not set, defaults to 200/1000 (Hive defaults).
+
 ## [1.13.0] - 2020-04-21
 ### Added
 - If S3 access logs are enabled in `apiary-data-lake`, create Hive database `s3_logs_hive` on startup. Raw logs go to bucket `<prefix>-s3-logs` and Hive Parquet data to bucket `<prefix>-s3-logs-hive`. This is pre-work to prepare for S3 access-log Hive tables in a future version of Apiary.
24 changes: 13 additions & 11 deletions README.md
@@ -8,31 +8,33 @@ For more information please refer to the main [Apiary](https://github.com/Expedi
 |----|----|----|
 |APIARY_S3_INVENTORY_PREFIX|No (defaults to `EntireBucketDaily`)|Prefix used by S3 Inventory when creating data in the inventory bucket.|
 |APIARY_S3_INVENTORY_TABLE_FORMAT|No (defaults to `ORC`)|Format of S3 inventory data - `ORC`, `Parquet`, or `CSV`|
-|AWS_REGION|Yes|AWS region to configure various AWS clients.|
 |ATLAS_KAFKA_BOOTSTRAP_SERVERS|No|Atlas hive-bridge kafka bootstrap servers.|
-|ENABLE_METRICS|No|Option to enable sending Hive Metastore metrics to CloudWatch.|
+|AWS_REGION|Yes|AWS region to configure various AWS clients.|
 |ENABLE_GLUESYNC|No|Option to turn on GlueSync Hive Metastore listener.|
+|ENABLE_METRICS|No|Option to enable sending Hive Metastore metrics to CloudWatch.|
 |ENABLE_S3_INVENTORY|No|Option to create Hive tables on top of S3 inventory data if enabled in `apiary-data-lake`. Enabled if value is not null/empty.|
 |ENABLE_S3_LOGS|No|Option to create Hive tables on top of S3 access logs data if enabled in `apiary-data-lake`. Enabled if value is not null/empty.|
 |EXTERNAL_DATABASE|No|Option to enable external database mode, when specified it disables managing Hive Metastore MySQL database schema.|
 |GLUE_PREFIX|No|Prefix added to Glue databases to handle database name collisions when synchronizing multiple Hive Metastores to the Glue catalog.|
 |HADOOP_HEAPSIZE|No|Hive Metastore Java process heapsize.|
-|RANGER_POLICY_MANAGER_URL|No|Ranger admin URL from where policies will be downloaded.|
-|RANGER_SERVICE_NAME|No|Ranger service name used to configure RangerAuth plugin.|
-|RANGER_AUDIT_DB_URL|No|Ranger audit database JDBC URL.|
-|RANGER_AUDIT_SECRET_ARN|No|Ranger audit database secret ARN.|
-|RANGER_AUDIT_SOLR_URL|No|Ranger Solr audit URL.|
-|LDAP_URL|No|Active Directory URL to enable group mapping in metastore.|
-|LDAP_BASE|No|LDAP base DN used to search for user groups.|
-|LDAP_SECRET_ARN|No|LDAP bind DN SecretsManager secret ARN.|
-|LDAP_CA_CERT|No|Base64 encoded Certificate Authority Bundle to validate LDAP SSL connection.|
 |HIVE_METASTORE_ACCESS_MODE|No|Hive Metastore access mode, applicable values are: readwrite, readonly|
 |HIVE_DB_NAMES|No|comma separated list of Hive database names, when specified Hive databases will be created and mapped to corresponding S3 buckets.|
 |HIVE_METASTORE_LOG_LEVEL|No|Hive Metastore service Log4j log level.|
+|HMS_MIN_THREADS|No (defaults to `200`)|Minimum size of the Hive metastore thread pool.|
+|HMS_MAX_THREADS|No (defaults to `1000`)|Maximum size of the Hive metastore thread pool.|
 |INSTANCE_NAME|Yes|Apiary instance name, will be used as prefix on most AWS resources to allow multiple Apiary instance deployments.|
+|LDAP_BASE|No|LDAP base DN used to search for user groups.|
+|LDAP_CA_CERT|No|Base64 encoded Certificate Authority Bundle to validate LDAP SSL connection.|
+|LDAP_SECRET_ARN|No|LDAP bind DN SecretsManager secret ARN.|
+|LDAP_URL|No|Active Directory URL to enable group mapping in metastore.|
 |MYSQL_DB_HOST|Yes|Hive Metastore MySQL database hostname.|
 |MYSQL_DB_NAME|Yes|Hive Metastore MySQL database name.|
 |MYSQL_SECRET_ARN|Yes|Hive Metastore MySQL SecretsManager secret ARN.|
+|RANGER_AUDIT_DB_URL|No|Ranger audit database JDBC URL.|
+|RANGER_AUDIT_SECRET_ARN|No|Ranger audit database secret ARN.|
+|RANGER_AUDIT_SOLR_URL|No|Ranger Solr audit URL.|
+|RANGER_POLICY_MANAGER_URL|No|Ranger admin URL from where policies will be downloaded.|
+|RANGER_SERVICE_NAME|No|Ranger service name used to configure RangerAuth plugin.|
 |SNS_ARN|No|The SNS topic ARN to which metadata updates will be sent.|
 |TABLE_PARAM_FILTER|No|A regular expression for selecting necessary table parameters. If the value isn't set, then no table parameters are selected.|
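
To make the defaulting behaviour of `HMS_MIN_THREADS` and `HMS_MAX_THREADS` concrete, here is a minimal Python sketch of how the effective pool size could be resolved from the environment. The helper name `resolve_thread_pool` and the sanity check on the two values are assumptions added for illustration; the commit itself performs no validation and simply falls back to the values already in `hive-site.xml`.

```python
import os

# Defaults as documented in the README table (stated there to be Hive's defaults).
DEFAULT_MIN_THREADS = 200
DEFAULT_MAX_THREADS = 1000


def resolve_thread_pool(env):
    """Return (min_threads, max_threads) for a mapping of environment variables.

    Hypothetical helper: an unset or empty variable falls back to the default,
    mirroring the non-empty (`-n`) test used in startup.sh.
    """
    raw_min = env.get("HMS_MIN_THREADS", "")
    raw_max = env.get("HMS_MAX_THREADS", "")
    min_threads = int(raw_min) if raw_min else DEFAULT_MIN_THREADS
    max_threads = int(raw_max) if raw_max else DEFAULT_MAX_THREADS
    # Assumed sanity check, not part of the commit: the pool must be non-empty
    # and the maximum must not be smaller than the minimum.
    if min_threads < 1 or max_threads < min_threads:
        raise ValueError(
            f"invalid thread pool bounds: min={min_threads}, max={max_threads}"
        )
    return min_threads, max_threads


if __name__ == "__main__":
    print(resolve_thread_pool(os.environ))
```

Note that overriding only one variable can produce an inconsistent pair (for example a minimum above the default maximum), which is the case the assumed check is meant to catch.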

10 changes: 10 additions & 0 deletions files/hive-site.xml
@@ -48,6 +48,16 @@
     <value>false</value>
   </property>

+  <property>
+    <name>hive.metastore.server.min.threads</name>
+    <value>200</value>
+  </property>
+
+  <property>
+    <name>hive.metastore.server.max.threads</name>
+    <value>1000</value>
+  </property>
+
   <property>
     <name>hive.service.metrics.class</name>
     <value>org.apache.hadoop.hive.common.metrics.metrics2.CodahaleMetrics</value>
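
One way to confirm which values a given `hive-site.xml` carries is to read the two properties back out of the file. The sketch below uses Python's standard `xml.etree.ElementTree`; the `read_property` helper and the inline `SAMPLE` document are illustrative assumptions, not part of this repository.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment containing only the two properties added in this commit.
SAMPLE = """<configuration>
  <property>
    <name>hive.metastore.server.min.threads</name>
    <value>200</value>
  </property>
  <property>
    <name>hive.metastore.server.max.threads</name>
    <value>1000</value>
  </property>
</configuration>"""


def read_property(xml_text, name):
    """Return the <value> text of the named property, or None if it is absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None


if __name__ == "__main__":
    for key in ("hive.metastore.server.min.threads",
                "hive.metastore.server.max.threads"):
        print(key, "=", read_property(SAMPLE, key))
```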
9 changes: 9 additions & 0 deletions files/startup.sh
@@ -7,6 +7,15 @@ set -x
 [[ -z "$MYSQL_DB_USERNAME" ]] && export MYSQL_DB_USERNAME=$(aws secretsmanager get-secret-value --secret-id ${MYSQL_SECRET_ARN}|jq .SecretString -r|jq .username -r)
 [[ -z "$MYSQL_DB_PASSWORD" ]] && export MYSQL_DB_PASSWORD=$(aws secretsmanager get-secret-value --secret-id ${MYSQL_SECRET_ARN}|jq .SecretString -r|jq .password -r)

+
+#config Hive min/max thread pool size. Terraform will set the env var based on size of memory
+if [[ -n ${HMS_MIN_THREADS} ]]; then
+    update_property.py hive.metastore.server.min.threads "${HMS_MIN_THREADS}" /etc/hive/conf/hive-site.xml
+fi
+if [[ -n ${HMS_MAX_THREADS} ]]; then
+    update_property.py hive.metastore.server.max.threads "${HMS_MAX_THREADS}" /etc/hive/conf/hive-site.xml
+fi
+
 #configure LDAP group mapping, required for ranger authorization
 if [[ -n $LDAP_URL ]] ; then
     update_property.py hadoop.security.group.mapping.ldap.bind.user "$(aws secretsmanager get-secret-value --secret-id ${LDAP_SECRET_ARN}|jq .SecretString -r|jq .username -r)" /etc/hadoop/conf/core-site.xml
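
The source of `update_property.py` is not part of this diff, so its behaviour can only be inferred from the calls above: it takes a property name, a value, and a Hadoop-style XML file. The following is a hypothetical stand-in, not the repository's script, sketching one way such an update could work on the XML text: overwrite the `<value>` of a matching property, or append a new `<property>` if none exists.

```python
import xml.etree.ElementTree as ET


def update_property(xml_text, name, value):
    """Return xml_text with property `name` set to `value`.

    Hypothetical sketch: the real update_property.py may behave differently.
    """
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # No matching property was found, so append a new one.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    original = (
        "<configuration><property>"
        "<name>hive.metastore.server.max.threads</name><value>1000</value>"
        "</property></configuration>"
    )
    print(update_property(original, "hive.metastore.server.max.threads", "500"))
```

A caveat on this approach: `ElementTree`'s default parser discards XML comments, so a script that must preserve comments in `hive-site.xml` would need a different parser configuration or library.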
