Skip to content

Commit

Permalink
Update Readme.md file
Browse files Browse the repository at this point in the history
  • Loading branch information
Dhrubajyoti Sadhu committed Feb 6, 2024
1 parent de0f21e commit 3a9baf9
Showing 1 changed file with 43 additions and 43 deletions.
86 changes: 43 additions & 43 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,50 +4,50 @@
For more information please refer to the main [Apiary](https://github.com/ExpediaGroup/apiary) project page.

## Environment Variables
|Environment Variable|Required|Description|
|----|----|----|
|APIARY_S3_INVENTORY_PREFIX|No (defaults to `EntireBucketDaily`)|Prefix used by S3 Inventory when creating data in the inventory bucket.|
|APIARY_S3_INVENTORY_TABLE_FORMAT|No (defaults to `ORC`)|Format of S3 inventory data - `ORC`, `Parquet`, or `CSV`|
|APIARY_SYSTEM_SCHEMA|No (defaults to `apiary_system`)|Name for internal system database.|
|AWS_REGION|Yes|AWS region to configure various AWS clients.|
|AWS_WEB_IDENTITY_TOKEN_FILE|No|Path of the AWS Web Identity Token File for IRSA/OIDC AWS authentication.|
|DISALLOW_INCOMPATIBLE_COL_TYPE_CHANGES|No|`true`/`false` value for hive.metastore.disallow.incompatible.col.type.changes, default `true`.|
|ENABLE_GLUESYNC|No|Option to turn on GlueSync Hive Metastore listener.|
|ENABLE_HIVE_LOCK_HOUSE_KEEPER|No|Option to turn on Hive Metastore Hive Lock House Keeper.|
|ENABLE_METRICS|No|Option to enable sending Hive Metastore metrics to CloudWatch.|
|ENABLE_S3_INVENTORY|No|Option to create Hive tables on top of S3 inventory data if enabled in `apiary-data-lake`. Enabled if value is not null/empty.|
|ENABLE_S3_LOGS|No|Option to create Hive tables on top of S3 access logs data if enabled in `apiary-data-lake`. Enabled if value is not null/empty.|
|EXTERNAL_DATABASE|No|Option to enable external database mode, when specified it disables managing Hive Metastore MySQL database schema.|
|GLUE_PREFIX|No|Prefix added to Glue databases to handle database name collisions when synchronizing multiple Hive Metastores to the Glue catalog.|
|HADOOP_HEAPSIZE|No|Hive Metastore Java process heapsize.|
|HMS_AUTOGATHER_STATS|No (default is `true`)|Whether or not to create basic statistics on table/partition creation. Valid values are `true` or `false`.|
|LIMIT_PARTITION_REQUEST_NUMBER|No (default is `-1`)|To protect the cluster, this controls how many partitions can be scanned for each partitioned table. The default value "-1" means no limit. The limit on partitions does not affect metadata-only queries.|
|HIVE_METASTORE_ACCESS_MODE|No|Hive Metastore access mode, applicable values are: readwrite, readonly|
|HIVE_DB_NAMES|No|comma separated list of Hive database names, when specified Hive databases will be created and mapped to corresponding S3 buckets.|
|HIVE_METASTORE_LOG_LEVEL|No|Hive Metastore service Log4j log level.|
|HMS_MIN_THREADS|No (defaults to `200`)|Minimum size of the Hive metastore thread pool.|
|HMS_MAX_THREADS|No (defaults to `1000`)|Maximum size of the Hive metastore thread pool.|
|INSTANCE_NAME|Yes|Apiary instance name, will be used as prefix on most AWS resources to allow multiple Apiary instance deployments.|
|KAFKA_BOOTSTRAP_SERVERS|No|Kafka Bootstrap Servers to enable Kafka Metastore listener and send Metastore events to Kafka.|
|KAFKA_CLIENT_ID|No|Kafka label you define that names the Kafka producer.|
|KAFKA_COMPRESSION_TYPE|No (defaults to `1048576`)|The maximum size of a request in bytes. This setting will limit the number of record batches the producer will send in a single request to avoid sending huge requests. This is also effectively a cap on the maximum uncompressed record batch size.|
|KAFKA_MAX_REQUEST_SIZE|No|Kafka Compression type, if none is specified there is no compression enabled. Values available are gzip, lz4 and snappy.|
|LDAP_BASE|No|LDAP base DN used to search for user groups.|
|Environment Variable|Required| Description |
|----|----|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|APIARY_S3_INVENTORY_PREFIX|No (defaults to `EntireBucketDaily`)| Prefix used by S3 Inventory when creating data in the inventory bucket. |
|APIARY_S3_INVENTORY_TABLE_FORMAT|No (defaults to `ORC`)| Format of S3 inventory data - `ORC`, `Parquet`, or `CSV` |
|APIARY_SYSTEM_SCHEMA|No (defaults to `apiary_system`)| Name for internal system database. |
|AWS_REGION|Yes| AWS region to configure various AWS clients. |
|AWS_WEB_IDENTITY_TOKEN_FILE|No| Path of the AWS Web Identity Token File for IRSA/OIDC AWS authentication. |
|DISALLOW_INCOMPATIBLE_COL_TYPE_CHANGES|No| `true`/`false` value for hive.metastore.disallow.incompatible.col.type.changes, default `true`. |
|ENABLE_GLUESYNC|No| Option to turn on GlueSync Hive Metastore listener. |
|ENABLE_HIVE_LOCK_HOUSE_KEEPER|No| Option to turn on Hive Metastore Hive Lock House Keeper. |
|ENABLE_METRICS|No| Option to enable sending Hive Metastore and JMX metrics to Prometheus. |
|ENABLE_S3_INVENTORY|No| Option to create Hive tables on top of S3 inventory data if enabled in `apiary-data-lake`. Enabled if value is not null/empty. |
|ENABLE_S3_LOGS|No| Option to create Hive tables on top of S3 access logs data if enabled in `apiary-data-lake`. Enabled if value is not null/empty. |
|EXTERNAL_DATABASE|No| Option to enable external database mode, when specified it disables managing Hive Metastore MySQL database schema. |
|GLUE_PREFIX|No| Prefix added to Glue databases to handle database name collisions when synchronizing multiple Hive Metastores to the Glue catalog. |
|HADOOP_HEAPSIZE|No| Hive Metastore Java process heapsize. |
|HMS_AUTOGATHER_STATS|No (default is `true`)| Whether or not to create basic statistics on table/partition creation. Valid values are `true` or `false`. |
|LIMIT_PARTITION_REQUEST_NUMBER|No (default is `-1`)| To protect the cluster, this controls how many partitions can be scanned for each partitioned table. The default value "-1" means no limit. The limit on partitions does not affect metadata-only queries. |
|HIVE_METASTORE_ACCESS_MODE|No| Hive Metastore access mode, applicable values are: readwrite, readonly |
|HIVE_DB_NAMES|No| comma separated list of Hive database names, when specified Hive databases will be created and mapped to corresponding S3 buckets. |
|HIVE_METASTORE_LOG_LEVEL|No| Hive Metastore service Log4j log level. |
|HMS_MIN_THREADS|No (defaults to `200`)| Minimum size of the Hive metastore thread pool. |
|HMS_MAX_THREADS|No (defaults to `1000`)| Maximum size of the Hive metastore thread pool. |
|INSTANCE_NAME|Yes| Apiary instance name, will be used as prefix on most AWS resources to allow multiple Apiary instance deployments. |
|KAFKA_BOOTSTRAP_SERVERS|No| Kafka Bootstrap Servers to enable Kafka Metastore listener and send Metastore events to Kafka. |
|KAFKA_CLIENT_ID|No| Kafka label you define that names the Kafka producer. |
|KAFKA_COMPRESSION_TYPE|No (defaults to `1048576`)| The maximum size of a request in bytes. This setting will limit the number of record batches the producer will send in a single request to avoid sending huge requests. This is also effectively a cap on the maximum uncompressed record batch size. |
|KAFKA_MAX_REQUEST_SIZE|No| Kafka Compression type, if none is specified there is no compression enabled. Values available are gzip, lz4 and snappy. |
|LDAP_BASE|No| LDAP base DN used to search for user groups. |
|LDAP_CA_CERT|Base64 encoded Certificate Authority Bundle to validate LDAP SSL connection.|
|LDAP_SECRET_ARN|No|LDAP bind DN SecretsManager secret ARN.|
|LDAP_URL|No|Active Directory URL to enable group mapping in metastore.|
|MYSQL_CONNECTION_POOL_SIZE|No (defaults to `10`)|MySQL Connection pool size for Hive Metastore. See [here](https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L1181) for more info.|
|MYSQL_DB_HOST|Yes|Hive Metastore MySQL database hostname.|
|MYSQL_DB_NAME|Yes|Hive Metastore MySQL database name.|
|MYSQL_SECRET_ARN|Yes|Hive Metastore MySQL SecretsManager secret ARN.|
|MYSQL_SECRET_USERNAME_KEY|No (defaults to `username`)|Hive Metastore MySQL SecretsManager secret username key.|
|RANGER_AUDIT_DB_URL|No|Ranger audit database JDBC URL.|
|RANGER_AUDIT_SECRET_ARN|No|Ranger audit database secret ARN.|
|RANGER_AUDIT_SOLR_URL|No|Ranger Solr audit URL.|
|RANGER_POLICY_MANAGER_URL|No|Ranger admin URL from where policies will be downloaded.|
|RANGER_SERVICE_NAME|No|Ranger service name used to configure RangerAuth plugin.|
|SNS_ARN|No|The SNS topic ARN to which metadata updates will be sent.|
|TABLE_PARAM_FILTER|No|A regular expression for selecting necessary table parameters. If the value isn't set, then no table parameters are selected.|
|LDAP_SECRET_ARN|No| LDAP bind DN SecretsManager secret ARN. |
|LDAP_URL|No| Active Directory URL to enable group mapping in metastore. |
|MYSQL_CONNECTION_POOL_SIZE|No (defaults to `10`)| MySQL Connection pool size for Hive Metastore. See [here](https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L1181) for more info. |
|MYSQL_DB_HOST|Yes| Hive Metastore MySQL database hostname. |
|MYSQL_DB_NAME|Yes| Hive Metastore MySQL database name. |
|MYSQL_SECRET_ARN|Yes| Hive Metastore MySQL SecretsManager secret ARN. |
|MYSQL_SECRET_USERNAME_KEY|No (defaults to `username`)| Hive Metastore MySQL SecretsManager secret username key. |
|RANGER_AUDIT_DB_URL|No| Ranger audit database JDBC URL. |
|RANGER_AUDIT_SECRET_ARN|No| Ranger audit database secret ARN. |
|RANGER_AUDIT_SOLR_URL|No| Ranger Solr audit URL. |
|RANGER_POLICY_MANAGER_URL|No| Ranger admin URL from where policies will be downloaded. |
|RANGER_SERVICE_NAME|No| Ranger service name used to configure RangerAuth plugin. |
|SNS_ARN|No| The SNS topic ARN to which metadata updates will be sent. |
|TABLE_PARAM_FILTER|No| A regular expression for selecting necessary table parameters. If the value isn't set, then no table parameters are selected. |

# Contact

Expand Down

0 comments on commit 3a9baf9

Please sign in to comment.