[HUDI-8223][DOCS] Docs update for the behavior change of config loading #12074

Merged
9 changes: 5 additions & 4 deletions website/docs/configurations.md
@@ -20,7 +20,7 @@ hoodie.datasource.hive_sync.support_timestamp false
```
It helps to have a central configuration file for your common cross job configurations/tunings, so all the jobs on your cluster can utilize it. It also works with Spark SQL DML/DDL, and helps avoid having to pass configs inside the SQL statements.

- By default, Hudi would load the configuration file under `/etc/hudi/conf` directory. You can specify a different configuration directory location by setting the `HUDI_CONF_DIR` environment variable.
+ Hudi always loads the configuration file under the default directory `file:/etc/hudi/conf`, if it exists, to set the default configs. You can specify a different configuration directory location by setting the `HUDI_CONF_DIR` environment variable.
- [**Spark Datasource Configs**](#SPARK_DATASOURCE): These configs control the Hudi Spark Datasource, providing the ability to define keys/partitioning, pick out the write operation, specify how to merge records, or choose the query type to read.
- [**Flink Sql Configs**](#FLINK_SQL): These configs control the Hudi Flink SQL source/sink connectors, providing the ability to define record keys, pick out the write operation, specify how to merge records, enable/disable asynchronous compaction, or choose the query type to read.
- [**Write Client Configs**](#WRITE_CLIENT): Internally, the Hudi datasource uses an RDD-based HoodieWriteClient API to actually perform writes to storage. These configs provide deep control over lower-level aspects like file sizing, compression, parallelism, compaction, write schema, cleaning etc. Although Hudi provides sane defaults, from time to time these configs may need to be tweaked to optimize for specific workloads.
@@ -38,9 +38,10 @@ In the tables below **(N/A)** means there is no default value set

## Externalized Config File
Instead of directly passing configuration settings to every Hudi job, you can also centrally set them in a configuration
- file `hudi-defaults.conf`. By default, Hudi would load the configuration file under `/etc/hudi/conf` directory. You can
- specify a different configuration directory location by setting the `HUDI_CONF_DIR` environment variable. This can be
- useful for uniformly enforcing repeated configs (like Hive sync or write/index tuning), across your entire data lake.
+ file `hudi-defaults.conf`. Hudi always loads the configuration file under the default directory `file:/etc/hudi/conf`, if it
+ exists, to set the default configs. In addition, you can specify another configuration directory location by setting the
+ `HUDI_CONF_DIR` environment variable. The configs stored in `HUDI_CONF_DIR/hudi-defaults.conf` are loaded afterwards,
+ overriding any configs already set by the config file in the default directory.

## Hudi Table Config {#TABLE_CONFIG}
Basic Hudi Table configuration parameters.
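
The loading order described in the change above is easiest to see end to end. A minimal sketch, assuming a cluster where both config locations exist; the `/opt/myjob/hudi-conf` path and the property values shown are illustrative:

```sh
# Hudi first loads file:/etc/hudi/conf/hudi-defaults.conf, if it exists.
cat /etc/hudi/conf/hudi-defaults.conf
# hoodie.datasource.hive_sync.support_timestamp false

# A directory set via HUDI_CONF_DIR is loaded next; keys in its
# hudi-defaults.conf override those read from the default directory.
export HUDI_CONF_DIR=/opt/myjob/hudi-conf
cat "$HUDI_CONF_DIR"/hudi-defaults.conf
# hoodie.datasource.hive_sync.support_timestamp true

# Hudi jobs launched from this environment resolve
# hoodie.datasource.hive_sync.support_timestamp to true.
```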