es.write.operation documentation is deceptive on default values when used via Spark #2206

robwithhair · 2024-03-19T10:01:01Z

What kind of issue is this?

Bug report. If you’ve found a bug, please provide a code snippet or test to reproduce it below.
The easier it is to track down the bug, the faster it is solved.
Feature Request. Start by telling us what problem you’re trying to solve.
Often a solution already exists! Don’t send pull requests to implement new features without
first getting our support. Sometimes we leave features out on purpose to keep the project small.

Issue description

The documentation states that the default es.write.operation is index, but when writing via Spark with output mode "update", the effective default is actually upsert. This information is only available by reading the code.

The documentation is misleading because it suggests that in Spark update mode the default value of index will be used, when in fact the default is overridden to "upsert". This appears to be the case both in testing and from visually reviewing the code.

Steps to reproduce

Code:

N/A, as this is a documentation fix.

Stack trace:

N/A

jbaiera · 2024-03-19T21:00:15Z

This could be better detailed in the docs for sure.

When using update mode in Spark SQL, the connector changes the operation to "upsert" because 1) it needs that request mode to satisfy the invariants defined by Spark, and 2) it anticipates that you would need that setting anyway, so it sets it for you rather than making you declare in multiple places that you want to update data.
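To make the documented-versus-effective default concrete, here is a minimal, hypothetical Python sketch of the behavior described in this thread. It is not the connector's code: the function name `effective_default_write_operation` is invented for illustration, it only covers the case where es.write.operation is left unset, and it assumes that output modes other than "update" (such as "append") fall back to the documented default of index.

```python
# Hypothetical sketch, NOT elasticsearch-hadoop code. It models the
# default for es.write.operation as described in this issue, for the
# case where the user has not set the option explicitly.

DOCUMENTED_DEFAULT = "index"  # default listed in the configuration docs


def effective_default_write_operation(output_mode: str) -> str:
    """Return the write operation used when es.write.operation is unset.

    output_mode: a Spark output mode name, e.g. "append" or "update".
    Assumption: every mode other than "update" uses the documented default.
    """
    mode = output_mode.strip().lower()
    if mode == "update":
        # Update mode needs upsert semantics, so the documented
        # default is overridden instead of applied.
        return "upsert"
    return DOCUMENTED_DEFAULT


if __name__ == "__main__":
    for mode in ("append", "update"):
        print(mode, "->", effective_default_write_operation(mode))
```

The gap the issue points at is the `"update"` branch: reading only the configuration docs, you would expect index there.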

Fun fact: there are quite a lot of places in Spark that we plug into in order to adjust the connector's behavior based on your API usage. For example, we push down queries to ES (by default we don't filter results on the server, but we generate queries from the query plan when we're able to), and we limit the fields returned from the server (we intercept Spark's field projection when it's available, so we don't pull fields from each document that the operation doesn't need). It's tough to list all of these, because in some cases we merge existing configurations together, in other cases we override them, and sometimes we simply offload part of the concern onto the library code so users don't have to worry about configuration.

jbaiera added the >docs label Mar 19, 2024
