Resolve pipeline on lazy rollover write #116031

parkertimmins · 2024-10-31T17:00:23Z

This fixes the bug described in #112781. If lazy rollover is set on a data stream and a reroute processor always reroutes documents to another index, the data stream never receives a write, so its write index never rolls over. As a result, a pipeline change made in a template never takes effect. To avoid this, when lazy rollover is set, we always resolve the pipeline from the templates.

Fixes: #112781
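As a rough illustration of the decision described above, here is a minimal sketch. `LazyRolloverSketch`, `DataStreamState`, and `resolveDefaultPipeline` are hypothetical stand-ins invented for this example and are not Elasticsearch classes; the real change lives in `IngestService` and works on cluster metadata.

```java
// Hypothetical stand-ins for illustration only; these are not Elasticsearch classes.
final class LazyRolloverSketch {

    /** Simplified view of a data stream: the pipeline stored on its write index and the lazy rollover flag. */
    record DataStreamState(String pipelineInWriteIndex, boolean rolloverOnWrite) {}

    /**
     * Picks the default pipeline for a write to the given data stream.
     * While a lazy rollover is pending, the write index settings may be stale,
     * so the pipeline is taken from the matching template instead.
     */
    static String resolveDefaultPipeline(DataStreamState state, String pipelineInTemplate) {
        if (state.rolloverOnWrite()) {
            return pipelineInTemplate;
        }
        return state.pipelineInWriteIndex();
    }

    public static void main(String[] args) {
        DataStreamState pending = new DataStreamState("old-pipeline", true);
        DataStreamState settled = new DataStreamState("old-pipeline", false);
        System.out.println(resolveDefaultPipeline(pending, "new-pipeline")); // prints "new-pipeline"
        System.out.println(resolveDefaultPipeline(settled, "new-pipeline")); // prints "old-pipeline"
    }
}
```

In this sketch the stale value is only bypassed while the flag is set; once the rollover has happened, the write index carries the updated pipeline and the usual lookup applies.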


server/src/main/java/org/elasticsearch/ingest/IngestService.java

parkertimmins · 2024-11-01T14:48:56Z

server/src/main/java/org/elasticsearch/action/bulk/TransportAbstractBulkAction.java

        for (DocWriteRequest<?> actionRequest : bulkRequest.requests) {
            IndexRequest indexRequest = getIndexWriteRequest(actionRequest);
            if (indexRequest != null) {
-                IngestService.resolvePipelinesAndUpdateIndexRequest(actionRequest, indexRequest, metadata);


I have been having trouble unit testing this bit of caching code because it's inlined in the loop. I'm not super worried, because it is very similar to resolvePipelinesAndUpdateIndexRequest(), which is tested well (but doesn't do the caching).

We previously discussed (and discarded) some ideas for encapsulating this logic so it could be tested, e.g., having resolvePipelinesAndUpdateIndexRequest take a reference to resolvedPipelineCache, but that was decidedly nasty.

The only other idea I have is to encapsulate the resolvedPipelineCache map in a class called CachedPipelineResolver, with a function resolvePipelinesAndUpdateIndexRequest consisting of the lines of logic below.

Thoughts?

This is what I meant:

    public static class CachedPipelineResolver {

        private final Map<String, Pipelines> resolvedPipelineCache;

        public CachedPipelineResolver() {
            this(new HashMap<>());
        }

        // For testing
        CachedPipelineResolver(Map<String, Pipelines> resolvedPipelineCache) {
            this.resolvedPipelineCache = resolvedPipelineCache;
        }

        public void resolvePipelinesAndUpdateIndexRequest(
            final DocWriteRequest<?> originalRequest,
            final IndexRequest indexRequest,
            final Metadata metadata
        ) {
            if (indexRequest.isPipelineResolved() == false) {
                var pipeline = resolvedPipelineCache.computeIfAbsent(
                    indexRequest.index(),
                    // TODO perhaps this should use `threadPool.absoluteTimeInMillis()`, but leaving as is for now.
                    (index) -> IngestService.resolveStoredPipelines(originalRequest, indexRequest, metadata, System.currentTimeMillis())
                );
                IngestService.setPipelineOnRequest(indexRequest, pipeline);
            }
        }
    }

If y'all like this approach, I have an un-pushed commit with this change and more unit tests.
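For illustration, the kind of check such a class would make possible might look like the sketch below. Everything in it is a simplified, hypothetical stand-in: `Request` replaces `IndexRequest`, the resolver function replaces `IngestService.resolveStoredPipelines`, and `countResolutions` is a helper invented for this example. It is not the un-pushed commit mentioned above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-ins; the real IndexRequest and IngestService are not used here.
final class CachedResolverSketch {

    record Pipelines(String defaultPipeline, String finalPipeline) {}

    /** Minimal request: target index plus the pipelines set on it (null until resolved). */
    static final class Request {
        final String index;
        Pipelines pipelines;

        Request(String index) {
            this.index = index;
        }
    }

    /** Resolves pipelines at most once per index for the lifetime of one bulk request. */
    static final class CachedPipelineResolver {
        private final Map<String, Pipelines> cache = new HashMap<>();
        private final Function<String, Pipelines> resolver;

        CachedPipelineResolver(Function<String, Pipelines> resolver) {
            this.resolver = resolver;
        }

        void resolveAndSet(Request request) {
            if (request.pipelines == null) {
                request.pipelines = cache.computeIfAbsent(request.index, resolver);
            }
        }
    }

    /** Runs a three-request bulk against two indices and returns how often the resolver was called. */
    static int countResolutions() {
        int[] calls = {0};
        CachedPipelineResolver resolver = new CachedPipelineResolver(index -> {
            calls[0]++;
            return new Pipelines(index + "-default", "_none");
        });
        for (String index : new String[] {"logs-a", "logs-a", "logs-b"}) {
            resolver.resolveAndSet(new Request(index));
        }
        return calls[0];
    }

    public static void main(String[] args) {
        System.out.println(countResolutions()); // prints 2: one resolution per distinct index
    }
}
```

Injecting the resolver function (or, as in the comment above, the map) is what makes the caching observable from a test without going through the bulk action.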

I don't have strong feelings either way.

I'm in favor of that version of the code (or something not entirely unlike it), and I think it resolves my issues about how intimate these two classes are getting with each other. Given the plan for backporting this PR, I think we should punt on that change until a subsequent PR.

That is, let's merge this PR more or less as-is to all the branches, but then we can follow up with a separate refactoring PR for main and 8.x only next week.

elasticsearchmachine · 2024-11-01T14:51:39Z

Pinging @elastic/es-data-management (Team:Data Management)

elasticsearchmachine · 2024-11-01T14:51:40Z

Hi @parkertimmins, I've created a changelog YAML for you.

server/src/main/java/org/elasticsearch/ingest/IngestService.java

joegallo · 2024-11-01T18:19:56Z

server/src/main/java/org/elasticsearch/action/bulk/TransportAbstractBulkAction.java

+                        (index) -> IngestService.resolveStoredPipelines(actionRequest, indexRequest, metadata, System.currentTimeMillis())
+                    );
+                    IngestService.setPipelineOnRequest(indexRequest, pipeline);
+                }
                hasIndexRequestsWithPipelines |= IngestService.hasPipeline(indexRequest);
            }


I think the above block yearns ~~for the mines~~ wants to be a method of IngestService...

That is, how many static helper methods of IngestService is TransportAbstractBulkAction allowed to invoke before we call shenanigans?

We're punting on this #116031 (comment)

joegallo · 2024-11-01T18:22:18Z

server/src/main/java/org/elasticsearch/ingest/IngestService.java

@@ -1507,7 +1540,7 @@ public static boolean hasPipeline(IndexRequest indexRequest) {
            || NOOP_PIPELINE_NAME.equals(indexRequest.getFinalPipeline()) == false;
    }

-    private record Pipelines(String defaultPipeline, String finalPipeline) {
+    public record Pipelines(String defaultPipeline, String finalPipeline) {


In the spirit of https://github.com/elastic/elasticsearch/pull/116031/files#r1826153821, I'd like this PR a lot more if this record didn't become public.
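One way the record could stay private, sketched under heavy assumptions: the owning class hands out a per-bulk resolver and keeps both the cache and the `Pipelines` type to itself. `PipelineServiceSketch`, `Request`, `newBulkResolver`, and the placeholder `resolve` are hypothetical names for this example, not the shape of the real `IngestService`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical stand-in for the service that owns pipeline resolution; not the real IngestService.
final class PipelineServiceSketch {

    // Remains private: no caller outside this class can name the type.
    private record Pipelines(String defaultPipeline, String finalPipeline) {}

    /** Minimal request holding its target index and the pipeline names set on it. */
    static final class Request {
        final String index;
        String defaultPipeline;
        String finalPipeline;

        Request(String index) {
            this.index = index;
        }
    }

    /** Returns a resolver scoped to one bulk request; the per-index cache lives inside the closure. */
    static Consumer<Request> newBulkResolver() {
        Map<String, Pipelines> cache = new HashMap<>();
        return request -> {
            Pipelines pipelines = cache.computeIfAbsent(request.index, PipelineServiceSketch::resolve);
            request.defaultPipeline = pipelines.defaultPipeline();
            request.finalPipeline = pipelines.finalPipeline();
        };
    }

    // Placeholder resolution logic; real code would consult templates or index metadata.
    private static Pipelines resolve(String index) {
        return new Pipelines(index + "-default", "_none");
    }

    public static void main(String[] args) {
        Consumer<Request> resolver = newBulkResolver();
        Request request = new Request("logs-a");
        resolver.accept(request);
        System.out.println(request.defaultPipeline + " / " + request.finalPipeline); // prints "logs-a-default / _none"
    }
}
```

The caller only ever sees a `Consumer<Request>`, so the bulk action would no longer need to know about the record or call several static helpers in sequence.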

We're punting on this #116031 (comment)


masseyke

LGTM

joegallo

LGTM. I still think we should tighten up the date math expression logic, but there are bigger issues there than just the ones this PR brings up. Given that template application isn't parsing JSON (so to speak), I don't think the performance will be bad enough to invalidate this approach (and your cache avoids doing it per-document-per-bulk, so the worst case isn't sooo bad).

I think this is good enough for a bugfix, but that we should clean up the separation of concerns in a subsequent refactoring PR.


elasticsearchmachine · 2024-11-02T03:56:35Z

💔 Backport failed

| Status | Branch | Result |
|---|---|---|
| ✅ | 8.16 | |
| ❌ | 8.15 | Commit could not be cherrypicked due to conflicts |
| ✅ | 8.x | |

You can use sqren/backport to manually backport by running `backport --upstream elastic/elasticsearch --pr 116031`

If datastream rollover on write flag is set in cluster state, resolve pipelines from templates rather than from metadata.

This fixes the following bug: when a pipeline reroutes every document to another index, and rollover is called with lazy=true (setting the rollover on write flag), changes to the pipeline do not go into effect, because the lack of writes means the data stream never rolls over and pipelines in metadata are not updated. The fix is to resolve pipelines from templates if the lazy rollover flag is set.

To improve efficiency we only resolve pipelines once per index in the bulk request, caching the value, and reusing for other requests to the same index.

Fixes: elastic#112781

parkertimmins · 2024-11-02T14:22:07Z

💚 All backports created successfully

| Status | Branch | Result |
|---|---|---|
| ✅ | 8.15 | |

Questions ?

Please refer to the Backport tool documentation

If datastream rollover on write flag is set in cluster state, resolve pipelines from templates rather than from metadata.

This fixes the following bug: when a pipeline reroutes every document to another index, and rollover is called with lazy=true (setting the rollover on write flag), changes to the pipeline do not go into effect, because the lack of writes means the data stream never rolls over and pipelines in metadata are not updated. The fix is to resolve pipelines from templates if the lazy rollover flag is set.

To improve efficiency we only resolve pipelines once per index in the bulk request, caching the value, and reusing for other requests to the same index.

Fixes: elastic#112781

(cherry picked from commit 6db39d1)

# Conflicts:
#   server/src/main/java/org/elasticsearch/action/bulk/TransportAbstractBulkAction.java
#   server/src/main/java/org/elasticsearch/ingest/IngestService.java

… (#116131)

* Resolve pipelines from template if lazy rollover write (#116031)
* Remute tests blocking merge
* Remute tests blocking merge

#116132)

* Resolve pipelines from template if lazy rollover write (#116031)
* Remute tests block merge
* Remute tests block merge


parkertimmins added 4 commits October 30, 2024 14:35

Initial work on resolving pipeline from template if lazy rollover

70ce886

spotless & only use index from IndexRequest

c79b34c

Merge branch 'main' into resolve-pipeline-on-lazy-rollover-write

8a5aa09

Alternative way to resolve pipeline once per index

a235807

elasticsearchmachine added the v9.0.0 label Oct 31, 2024

fix existing tests, spotless

a96920b

parkertimmins closed this Oct 31, 2024

Separate resolve and set pipelines and inline

eee66fe

parkertimmins reopened this Oct 31, 2024

parkertimmins added 2 commits October 31, 2024 16:40

Add unit test covering lazy rollover

1b4df40

Merge branch 'main' into resolve-pipeline-on-lazy-rollover-write-alte…

d60635f

…rnative

parkertimmins mentioned this pull request Oct 31, 2024

Resolve pipeline on lazy rollover write #115987

Closed

yaml rest test

c7fcdef

parkertimmins changed the title ~~Resolve pipeline on lazy rollover write alternative~~ Resolve pipeline on lazy rollover write Nov 1, 2024

Add unit test for rollover on write

375a1e5

mattc58 requested review from joegallo and masseyke November 1, 2024 14:11

Add another level of reroute to yaml test

02c8d91

parkertimmins commented Nov 1, 2024

server/src/main/java/org/elasticsearch/ingest/IngestService.java (outdated, resolved)

parkertimmins commented Nov 1, 2024


parkertimmins marked this pull request as ready for review November 1, 2024 14:50

elasticsearchmachine added the needs:triage label Nov 1, 2024

parkertimmins added the :Data Management/Data streams and >bug labels and removed the needs:triage label Nov 1, 2024

elasticsearchmachine added the Team:Data Management label Nov 1, 2024

Update docs/changelog/116031.yaml

d1b79e4

cleanup

9de4107

parkertimmins added the auto-backport, v8.16.0, v8.15.4, and v8.17.0 labels Nov 1, 2024

joegallo reviewed Nov 1, 2024

server/src/main/java/org/elasticsearch/ingest/IngestService.java (outdated, resolved)

joegallo reviewed Nov 1, 2024


joegallo added 2 commits November 1, 2024 14:26

Rage against the machine

b7ae76b

Merge branch 'main' into resolve-pipeline-on-lazy-rollover-write-alte…

e1a0655

…rnative

masseyke approved these changes Nov 1, 2024


parkertimmins added 2 commits November 1, 2024 15:28

Add yaml test to check that lazy rollover flag unset

38fbf23

spotless

1c7a31d

joegallo approved these changes Nov 1, 2024


parkertimmins added 2 commits November 1, 2024 16:50

Merge branch 'main' into resolve-pipeline-on-lazy-rollover-write-alte…

490cd0a

…rnative

Merge branch 'main' into resolve-pipeline-on-lazy-rollover-write-alte…

0ec9ccc

…rnative

parkertimmins merged commit 6db39d1 into elastic:main Nov 2, 2024
16 checks passed

parkertimmins deleted the resolve-pipeline-on-lazy-rollover-write-alternative branch November 2, 2024 03:54

This was referenced Nov 2, 2024

[8.16] Resolve pipelines from template if lazy rollover write (#116031) #116131

Merged

[8.x] Resolve pipelines from template if lazy rollover write (#116031) #116132

Merged

elasticsearchmachine added the backport pending label Nov 2, 2024

parkertimmins mentioned this pull request Nov 2, 2024

[8.15] Resolve pipelines from template if lazy rollover write (#116031) #116137

Merged
