Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[plugin/apm-data]Fallback to ILM for non-dsl-only mode if DSL unavailable #112759

Merged
merged 17 commits into from
Sep 16, 2024

Conversation

lahsivjar
Copy link
Contributor

@lahsivjar lahsivjar commented Sep 11, 2024

An alternative PR to #112615 and #112614. With this PR, If a component or index template is named ending @ilm then the file is NOT loaded for datastream-only mode.

Related to

Closes: #112137
Closes: elastic/apm-server#13898

@lahsivjar lahsivjar added >bug :Data Management/Data streams Data streams and their lifecycles labels Sep 11, 2024
@lahsivjar lahsivjar requested a review from a team as a code owner September 11, 2024 17:51
@lahsivjar lahsivjar requested review from axw and gmarouli September 11, 2024 17:51
@elasticsearchmachine elasticsearchmachine added Team:Data Management Meta label for data/management team v9.0.0 labels Sep 11, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@lahsivjar lahsivjar force-pushed the fallback-to-ilm-gracefully branch from 36d2923 to 21ee4f2 Compare September 11, 2024 17:56
@lahsivjar lahsivjar changed the title Fallback to ilm gracefully [plugin/apm-data]Fallback to ILM for non-dsl-only mode if DSL unavailable Sep 11, 2024
Copy link
Member

@axw axw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I like this a lot more than having override files and spreading things across plugins. Looks great.

Should we add YAML REST tests to verify the ILM fallback works correctly?

@lahsivjar
Copy link
Contributor Author

lahsivjar commented Sep 12, 2024

Should we add YAML REST tests to verify the ILM fallback works correctly?

@axw Added test for checking if ILM files are installed in normal mode. I wanted to also add tests for dsl-only-mode set to true but that didn't work since the setting is not exported I think so the test cluster fails to start.

@axw
Copy link
Member

axw commented Sep 13, 2024

@lahsivjar what's there looks good. I'd really like to see functional tests that confirm the ILM policies are effective, not just that the component templates exist (checking that the backing indices are managed by ILM would be good enough); but this can also be done in a followup.

@lahsivjar
Copy link
Contributor Author

(checking that the backing indices are managed by ILM would be good enough); but this can also be done in a followup

@axw This is a bit more complicated. If there were an older version of the plugin that had ILM it would have been easier, for testing it at the current state we would have to emulate some of the APM integration templates and ILMs... not impossible but would take a bit more effort. I will look into it as a followup.

@gmarouli gmarouli added v8.16.0 auto-backport Automatically create backport pull requests when merged labels Sep 13, 2024
Copy link
Contributor

@gmarouli gmarouli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lahsivjar thanks for working on this! I was thinking what if we perform this filtering in the APMIndexTemplateRegistry? Because this is not yet a well-established solution, but a way to address this in the APM case, I think it would be better to keep it localised in APMIndexTemplateRegistry. Maybe we override componentTemplates in that class. What do you think?

@lahsivjar
Copy link
Contributor Author

I was thinking what if we perform this filtering in the APMIndexTemplateRegistry? Because this is not yet a well-established solution, but a way to address this in the APM case, I think it would be better to keep it localised in APMIndexTemplateRegistry.

The YamlTemplateRegistry seems to be created from the apm-data plugin when the otel-data plugin was introduced (both are owned by the intake-services team) so, at least for now, it is only used by our plugins. However, I think it makes sense to have the interface be more generic.

Maybe we override componentTemplates in that class. What do you think?

How about we allow overriding for shouldLoadTemplate method? The YamlTemplateRegistry can always return true but inheritors of YamlTemplateRegistry can override it to filter out some of the templates conditionally as required.

@lahsivjar
Copy link
Contributor Author

lahsivjar commented Sep 13, 2024

@gmarouli I have pushed a fix for scoping logic within the APM plugin (commit ref). It is based on overriding shouldLoadTemplate method I proposed above, let me know if it still doesn't look good - will replace it with overriding the component template methods

@gmarouli
Copy link
Contributor

gmarouli commented Sep 16, 2024

How about we allow overriding for shouldLoadTemplate method? The YamlTemplateRegistry can always return true but inheritors of YamlTemplateRegistry can override it to filter out some of the templates conditionally as required.

I really like this idea!

Copy link
Contributor

@gmarouli gmarouli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like your proposed solution of providing a filter to YamlTemplateRegstry; however, I find the current way a bit conflicting.

  • It has a generic name shouldLoadTemplate which is based on the name of the template but then it also needs a boolean (the dslOnlyMode) which feels a bit random.
  • If we would extract the dslOnlyMode and apply it inlined in the map, we might be limiting the scope of the shouldLoadTemplate filter.

So, what if we pass the predicate in the constructor:

public YamlTemplateRegistry(
        Settings nodeSettings,
        ClusterService clusterService,
        ThreadPool threadPool,
        Client client,
        NamedXContentRegistry xContentRegistry,
        FeatureService featureService
    ) {
        this(nodeSettings, clusterService, threadPool, client, xContentRegistry, featureService, ignored -> true);
    }
    
......
public YamlTemplateRegistry(
        Settings nodeSettings,
        ClusterService clusterService,
        ThreadPool threadPool,
        Client client,
        NamedXContentRegistry xContentRegistry,
        FeatureService featureService,
        Predicate<String> templateFilter
    ) {
    .....
    componentTemplates = Optional.ofNullable(componentTemplateNames)
                .orElse(Collections.emptyList())
                .stream()
                .map(o -> (String) o)
                .filter(templateFilter)
                .collect(Collectors.toMap(name -> name, name -> loadComponentTemplate(name, version)));
                .......
}

And in ApmIndexTemplateRegistry:

public APMIndexTemplateRegistry(
        Settings nodeSettings,
        ClusterService clusterService,
        ThreadPool threadPool,
        Client client,
        NamedXContentRegistry xContentRegistry,
        FeatureService featureService
    ) {
        super(
            nodeSettings,
            clusterService,
            threadPool,
            client,
            xContentRegistry,
            featureService,
            templateName -> DataStreamLifecycle.isDataStreamsLifecycleOnlyMode(clusterService.getSettings()) == false
                || templateName.endsWith("@ilm") == false
        );
    }

This way the filter is generic enough to be anything we want and the definition of the filter is fully the responsibility of the subclass which allows it to fulfil a variety of use cases.

What do you think?

@lahsivjar lahsivjar requested a review from gmarouli September 16, 2024 09:16
@lahsivjar
Copy link
Contributor Author

What do you think?

@gmarouli Makes a lot of sense to me, I have updated the PR. Should be ready for another review!

Copy link
Contributor

@gmarouli gmarouli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @lahsivjar , some small comments and then it's good to go! Great work!

Copy link
Contributor Author

@lahsivjar lahsivjar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated! Thanks for the detailed review.

@lahsivjar lahsivjar requested a review from gmarouli September 16, 2024 11:37
@gmarouli
Copy link
Contributor

@elasticmachine update branch

Updating with main to see if this will fix the ci errors

Copy link
Contributor

@gmarouli gmarouli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming ci is happy, LGTM :) Thank you for brining this home @lahsivjar , great work!

@lahsivjar lahsivjar merged commit efdd0c3 into elastic:main Sep 16, 2024
16 checks passed
@lahsivjar lahsivjar deleted the fallback-to-ilm-gracefully branch September 16, 2024 13:39
@gmarouli
Copy link
Contributor

💚 All backports created successfully

Status Branch Result
8.x

Questions ?

Please refer to the Backport tool documentation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-backport Automatically create backport pull requests when merged >bug :Data Management/Data streams Data streams and their lifecycles Team:Data Management Meta label for data/management team test-update-serverless v8.16.0 v9.0.0
Projects
None yet
5 participants