[plugin/apm-data] Set fallback to legacy ILM policies #112028
[For reviewers] I was planning to add integration tests to validate the ILM<>DSL behavior, but since the APM-data plugin has always used DSL, such tests are not possible. Previously, the ILM policies were installed by the APM integration, and I don't think it would be a good idea to hack the integration installation into the tests. Ideas and suggestions are welcome. Another point for discussion: I have not added ILM policies in this PR. My reasoning is that the policies will already be present in a cluster that was upgraded from an older version, and we can use those. If the policies are not present, the integration was never installed and using DSL is fine anyway. Let me know if this reasoning is incorrect or incomplete.
Pinging @elastic/es-data-management (Team:Data Management)
Is it possible in the test to manually create an ILM policy and an index that uses it? It duplicates some of the functionality provided by the integration, but it is not something we would need to change or update.
What I wrote here was the result of flawed testing. I did not have persistence enabled, so starting the version of ES from this PR produced new data streams and appeared to "fix" the issue. With persistence enabled, I reproduced what we were expecting. Thanks Vishal for the guidance. I moved the previous content here to avoid confusion:
Details
I completed the test as described. The only difference I found is in step 8, "Assert that all the APM indices are still managed by ILM".
In step 4, "Assert that the APM indices created are managed by ILM":
GET /_data_stream/traces-apm-default
{
"index_name": ".ds-traces-apm-default-2024.08.21-000001",
"index_uuid": "YrMcukDWTymjeiA7qpB_sw",
"prefer_ilm": true,
"ilm_policy": "traces-apm.traces-default_policy",
"managed_by": "Index Lifecycle Management"
}
After starting ES built from this PR, the indices were already managed by DLM. This is the result for some of them (they all show the same):
GET /_data_stream/traces-apm-default
{
"index_name": ".ds-traces-apm-default-2024.08.21-000001",
"index_uuid": "blw7b56sShKFI2ACmCrfdQ",
"prefer_ilm": false,
"ilm_policy": "traces-apm.traces-default_policy",
"managed_by": "Data stream lifecycle"
}
GET /_data_stream/metrics-apm.internal-default
{
"index_name": ".ds-metrics-apm.internal-default-2024.08.21-000001",
"index_uuid": "6hZE1gH4S6qlwGFgT3itdQ",
"prefer_ilm": false,
"ilm_policy": "metrics-apm.internal_metrics-default_policy",
"managed_by": "Data stream lifecycle"
}
GET /_data_stream/logs-apm.error-default
{
"index_name": ".ds-logs-apm.error-default-2024.08.21-000001",
"index_uuid": "EzIN-JmfSsGlnsAa8tpA7w",
"prefer_ilm": false,
"ilm_policy": "logs-apm.error_logs-default_policy",
"managed_by": "Data stream lifecycle"
}
After rollover we see 2 indices, both managed by DLM:
POST /logs-apm.error-default/_rollover/
GET /_data_stream/logs-apm.error-default
{
"index_name": ".ds-logs-apm.error-default-2024.08.21-000001",
"index_uuid": "EzIN-JmfSsGlnsAa8tpA7w",
"prefer_ilm": false,
"ilm_policy": "logs-apm.error_logs-default_policy",
"managed_by": "Data stream lifecycle"
},
{
"index_name": ".ds-logs-apm.error-default-2024.08.21-000002",
"index_uuid": "NqGsBZbbTNKghyYQZh8Cpw",
"prefer_ilm": false,
"ilm_policy": "logs-apm.error_logs-default_policy",
"managed_by": "Data stream lifecycle"
}
The same happens with multiple rollovers:
POST /traces-apm-default/_rollover/
POST /traces-apm-default/_rollover/
{
"index_name": ".ds-traces-apm-default-2024.08.21-000001",
"index_uuid": "blw7b56sShKFI2ACmCrfdQ",
"prefer_ilm": false,
"ilm_policy": "traces-apm.traces-default_policy",
"managed_by": "Data stream lifecycle"
},
{
"index_name": ".ds-traces-apm-default-2024.08.21-000002",
"index_uuid": "eHUB_aNtSvK1DCbBYUZNHg",
"prefer_ilm": false,
"ilm_policy": "traces-apm.traces-default_policy",
"managed_by": "Data stream lifecycle"
},
{
"index_name": ".ds-traces-apm-default-2024.08.21-000003",
"index_uuid": "2cRTD22pTHKeAau1DqR_hg",
"prefer_ilm": false,
"ilm_policy": "traces-apm.traces-default_policy",
"managed_by": "Data stream lifecycle"
}
Hmm, this is unexpected. If a data stream was created in a prior version, it should not be managed by data stream lifecycle. IIUC, this is what is causing the issue in the first place. I wonder if persistence didn't work as expected, causing your new setup to use DSL from the get-go?
Updated my previous comment; further testing revealed a persistence issue in my setup, as Vishal suggested. TIL that Dev Console content is not persisted in Elasticsearch but in localStorage/cookies, so it is not a reliable indicator that persistence is working.
Tests covered:
- from 8.14.3 to 8.16.0-SNAPSHOT (with this PR applied) with data streams created in 8.14.3. Some data streams were manually rolled over.
- from 8.14.3 to 8.15.0 and then to 8.16.0-SNAPSHOT (with this PR applied) with data streams created in 8.14.3. Some data streams were manually rolled over and some did it automatically upon reaching 8.16.0-SNAPSHOT.
Data streams created in 8.15.0, as mentioned, did not receive any update to their ILM policy and required a manual API call (as documented). Once DSL was applied, previously Unmanaged data streams were updated to use DSL.
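The manual API call mentioned above is not spelled out in this comment. As a sketch, assuming it refers to the data stream lifecycle endpoint `PUT /_data_stream/<name>/_lifecycle`, the request could be put together like this in Python. The helper name `build_lifecycle_request` and the `30d` retention are placeholders, not values taken from this PR, and the snippet only constructs the request without sending it:

```python
import json
from typing import Tuple


def build_lifecycle_request(data_stream: str, data_retention: str) -> Tuple[str, str, str]:
    """Build (method, path, body) for setting a data stream lifecycle.

    Hypothetical helper: it only assembles the request, nothing is sent.
    """
    body = {"data_retention": data_retention}
    path = f"/_data_stream/{data_stream}/_lifecycle"
    return "PUT", path, json.dumps(body)


# "30d" is an illustrative retention value, not one recommended by this PR.
method, path, body = build_lifecycle_request("traces-apm-default", "30d")
print(method, path)
print(body)
```

The printed method, path and body can be pasted into the Dev Console or passed to any HTTP client pointed at the cluster.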
This reverts commit fd37ef8.
…ic#112112) * Revert "[plugin/apm-data] Set fallback to legacy ILM policies (elastic#112028)" This reverts commit fd37ef8.
Fixes fallback to legacy ILM policies when a data stream is updated. Without this PR, the indices would be left unmanaged, without any lifecycle, after the update. After this PR:
- Data streams created in newer versions (on or after v8.15.0) would be managed by data stream lifecycle.
- Data streams created in older versions (before v8.15.0) and migrated to a version on or after v8.15.0 would be managed by ILM policies until they are explicitly migrated to use DLM.
The PR doesn't add the lifecycle policies because, when/if they are required, they should already be available via the previous APM integration.
Testing locally
1. Create a stack (ES, Kibana, APM-Server) with data persistence enabled for ES, using version 8.14.3. We use 8.14.3 because it is the latest available version that uses the APM integration package and thus configures ILM policies.
2. Install the APM integration in the cluster.
3. Send some data, for example by using apmsoak. Example command:
   go run ./cmd/apmsoak/ run --file cmd/apmsoak/scenarios.yml --scenario apm-server --server-url http://localhost:8200
4. Assert that the APM indices created are managed by ILM, for example by running the following to check the trace indices:
   GET /_data_stream/traces-apm-default
5. Build an Elasticsearch docker image using the branch in this PR:
   ./gradlew buildAarch64DockerImage
6. Update the versions used in the stack created in step 1 to 8.16.0-SNAPSHOT; for ES, use the docker image built in step 5.
7. Send some more data as we did in step 3.
8. Assert that all the APM indices are still managed by ILM.
9. Roll over the data stream.
10. Assert that all the APM indices, including the one created by the rollover in step 9, are still managed by ILM.
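The assertions in steps 4, 8 and 10 can be scripted rather than read off the console. Below is a minimal sketch in Python, assuming the `GET /_data_stream/<name>` response nests per-index entries (with `index_name` and `managed_by` fields, as quoted in the conversation) under `data_streams[].indices`. The helper name `indices_not_managed_by` and the sample payload are hypothetical and only illustrate the check:

```python
import json
from typing import List

ILM = "Index Lifecycle Management"
DSL = "Data stream lifecycle"


def indices_not_managed_by(response: dict, expected: str) -> List[str]:
    """Return the backing indices whose managed_by differs from expected."""
    mismatched = []
    for stream in response.get("data_streams", []):
        for index in stream.get("indices", []):
            if index.get("managed_by") != expected:
                mismatched.append(index["index_name"])
    return mismatched


# Sample payload in the assumed response shape; the second index is
# deliberately marked Unmanaged to show what a failing check looks like.
sample = json.loads("""
{
  "data_streams": [
    {
      "name": "traces-apm-default",
      "indices": [
        {
          "index_name": ".ds-traces-apm-default-2024.08.21-000001",
          "prefer_ilm": true,
          "ilm_policy": "traces-apm.traces-default_policy",
          "managed_by": "Index Lifecycle Management"
        },
        {
          "index_name": ".ds-traces-apm-default-2024.08.21-000002",
          "prefer_ilm": true,
          "ilm_policy": "traces-apm.traces-default_policy",
          "managed_by": "Unmanaged"
        }
      ]
    }
  ]
}
""")

# An empty list means every backing index is managed as expected.
print(indices_not_managed_by(sample, ILM))
```

Passing `DSL` instead of `ILM` gives the equivalent check for a cluster created directly on the latest version.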
Also test that the setup works on its own, i.e. a cluster created using the latest version (with the changes in this PR) works as expected, and the APM indices it creates are managed by DSL (data stream lifecycle).
NOTE: Any indices created while APM is on version 8.15.0, for data streams created before 8.15.0 (i.e. with ILM), will remain Unmanaged even after this fix. To fix them, we would need to explicitly update them OR use the PUT API on the data stream to set DSL.
Fixes: elastic/apm-server#13898