Add changelog for fallback to ILM if DSL not present #13918

Merged
merged 3 commits into from
Sep 3, 2024

Conversation

lahsivjar
Contributor

Motivation/summary

APM-Server switched from Index Lifecycle Management (ILM) to data stream lifecycle (DSL) in v8.15.0. The switch happened when we moved from the APM integration (which used ILM) to the APM-data plugin in ES (which uses DSL) for managing APM data streams. As a result, any data stream created before the switch ends up Unmanaged, because the data stream is never updated with the DSL lifecycle; this has to be done manually using the PUT API.

Checklist

How to test these changes

  1. Create a stack (ES, Kibana, APM-Server) on version 8.14.3 with data persistence enabled for ES. We use 8.14.3 because it is the latest available version that uses the APM integration package and thus configures ILM policies.

    Example docker-compose.yaml
    version: '3.9'
    x-logging: &default-logging
      driver: "json-file"
      options:
        max-size: "1g"
    services:
      elasticsearch:
        image: docker.elastic.co/elasticsearch/elasticsearch:8.14.3
        ports:
          - 9200:9200
        healthcheck:
          test: ["CMD-SHELL", "curl -s http://localhost:9200/_cluster/health?wait_for_status=yellow&timeout=500ms"]
          retries: 300
          interval: 1s
        environment:
          - "ES_JAVA_OPTS=-Xms1g -Xmx1g"
          - "network.host=0.0.0.0"
          - "transport.host=127.0.0.1"
          - "http.host=0.0.0.0"
          - "cluster.routing.allocation.disk.threshold_enabled=false"
          - "discovery.type=single-node"
          - "xpack.security.authc.anonymous.roles=remote_monitoring_collector"
          - "xpack.security.authc.realms.file.file1.order=0"
          - "xpack.security.authc.realms.native.native1.order=1"
          - "xpack.security.enabled=true"
          - "xpack.license.self_generated.type=trial"
          - "xpack.security.authc.token.enabled=true"
          - "xpack.security.authc.api_key.enabled=true"
          - "logger.org.elasticsearch=${ES_LOG_LEVEL:-error}"
          - "action.destructive_requires_name=false"
        volumes:
          - "./testing/docker/elasticsearch/roles.yml:/usr/share/elasticsearch/config/roles.yml"
          - "./testing/docker/elasticsearch/users:/usr/share/elasticsearch/config/users"
          - "./testing/docker/elasticsearch/users_roles:/usr/share/elasticsearch/config/users_roles"
          - "./testing/docker/elasticsearch/ingest-geoip:/usr/share/elasticsearch/config/ingest-geoip"
          - "/Users/lahsivjar/Projects/elastic/tmp/esdata2:/usr/share/elasticsearch/data"
        logging: *default-logging
    
      kibana:
        image: docker.elastic.co/kibana/kibana:8.14.3
        ports:
          - 5601:5601
        healthcheck:
          test: ["CMD-SHELL", "curl -s http://localhost:5601/api/status | grep -q 'All services are available'"]
          retries: 300
          interval: 1s
        environment:
          ELASTICSEARCH_HOSTS: '["http://elasticsearch:9200"]'
          ELASTICSEARCH_USERNAME: "${KIBANA_ES_USER:-kibana_system_user}"
          ELASTICSEARCH_PASSWORD: "${KIBANA_ES_PASS:-changeme}"
          XPACK_FLEET_AGENTS_ELASTICSEARCH_HOSTS: '["http://elasticsearch:9200"]'
        depends_on:
          elasticsearch: { condition: service_healthy }
        volumes:
          - "./testing/docker/kibana/kibana.yml:/usr/share/kibana/config/kibana.yml"
        logging: *default-logging
    
      apm-server:
        image: docker.elastic.co/apm/apm-server:8.14.3
        ports:
          - 8200:8200
        healthcheck:
          test: ["CMD-SHELL", "bash -c 'echo -n > /dev/tcp/127.0.0.1/8200'"]
          retries: 300
          interval: 1s
        depends_on:
          elasticsearch: { condition: service_healthy }
        volumes:
          - "./testing/docker/apm-server/apm-server.yml:/usr/share/apm-server/apm-server.yml"
        logging: *default-logging
    NOTE: The config files used in the example docker-compose are [available here](https://github.com/elastic/apm-server/tree/main/testing/docker). The `apm-server.yml` file used in the docker-compose could be a simple config file:
    apm-server:
      host: "0.0.0.0:8200"
    output.elasticsearch:
      hosts: ["elasticsearch:9200"]
      username: "admin"
      password: "changeme"
    logging.level: info
    logging.to_stderr: true
  2. Install the APM integration in the cluster.

  3. Send some data, for example by using apmsoak. Example command: `go run ./cmd/apmsoak/ run --file cmd/apmsoak/scenarios.yml --scenario apm-server --server-url http://localhost:8200`

  4. Assert that the APM indices created are managed by ILM, for example by running `GET /_data_stream/traces-apm-default` to check the trace indices.

  5. Build an Elasticsearch docker image using the branch in this PR: `./gradlew buildAarch64DockerImage`

  6. Update the versions used in the stack created in step 1 to 8.16.0-SNAPSHOT; for ES, use the docker image built in step 5.

  7. Send some more data as we did in step 3

  8. Assert that all the APM indices are still managed by ILM

  9. Roll over the data stream.

  10. Assert that all the APM indices, including the one created using rollover in step 9, are still managed by ILM
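
The assertions in steps 4, 8 and 10 could be scripted along these lines. This is only a sketch, assuming the JSON returned by `GET /_data_stream/<name>` has a top-level `data_streams` list whose entries carry an `indices` list with a `managed_by` label per backing index, and that the label for ILM is the string `Index Lifecycle Management`; check both against the Elasticsearch version under test. The function names are hypothetical helpers, not part of any Elastic tooling.

```python
from collections import Counter

# Assumed label that Elasticsearch reports for ILM-managed backing indices.
ILM = "Index Lifecycle Management"


def managed_by_counts(response: dict) -> dict:
    """Count backing indices per `managed_by` label for each data stream.

    `response` is the parsed JSON body of GET /_data_stream/<name>.
    Indices without a `managed_by` field are counted as "Unmanaged".
    """
    return {
        ds["name"]: dict(
            Counter(idx.get("managed_by", "Unmanaged") for idx in ds.get("indices", []))
        )
        for ds in response.get("data_streams", [])
    }


def all_managed_by_ilm(response: dict) -> bool:
    """Return True only if at least one data stream is present and every
    backing index in the response is labelled as managed by ILM."""
    streams = response.get("data_streams", [])
    return bool(streams) and all(
        idx.get("managed_by") == ILM
        for ds in streams
        for idx in ds.get("indices", [])
    )
```

Calling `all_managed_by_ilm` on the response before the upgrade, after the upgrade, and after the rollover gives one boolean per assertion; `managed_by_counts` helps spot which data stream has Unmanaged indices when the check fails.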

Also, test that the setup works by itself, i.e. if a cluster is created using the latest version (with the changes in the PR), it works as expected and the created APM indices are managed by DSL (data stream lifecycle).

NOTE: Indices that were created while APM was on version 8.15.0, in data streams created before 8.15.0 (i.e. with ILM), will remain Unmanaged even after this fix. To fix them, we would need to either update them explicitly or use the PUT API on the data stream to set DSL.
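
For the PUT API route mentioned in the note, the request could be prepared roughly as follows. This is a sketch, assuming the data stream lifecycle endpoint `PUT /_data_stream/<name>/_lifecycle` with a `data_retention` field in the body; the helper name, the base URL and the retention value are hypothetical, and authentication is left out. The function only builds the request and does not send it.

```python
import json
import urllib.request


def build_set_lifecycle_request(base_url: str, data_stream: str, retention: str) -> urllib.request.Request:
    """Build (but do not send) a PUT request that sets a data stream lifecycle.

    base_url:    Elasticsearch address, e.g. "http://localhost:9200" (assumed).
    data_stream: name of the data stream to update.
    retention:   retention period such as "10d" (illustrative value only).
    """
    body = json.dumps({"data_retention": retention}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/_data_stream/{data_stream}/_lifecycle",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
```

The returned object can be passed to `urllib.request.urlopen` once credentials are added. Pick a retention that matches the ILM policy the data stream used previously rather than the placeholder shown here.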

Related issues

@lahsivjar lahsivjar requested a review from a team as a code owner August 23, 2024 11:37
@lahsivjar lahsivjar added the backport-8.15 Automated backport with mergify label Aug 23, 2024
changelogs/8.15.asciidoc (outdated, resolved)
@inge4pres inge4pres mentioned this pull request Sep 2, 2024
Contributor

mergify bot commented Sep 2, 2024

This pull request now has conflicts. Could you fix it @lahsivjar? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b ilmdlm-chglog upstream/ilmdlm-chglog
git merge upstream/8.15
git push upstream ilmdlm-chglog

Member

@carsonip carsonip left a comment

Changelog target branch should be main, with backport-8.15 label

@carsonip carsonip changed the base branch from 8.15 to main September 3, 2024 09:24
@carsonip carsonip changed the base branch from main to 8.15 September 3, 2024 09:24
@carsonip
Member

carsonip commented Sep 3, 2024

Simply changing the target branch to main creates a mess. At this point I think it is easier to manually rebase onto main, then change the target branch to main.

@lahsivjar
Contributor Author

Changelog target branch should be main, with backport-8.15 label

It is not in this case because the fix for main has been reverted and it might require a different changelog. Anyway, this PR needs a bit more updates after the recent changes in the fix. Will get this cleaned up.

@carsonip
Member

carsonip commented Sep 3, 2024

It is not in this case because the fix for main has been reverted and it might require a different changelog. Anyway, this PR needs a bit more updates after the recent changes in the fix. Will get this cleaned up.

I believe the 8.15.asciidoc needs to be updated in both main branch and 8.15 branch, hence targeting main and backporting to 8.15. It has nothing to do with a separate bugfix for main.

@lahsivjar lahsivjar changed the base branch from 8.15 to main September 3, 2024 12:56
@lahsivjar lahsivjar requested review from a team and simitt September 3, 2024 12:56
simitt
simitt previously approved these changes Sep 3, 2024
changelogs/8.15.asciidoc (outdated, resolved)
Co-authored-by: Carson Ip <[email protected]>
Contributor

@inge4pres inge4pres left a comment

thanks 🙏🏼

@lahsivjar lahsivjar merged commit de642ac into elastic:main Sep 3, 2024
7 checks passed
mergify bot pushed a commit that referenced this pull request Sep 3, 2024
@lahsivjar lahsivjar deleted the ilmdlm-chglog branch September 3, 2024 13:46
mergify bot added a commit that referenced this pull request Sep 3, 2024
Labels
backport-8.15 Automated backport with mergify