Add functional tests for ILM & Data Stream Lifecycle #14100
Comments
I've been thinking how to implement functional tests. I explored 3 possible approaches:
TL;DR: my conclusion is that we may want to proceed with option 3. Option 1 relies heavily on ESS, creating infrastructure with Terraform on top of a custom-built Bash framework. We do not need ESS for functional testing, and the framework looks built for that purpose. Moreover, Bash's flexibility comes at a great readability cost. Orchestration is also not clear to me, and documentation is lacking. Option 2 uses Go tests, but is strictly focused on testing specific APM Server behaviors. As we now rely on A third option would be to build a framework that looks similar to Option 2, with the guarantees provided by Go, but with a scope closer to Option 1. I created a simple stub in https://github.com/endorama/apm-server/blob/af3ca3744e4746d4c6d7f65a162927d6c9e19331/functionaltests/main_test.go#L31 using
The third option looks the best to me when I think about future use cases (e.g. adding further cases to the logic in #13678 is more difficult in Bash than in Go), and I think a test framework must be ergonomic enough to encourage use. I see potential for convergence in the long run, but that is out of scope and I'm not sure how much weight it should have in the decision. |
The upgrade part is going to be especially tricky IMO, because IIANM Elasticsearch will never allow an upgrade between a released version (e.g. 8.15.3) and an un-released one (SNAPSHOT). |
Can we use BCs in the cloud first region? I'm not sure it is possible to upgrade to those, though. |
I'll recap the discussions from today about how to move forward. I discussed this with @axw, @1pkg and @inge4pres. My current stub uses Leveraging the current smoke tests does not look like the preferred path forward, as they are mostly Bash + CI, and this greatly limits both the expressiveness of tests and their reproducibility. The current proposal would be to implement a new testing framework built on these principles:
This approach would go towards the convergence mentioned in my previous comment, and not using We will also have to consider how to run tests in parallel: some tests may taint the Elasticsearch stack in a way that makes it unsafe to reuse for other tests, and some may not. It is not clear how to address this in our design at the moment, but for efficiency it would be interesting to be able to mark clusters as tainted, so that later tests know whether they can reuse them. Regarding which test cases to run, we have a set already mentioned in #13898 (comment) that we should include. |
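To make the "mark clusters as tainted" idea a bit more concrete, here is a minimal Go sketch. Everything in it (`Cluster`, `Pool`, `Acquire`, `Release`, `ErrNoCleanCluster`) is a hypothetical name made up for illustration, not something that exists in apm-server, and it only tracks state in memory without creating any real deployment.

```go
package main

import (
	"errors"
	"sync"
)

// Cluster is a hypothetical handle to an Elasticsearch stack used by tests.
type Cluster struct {
	Name    string
	inUse   bool // currently held by a test
	tainted bool // a previous test left it unsafe to reuse
}

// Pool hands clusters out to tests and remembers which ones are tainted.
type Pool struct {
	mu       sync.Mutex
	clusters []*Cluster
}

// ErrNoCleanCluster is returned when every cluster is busy or tainted.
var ErrNoCleanCluster = errors.New("no clean cluster available")

// NewPool builds a pool from a list of (assumed, pre-created) cluster names.
func NewPool(names ...string) *Pool {
	p := &Pool{}
	for _, n := range names {
		p.clusters = append(p.clusters, &Cluster{Name: n})
	}
	return p
}

// Acquire returns the first cluster that is neither in use nor tainted.
func (p *Pool) Acquire() (*Cluster, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	for _, c := range p.clusters {
		if !c.inUse && !c.tainted {
			c.inUse = true
			return c, nil
		}
	}
	return nil, ErrNoCleanCluster
}

// Release gives a cluster back. Passing taint=true marks it as unsafe
// for reuse; a tainted cluster is never handed out again by Acquire.
func (p *Pool) Release(c *Cluster, taint bool) {
	p.mu.Lock()
	defer p.mu.Unlock()
	c.inUse = false
	if taint {
		c.tainted = true
	}
}
```

A test that only reads from the stack would call `Release(c, false)`, while an upgrade test would call `Release(c, true)` so that parallel tests skip that cluster. Whether tainted clusters should be destroyed or recreated is left open here.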
As per our latest discussion, I created a stub of a first test on the new framework we discussed. You can see it here: https://github.com/endorama/apm-server/blob/3b4ec398e8715b9b61ede38cb84aa5928d241492/testing/functional/test1/main_test.go The first test I'm aiming for is testing the upgrade path from 8.14.0 to 8.15.1 as defined by:
I also added a README to clarify the overall idea. |
Moving this to the next iteration, @endorama please update this issue with where you are at with the functional tests and the goals for this iteration. |
Adding this as a possible future path to evaluate: leveraging https://pkg.go.dev/github.com/rogpeppe/go-internal/testscript (example in use: https://github.com/kortschak/dex/blob/master/testdata/worklog_load_postgres.txt https://github.com/kortschak/dex/blob/master/main_test.go#L127). This would allow us to "script" our tests, leveraging CLI tools and Go code to augment functionality, with a defined test runner and test structure. We don't have to make a decision now: as our plan is to write Go code, we can integrate it later on. The advantages I see are:
This solution has been suggested by our colleague Dan Kortschak (not citing him to avoid subscribing to the issue). /cc @axw |
I would be happy to see a spike on testscript-based functional tests, I have also advocated for that in the past, but I don't think anyone ever gave it a shot in apm-server. |
I prepared the first test in this draft PR: #14935. This implementation leverages:
Iteration speed is slow on this test, as a full run can easily take 20 to 30 minutes. Overall I did not find major roadblocks. I have to investigate a timeout in applying the upgrade to 8.16.0 through Terraform, as there is no way to change that timeout at the moment. It has happened only once so far, so I've yet to draw a conclusion on how problematic this may be. |
We should implement functional tests that verify that ILM or Data Stream Lifecycle management is used as expected according to user configuration, and depending on whether a cluster is newly created or upgraded. See #13898 (comment)
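As a rough illustration of what such a check could look like in Go, the sketch below decodes a data stream listing and reports which lifecycle mechanism manages each stream. It assumes the response of Elasticsearch's get data stream API exposes a `next_generation_managed_by` field with the values `Index Lifecycle Management` and `Data stream lifecycle`; those strings, and the helper names `LifecycleByDataStream` and `CheckManagedBy`, are assumptions to verify against the Elasticsearch version under test, not part of any existing framework.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// dataStreamsResponse models only the fields this sketch needs.
// The field names are assumed from the get data stream API response.
type dataStreamsResponse struct {
	DataStreams []struct {
		Name                    string `json:"name"`
		NextGenerationManagedBy string `json:"next_generation_managed_by"`
	} `json:"data_streams"`
}

// Assumed values of next_generation_managed_by; check them against the
// Elasticsearch documentation for the version being tested.
const (
	managedByILM = "Index Lifecycle Management"
	managedByDSL = "Data stream lifecycle"
)

// LifecycleByDataStream maps each data stream name to the mechanism that
// will manage its next generation, as reported in the response body.
func LifecycleByDataStream(body []byte) (map[string]string, error) {
	var resp dataStreamsResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, fmt.Errorf("decoding data stream response: %w", err)
	}
	out := make(map[string]string, len(resp.DataStreams))
	for _, ds := range resp.DataStreams {
		out[ds.Name] = ds.NextGenerationManagedBy
	}
	return out, nil
}

// CheckManagedBy returns an error if any data stream is not managed by
// the expected mechanism.
func CheckManagedBy(got map[string]string, want string) error {
	for name, by := range got {
		if by != want {
			return fmt.Errorf("data stream %s managed by %q, want %q", name, by, want)
		}
	}
	return nil
}
```

A functional test would fetch the body from the cluster after setup (or after an upgrade) and call `CheckManagedBy` with whichever mechanism the scenario expects. Which mechanism is expected in which scenario is defined in the linked comment, not here.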