Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

smoketest: fix healthcheck to wait for the resource to be ready #13359

Merged
merged 3 commits into from
Jun 11, 2024

Conversation

kyungeunni
Copy link
Contributor

Motivation/summary

Healthcheck for smoke test parses API responses body to confirm that the APM Server is up and running before sending events. Currently, the condition is always true even when the .publish_ready isn't set to true, which makes the health check to be successful even the resource is not ready.

example of the case:

-> Waiting for 70 seconds for the APM Server to become available...
{"ok":false,"message":"The requested resource is currently unavailable."}
-> APM Server ready!
-> Sending events to APM Server...
{"ok":false,"message":"The requested resource is currently unavailable."}

The response didn't contain .publish_ready but the healthcheck passed.

This PR fixes the check to wait for the readiness correctly.

Checklist

For functional changes, consider:

  • Is it observable through the addition of either logging or metrics?
  • Is its use being published in telemetry to enable product improvement?
  • Have system tests been added to avoid regression?

How to test these changes

The successful response look like this:

{
  "build_date": "2024-06-06T00:00:00Z",
  "build_sha": "redacted",
  "publish_ready": true,
  "version": "8.13.0"
}

run this snippet script(copied from the lib.sh) locally with different RES:

healthcheck() {
    local PUBLISH_READY=$(echo ${RES}| jq '.publish_ready')
    if [[ ${PUBLISH_READY} != true ]]; then
        local MAX_RETRIES=10
        if [[ ${1} -gt 0 ]] && [[ ${1} -lt ${MAX_RETRIES} ]]; then
            echo "-> APM Server isn't ready to receive events, retrying (${1}/${MAX_RETRIES})..."
            # sleep $((1 * ${1}))
            # healthcheck $((1 + ${1}))
            return
        else
            echo "-> APM Server isn't ready to receive events, maximum retries exceeded"
            exit 1
        fi
    else
        echo "-> APM Server ready!"
    fi
}

healthcheck

It should print followings
❯ RES="{ }" ./test-healthcheck.sh
-> APM Server isn't ready to receive events, maximum retries exceeded

❯ RES="{"publish_ready": true }" ./test-healthcheck.sh
-> APM Server ready!

❯ RES="{"publish_ready": false }" ./test-healthcheck.sh
-> APM Server isn't ready to receive events, maximum retries exceeded

Related issues

@kyungeunni kyungeunni requested a review from a team as a code owner June 7, 2024 11:29
Copy link
Contributor

mergify bot commented Jun 7, 2024

This pull request does not have a backport label. Could you fix it @kyungeunni? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-7.17 is the label to automatically backport to the 7.17 branch.
  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit.

NOTE: backport-skip has been added to this pull request.

@mergify mergify bot added the backport-skip Skip notification from the automated backport with mergify label Jun 7, 2024
@kyungeunni kyungeunni changed the title fix: change condition to include publish_ready: null smoketest: fix healthcheck to wait for the resource to be ready Jun 7, 2024
carsonip
carsonip previously approved these changes Jun 7, 2024
Copy link
Member

@carsonip carsonip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice catch, thanks!

testing/smoke/lib.sh Outdated Show resolved Hide resolved
@kyungeunni kyungeunni enabled auto-merge (squash) June 11, 2024 04:31
@kyungeunni kyungeunni requested review from a team, carsonip and endorama June 11, 2024 04:33
@kyungeunni kyungeunni merged commit a407f01 into elastic:main Jun 11, 2024
10 checks passed
@kyungeunni kyungeunni deleted the fix-healthcheck branch June 11, 2024 04:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-skip Skip notification from the automated backport with mergify
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants