Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dss_probing suite sometimes incorrectly indicates missing subscriber #517

Closed
BenjaminPelletier opened this issue Feb 22, 2024 · 2 comments
Closed
Assignees
Labels
automated-testing Related to automated testing tools bug Something isn't working P1 High priority test-scenario-behavior

Comments

@BenjaminPelletier
Copy link
Member

This issue is based on this CI failure for a PR that does nothing except bump a minor version of a Python module dependency. Relevant artifacts copied here:

Test check
All Subscription[i] 1≤i≤n returned in subscribers

Difference from expected behavior
The All Subscription[i] 1≤i≤n returned in subscribers check in S11 failed because, when ISA 38a38c20 was deleted (event 84), the relevant subscription 4cce563e (created in event 70 and included in this ISA's subscribers when the ISA was created in event 73) was not included in the subscribers list. This appears to be a bug (but isn't; see below) as this subscription was included in subscribers when 38a38c20 was created just one step earlier, and that version of 38a38c20 is what was deleted in S11. Yet, no subscribers were included in the deletion response of event 84. I don't see any change in the subscription in the local_infra logs in between the ISA creation (2024-02-21T20:44:31.342242964Z) and deletion (2024-02-21T20:44:36.495131935Z).

This would appear to indicate DSS misbehavior. However, note that there is a 5+ second gap in between ISA creation and deletion requests (creation 2024-02-21T20:44:31.342242964Z and deletion 2024-02-21T20:44:36.495131935Z). This means that the wall clock time for the deletion request was after the end of the subscription end time. So, the root problem appears to be that the test assumes the subscription must be present even after its end time. The inciting problem is that there is a long pause in test execution. I'm not sure why the long pause happens (that would be good to understand), but regardless, the test can be improved to extend the length of the subscription and/or verify that the ISA deletion request completed before the end time of the subscription before claiming non-compliance to the requirement.

@BenjaminPelletier BenjaminPelletier added bug Something isn't working automated-testing Related to automated testing tools P1 High priority test-scenario-behavior labels Feb 22, 2024
@Shastick
Copy link
Contributor

Shastick commented Feb 22, 2024

Thanks for investigating this: it seems to be what was causing some stray failures on #512 and #514, I'll pick this up tomorrow.

Update:

As you described, the issue is probably caused by the way the scenario checks that expired subscriptions don't cause notifications. The way the scenario is designed is likely sensible to random API delays: the 5 second gaps we see are too be expected (see below).

I'll try to tweak the subscription durations and wait times as a first fix.

Update2:

Extending some durations works, but does not address the root cause:

In a discussion with @mickmis we also evoked the possibility that the notification queries might be causing the problem. It turns out that these have a read timeout of 5 seconds, and we seem to try to notify example.com. This causes a delay that breaks the scenario's assumptions.

Notes

5 seconds pause

The inciting problem is that there is a long pause in test execution. I'm not sure why the long pause happens (that would be good to understand), but regardless, the test can be improved to extend the length of the subscription and/or verify that the ISA deletion request completed before the end time of the subscription before claiming non-compliance to the requirement.

The 5 seconds pause we see is likely part of the test design: for example, the description for step9() states:

Expired ISA automatically removed, ISA modifications accessible from all non-primary DSSs

And the first action of this step is to call:

        sleep(
            SHORT_WAIT_SEC,
            "ISA_1 needs to expire so we can check it is automatically removed",
        )

With SHORT_WAIT_SEC being 5 seconds.

In the same step we also create subscriptions that have a duration of 5 seconds.

@Shastick
Copy link
Contributor

I believe this has been properly handled by #518: closing, we can re-open if this reappears.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
automated-testing Related to automated testing tools bug Something isn't working P1 High priority test-scenario-behavior
Projects
None yet
Development

No branches or pull requests

2 participants