[scd] Simultaneous creation of temporally proximate intents seems to cause DSS to return 404 on DELETE even when intent is present and successfully deleted #1069
Comments
Hi @callumdmay, Thanks for your report. Could you clarify a few things to help us understand the issue? Notably:
Thank you
What DSS version are you using? What's the overall setup of your DSS pool?
Setup of the pool is a single cluster on GKE, using the default Tanka config (i.e. 3 pods each of core-service and cockroachdb).

Any chance the operational intent has expired between the time it was created and the time it was tentatively deleted?
No, these actions were only 2 seconds apart.

Are you able to reliably reproduce the issue, does it appear randomly, or did it happen only once?
I was able to reproduce it with a parallel load above 3 QPS.

Do you have any logs at hand relevant to the issue?
Are you able to view https://cloudlogging.app.goo.gl/9jStWPozE5c7vHkC8?

Do you have a dump of the culprit request and response? Among others, the message in the 404 response could be interesting.
Request:
Response:
Thank you for the details, @callumdmay. Regarding the request and response you provided, they don't seem to contain the payload that was sent and received from the DSS, but instead contain what seems to be metadata pertaining to the system that is issuing the requests, or through which the requests flow? The only directly relevant details they contain are the Would it be possible for you to obtain the json payload that is sent in the first PUT, as well as the json response that is received by the
I was going to say the issue could be that the op intent might not have been at the specified OVN, but I actually think we have a bug where the specified OVN is not checked when deleting an operational intent reference; @Shastick could you track this as a separate bug fix? We should return 404 (or possibly 409) with a helpful message if the op intent exists but the OVN doesn't match.

So, it seems like the issue is definitely that op intent 0a5d60ef-cbd3-46a5-9241-200ded546fba didn't exist at the time of the second request. @callumdmay are you sure there was no previous attempt to delete this operational intent? It would be extremely useful to search the DSS logs for instances of 0a5d60ef-cbd3-46a5-9241-200ded546fba between roughly 2024-08-08T19:44:27.286542Z and 2024-08-08T19:44:29.333991Z. In addition to the two requests you've indicated above, I expect there is likely a third request to delete the operational intent reference between those two requests.
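For illustration, the OVN check proposed above could look roughly like the sketch below. It is not the DSS implementation (the DSS is backed by CockroachDB); `OpIntentRef`, `InMemoryStore`, and the messages are hypothetical names chosen for this example, and 409 is picked for the mismatch case only because the comment lists it as one option.

```python
from dataclasses import dataclass
from http import HTTPStatus


@dataclass
class OpIntentRef:
    """Hypothetical, simplified stand-in for an operational intent reference."""
    id: str
    ovn: str


class InMemoryStore:
    """Toy store used only to illustrate the check (assumption, not DSS code)."""

    def __init__(self):
        self._refs = {}

    def put(self, ref):
        self._refs[ref.id] = ref

    def delete(self, intent_id, ovn):
        """Return (status, message) for a delete request at a specific OVN."""
        ref = self._refs.get(intent_id)
        if ref is None:
            return HTTPStatus.NOT_FOUND, f"operational intent {intent_id} not found"
        if ref.ovn != ovn:
            # The intent exists, but the caller's OVN is stale or wrong.
            return (HTTPStatus.CONFLICT,
                    f"operational intent {intent_id} exists but the OVN does not match")
        del self._refs[intent_id]
        return HTTPStatus.OK, "deleted"
```

With distinct messages for the two failure cases, a client could tell "never existed or already deleted" apart from "exists at a different OVN".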
The log search is by OID, so all requests should appear.
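If the logs are exported to a file, the time-window search requested above could be done with something like the sketch below. It assumes one JSON object per line with an RFC 3339 timestamp in a field named `ts`; that field name and layout are assumptions, not the confirmed DSS log format, so adjust them to the actual export.

```python
import json
from datetime import datetime


def parse_ts(value):
    """Parse an RFC 3339 timestamp; 'Z' is rewritten for older Python versions."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def matching_entries(lines, needle, start, end):
    """Yield parsed log entries that mention `needle` within [start, end]."""
    lo, hi = parse_ts(start), parse_ts(end)
    for line in lines:
        if needle not in line:
            continue  # cheap substring filter before parsing JSON
        entry = json.loads(line)
        if lo <= parse_ts(entry["ts"]) <= hi:  # "ts" is an assumed field name
            yield entry
```

It would be called with an open log file as `lines`, the intent ID as `needle`, and the two timestamps quoted in the thread as `start` and `end`.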
Are those the DSS logs or the client USS logs? I'm expecting log messages formatted like this.
Describe the bug
During load testing of our system, simulating simultaneous intent creation originating at the same point at ~3 QPS, we observed that subsequent deletion of an intent returned a 404 error from the DSS. Inspecting the outbound requests from our system showed that we made the Delete request only once. After confirming with @BradNicolle, it appears that the delete was successful inside the DSS, even though a 404 was returned.
To Reproduce
There was nothing particularly special about our load testing configuration: 3 worker threads made Create() and then Delete() requests at random points within 1-second intervals, simulating ~3 QPS. Each intent had an initial volume overlapping at the same geographical point, but at non-overlapping times.
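A minimal sketch of a load pattern like the one described, for anyone trying to reproduce it. This is not our actual test harness: `FakeDSS` is a hypothetical in-memory stand-in that would be replaced by real PUT and DELETE calls, the OVN it returns is a placeholder, and `hold_s` (the gap between create and delete) is an assumed parameter.

```python
import random
import threading
import time
import uuid


class FakeDSS:
    """Hypothetical in-memory stand-in; replace with real HTTP calls."""

    def __init__(self):
        self._lock = threading.Lock()
        self._intents = {}

    def create(self, intent_id):
        with self._lock:
            ovn = uuid.uuid4().hex  # placeholder OVN, not the real format
            self._intents[intent_id] = ovn
            return ovn

    def delete(self, intent_id, ovn):
        with self._lock:
            if self._intents.get(intent_id) != ovn:
                return 404
            del self._intents[intent_id]
            return 200


def worker(dss, rounds, interval_s, hold_s, results):
    for _ in range(rounds):
        start = time.monotonic()
        # Fire at a random point inside this interval.
        time.sleep(random.uniform(0, interval_s))
        intent_id = str(uuid.uuid4())
        ovn = dss.create(intent_id)
        time.sleep(hold_s)  # assumed gap between create and delete
        results.append((intent_id, dss.delete(intent_id, ovn)))
        # Sleep out the remainder so each worker averages one cycle per interval.
        time.sleep(max(0.0, interval_s - (time.monotonic() - start)))


def run_load(dss, workers=3, rounds=5, interval_s=1.0, hold_s=0.0):
    results = []  # list.append is thread-safe in CPython
    threads = [
        threading.Thread(target=worker, args=(dss, rounds, interval_s, hold_s, results))
        for _ in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Any `(intent_id, 404)` pair in the returned list marks a delete that failed even though the create for that ID had just succeeded.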
Expected behavior
The DSS should return a 2XX response if a deletion was successful.
DSS URL: https://dss.us.pre-qual.interuss.opensky.dev
Operator: Wing
Intent ID: "0a5d60ef-cbd3-46a5-9241-200ded546fba"
OVN: "huHqtAVBkqIgErF2peFF-uhGsCvLatVyybgfKs8dgz0_"
Outbound request timestamp: 2024-08-08 12:44:29.974325-0700
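Since the timestamp above carries a -0700 offset while the DSS log timestamps in the thread are in UTC, converting it first makes the two easier to line up. A small sketch using only the Python standard library:

```python
from datetime import datetime, timezone

# Outbound request timestamp as reported, with its -0700 offset.
local = datetime.strptime(
    "2024-08-08 12:44:29.974325-0700", "%Y-%m-%d %H:%M:%S.%f%z"
)
utc = local.astimezone(timezone.utc)
print(utc.isoformat())  # 2024-08-08T19:44:29.974325+00:00
```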