Resource offer cleanup sometimes removes the wrong offers #467

Open
bgins opened this issue Dec 10, 2024 · 0 comments · May be fixed by #475
Assignees
Labels
bug Something isn't working resource-provider solver

Comments

Contributor

bgins commented Dec 10, 2024

General Description

In our current implementation, the solver removes a resource offer when a resource provider disconnects:

func (solverServer *solverServer) disconnectCB(connParams http.WSConnectionParams) {
    if connParams.Type == "ResourceProvider" {
        metricsDashboard.TrackNodeConnectionEvent(metricsDashboard.NodeConnectionParams{
            Event:       "Disconnect",
            ID:          connParams.ID,
            CountryCode: connParams.CountryCode,
            IP:          connParams.IP,
        })
        solverServer.controller.removeResourceOfferByResourceProvider(connParams.ID)
    }
}

The removeResourceOfferByResourceProvider implementation removes a single resource offer:

err = controller.store.RemoveResourceOffer(resourceOffers[0].ID)
if err != nil {
    return err
}

We store the resource offers in a map:

resourceOfferMap map[string]*data.ResourceOfferContainer

When a job is complete, we mark the resource offer with a ResultsAccepted[3] state, but we do not remove it from the map.

Maps are unordered in Go, and as a result removeResourceOfferByResourceProvider removes an arbitrary resource offer, which may or may not be the active, unmatched resource offer.
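To illustrate the problem, here is a minimal sketch. It assumes the provider's offers are collected by ranging over the map before index 0 is taken; ResourceOffer, offersByProvider, and the offer IDs are hypothetical stand-ins, not the solver's actual types or functions.

```go
package main

import "fmt"

// ResourceOffer is a simplified, hypothetical stand-in for the solver's offer type.
type ResourceOffer struct {
	ID               string
	ResourceProvider string
	State            int // 0 = DealNegotiating ... 3 = ResultsAccepted, as numbered in this issue
}

// offersByProvider collects a provider's offers by ranging over the map.
// Go does not specify map iteration order, so the order of the returned
// slice can differ from one call to the next.
func offersByProvider(offers map[string]*ResourceOffer, provider string) []*ResourceOffer {
	var result []*ResourceOffer
	for _, offer := range offers {
		if offer.ResourceProvider == provider {
			result = append(result, offer)
		}
	}
	return result
}

func main() {
	offers := map[string]*ResourceOffer{
		"offer-a": {ID: "offer-a", ResourceProvider: "rp-1", State: 3}, // completed job
		"offer-b": {ID: "offer-b", ResourceProvider: "rp-1", State: 0}, // active, unmatched
	}

	// Count which offer lands at index 0 across repeated calls.
	firstSeen := map[string]int{}
	for i := 0; i < 100; i++ {
		first := offersByProvider(offers, "rp-1")[0]
		firstSeen[first.ID]++
	}
	fmt.Println(firstSeen)
}
```

Running this typically shows both IDs appearing at index 0, so removing resourceOffers[0] can hit the completed offer and leave the unmatched one in place.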

In addition, resource providers may submit multiple resource offers. Cleaning up a single resource offer may not be enough in general.

Which system(s) or functionality does this affect

The changes to fix this issue will affect the solver and resource providers.

Describe the changes, and how this affects/interacts with each system

We can fix this issue by removing all of the disconnected resource provider's resource offers that are in a DealNegotiating[0] state. These are the resource offers that are unmatched, and we would like to remove them so they do not get matched while the resource provider is not connected to the solver.

We should avoid removing offers in any other state. Offers in a DealAgreed[1] or ResultsSubmitted[2] state are in flight and we should be especially careful to preserve them.
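The proposed cleanup could look roughly like the sketch below. It is not the actual fix: offerStore, removeUnmatchedOffers, and the simplified ResourceOfferContainer fields are hypothetical, and the state constants are assumed to carry the numeric values shown in brackets above.

```go
package main

import "fmt"

// Agreement states, numbered as in this issue (assumed values).
const (
	DealNegotiating  = iota // 0: unmatched
	DealAgreed              // 1: in flight
	ResultsSubmitted        // 2: in flight
	ResultsAccepted         // 3: job complete
)

// ResourceOfferContainer is a simplified, hypothetical version of the stored type.
type ResourceOfferContainer struct {
	ID               string
	ResourceProvider string
	State            int
}

// offerStore is a hypothetical stand-in for the solver store.
type offerStore struct {
	resourceOfferMap map[string]*ResourceOfferContainer
}

// removeUnmatchedOffers deletes every offer from the given provider that is
// still in the DealNegotiating state and returns the removed IDs.
// Offers in any other state are left untouched.
func (s *offerStore) removeUnmatchedOffers(provider string) []string {
	var removed []string
	for id, offer := range s.resourceOfferMap {
		if offer.ResourceProvider == provider && offer.State == DealNegotiating {
			// Deleting entries while ranging over a map is safe in Go.
			delete(s.resourceOfferMap, id)
			removed = append(removed, id)
		}
	}
	return removed
}

func main() {
	s := &offerStore{resourceOfferMap: map[string]*ResourceOfferContainer{
		"neg-1":  {ID: "neg-1", ResourceProvider: "rp-1", State: DealNegotiating},
		"agreed": {ID: "agreed", ResourceProvider: "rp-1", State: DealAgreed},
		"done":   {ID: "done", ResourceProvider: "rp-1", State: ResultsAccepted},
	}}
	removed := s.removeUnmatchedOffers("rp-1")
	fmt.Println("removed:", removed, "remaining:", len(s.resourceOfferMap))
}
```

Filtering on both provider and state means the result no longer depends on map iteration order, and it handles providers that have submitted more than one unmatched offer.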

@bgins bgins added bug Something isn't working solver resource-provider labels Dec 10, 2024
@bgins bgins self-assigned this Dec 18, 2024