
Make nodepool concurrent ops scale better #12488

Open
wants to merge 3 commits into base: main

Conversation


@Xylosma Xylosma commented Dec 4, 2024

The purpose of this change is to stagger operation creation while still keeping operations concurrent, so that operations are not bottlenecked on creation and do not fail.
To achieve this (see the sketch after this list):

  • a cluster-scoped lock is held during operation creation
  • the cluster-scoped lock is released once creation is done
  • a cluster-scoped read lock is held while waiting for the operation to complete (as before)
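
A minimal sketch of the intended pattern, using a per-cluster sync.RWMutex as a stand-in for the provider's shared mutex store (names here are illustrative, not the exact code in this PR):

package nodepoolops

import "sync"

// clusterLocks maps a cluster key to an RWMutex; it stands in for the
// provider's shared mutex store (a simplification for this sketch).
var clusterLocks sync.Map

func lockFor(clusterKey string) *sync.RWMutex {
	m, _ := clusterLocks.LoadOrStore(clusterKey, &sync.RWMutex{})
	return m.(*sync.RWMutex)
}

// runStaggered serializes operation creation per cluster (write lock) but
// lets any number of operations wait for completion concurrently (read lock).
func runStaggered(clusterKey string, create func() (string, error), wait func(opName string) error) error {
	mu := lockFor(clusterKey)

	mu.Lock()
	opName, err := create()
	mu.Unlock()
	if err != nil {
		return err
	}

	mu.RLock()
	defer mu.RUnlock()
	return wait(opName)
}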

Release Note Template for Downstream PRs (will be copied)

See Write release notes for guidance.

container: make nodepool concurrent operations scale better

Xylosma and others added 2 commits November 26, 2024 23:38
This prevents Terraform from making a massive number of requests at the
same time.
@github-actions github-actions bot requested a review from ScottSuarez December 4, 2024 16:15

github-actions bot commented Dec 4, 2024

Hello! I am a robot. Tests will require approval from a repository maintainer to run.

@ScottSuarez, a repository maintainer, has been assigned to review your changes. If you have not received review feedback within 2 business days, please leave a comment on this PR asking them to take a look.

You can help make sure that review is quick by doing a self-review and by running impacted tests locally.

@modular-magician modular-magician added the awaiting-approval and service/container labels and removed the awaiting-approval label Dec 4, 2024
@modular-magician (Collaborator)

Hi there, I'm the Modular magician. I've detected the following information about your changes:

Diff report

Your PR generated some diffs in downstreams - here they are.

google provider: Diff ( 2 files changed, 124 insertions(+), 11 deletions(-))
google-beta provider: Diff ( 2 files changed, 124 insertions(+), 11 deletions(-))

@modular-magician (Collaborator)

Tests analytics

Total tests: 219
Passed tests: 207
Skipped tests: 12
Affected tests: 0

Click here to see the affected service packages
  • container

🟢 All tests passed!

View the build log

@ScottSuarez ScottSuarez (Contributor) left a comment


We should look to upstream this locking behavior to retryWhileIncompatibleOperation.

This resource already has custom locking behavior on these calls. If we want to customize that behavior, we should customize that function. That would mean passing that operation back and doing the wait within that function rather than introducing a secondary locking mechanism. Otherwise this will be difficult to interpret, and new additions will also need to remember to implement the same behavior you have here.

https://github.com/hashicorp/terraform-provider-google-beta/blob/d02bfdd327ae6fbef0aa744c8c4c51c3abd3ba11/google-beta/services/container/resource_container_node_pool.go#L1693
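
For context, the custom locking behavior referred to here is (roughly) a retry loop that wraps a single create-and-wait callback in one per-node-pool lock. A simplified stand-in, not the provider's actual code:

package nodepoolops

import (
	"errors"
	"sync"
	"time"
)

// locks stands in for the provider's shared mutex store (assumed for this sketch).
var locks sync.Map

func lockedCall(lockKey string, f func() error) error {
	m, _ := locks.LoadOrStore(lockKey, &sync.Mutex{})
	mu := m.(*sync.Mutex)
	mu.Lock()
	defer mu.Unlock()
	return f()
}

// errIncompatible marks the retryable "incompatible operation in progress" condition
// (a simplified stand-in for the real FAILED_PRECONDITION / RESOURCE_EXHAUSTED checks).
var errIncompatible = errors.New("FAILED_PRECONDITION: incompatible operation in progress")

// Rough pre-refactor shape: the callback performs both the API call and the
// wait, and the whole thing runs under one per-node-pool lock before retrying.
func retryWhileIncompatibleOperation(timeout time.Duration, lockKey string, f func() error) error {
	deadline := time.Now().Add(timeout)
	for {
		err := lockedCall(lockKey, f)
		if err == nil || !errors.Is(err, errIncompatible) {
			return err
		}
		if time.Now().After(deadline) {
			return err
		}
		time.Sleep(10 * time.Second)
	}
}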

@github-actions github-actions bot requested a review from ScottSuarez December 11, 2024 21:50
@modular-magician modular-magician added and removed the awaiting-approval label Dec 11, 2024
@modular-magician (Collaborator)

Hi there, I'm the Modular magician. I've detected the following information about your changes:

Diff report

Your PR generated some diffs in downstreams - here they are.

google provider: Diff ( 2 files changed, 144 insertions(+), 174 deletions(-))
google-beta provider: Diff ( 2 files changed, 144 insertions(+), 174 deletions(-))

@modular-magician (Collaborator)

Tests analytics

Total tests: 219
Passed tests: 207
Skipped tests: 12
Affected tests: 0

Click here to see the affected service packages
  • container

🟢 All tests passed!

View the build log

@ScottSuarez ScottSuarez (Contributor) left a comment


I think it would be a good idea to sync over a GVC and talk about this a bit!

Comment on lines +1681 to +1692
func retryWhileIncompatibleOperation(timeout time.Duration, lockKey string, clusterLockKey string, createOpFunc func() (*container.Operation, error), waitOpFunc func(*container.Operation) error) error {
	f := func() error {
		transport_tpg.MutexStore.Lock(clusterLockKey)
		op, err := createOpFunc()
		transport_tpg.MutexStore.Unlock(clusterLockKey)
		if err != nil {
			return err
		}
		transport_tpg.MutexStore.RLock(clusterLockKey)
		defer transport_tpg.MutexStore.RUnlock(clusterLockKey)
		return waitOpFunc(op)
	}
@ScottSuarez ScottSuarez (Contributor) commented Dec 11, 2024


So I don't believe this actually changes anything, after going through the behavior loop in my head. LockedCall(createLockKey, ...) holds the lock for the entire operation (both creation and waiting). This means the additional locking/unlocking with waitLockKey inside the operation serves no purpose, as the outer lock already prevents concurrent execution.

Also, I don't believe concurrent updates are supported in the provider; all operations right now are synchronous.

@Xylosma (Author)


So the goal is that we don't want multiple node pools within the same cluster to create operations at exactly the same time. Currently, users may trigger creation of many node pools for a single cluster using Terraform, and those node pools will all be created at the exact same time, overwhelming the GKE pipeline. The nodepool lock is fine as it is, because we want sequential creation at the cluster level.
Also happy to chat more on GVC.
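
To make the scaling argument concrete, here is a toy, self-contained demonstration (assumed durations, not real GKE timings): when creation is serialized under a write lock and waiting happens under a read lock, three node pool operations on one cluster overlap their waits instead of running strictly one after another.

package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	var clusterMu sync.RWMutex // one lock per cluster, standing in for the cluster-scoped lock
	var wg sync.WaitGroup
	start := time.Now()

	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()

			clusterMu.Lock() // operation creation is staggered per cluster
			time.Sleep(50 * time.Millisecond)
			clusterMu.Unlock()

			clusterMu.RLock() // waiting for completion overlaps across node pools
			time.Sleep(500 * time.Millisecond)
			clusterMu.RUnlock()
		}()
	}
	wg.Wait()

	// With waits overlapping, elapsed time stays well below the ~1650ms that a
	// fully serialized create-and-wait under a single exclusive lock would take.
	fmt.Println("elapsed:", time.Since(start))
}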

@ScottSuarez ScottSuarez (Contributor) commented Dec 12, 2024


I believe I understand it a bit better after drawing a diagram. I worry a bit about the readability of this code given that need. I'm okay approving this as is, but could someone from your team also review, since the maintenance will fall more broadly there?

sequenceDiagram
    participant Caller
    participant Retry
    participant LockedCall
    participant MutexStore
    participant CreateOp as createOpFunc
    participant WaitOp as waitOpFunc

    Caller->>Retry: retryWhileIncompatibleOperation(timeout, npLockKey, clusterLockKey, ...)
    
    activate Retry
    Note over Retry: Start retry loop with timeout
    
    Retry->>LockedCall: LockedCall(npLockKey, f)
    activate LockedCall
    
    LockedCall->>MutexStore: Lock(npLockKey)
    Note over MutexStore: Write lock on nodepool
    
    LockedCall->>MutexStore: Lock(clusterLockKey)
    Note over MutexStore: Write lock on cluster
    
    LockedCall->>CreateOp: createOpFunc()
    activate CreateOp
    CreateOp-->>LockedCall: Return operation
    deactivate CreateOp
    
    LockedCall->>MutexStore: Unlock(clusterLockKey)
    Note over MutexStore: Release cluster write lock
    
    LockedCall->>MutexStore: RLock(clusterLockKey)
    Note over MutexStore: Read lock on cluster
    
    LockedCall->>WaitOp: waitOpFunc(op)
    activate WaitOp
    WaitOp-->>LockedCall: Return result
    deactivate WaitOp
    
    LockedCall->>MutexStore: RUnlock(clusterLockKey)
    Note over MutexStore: Release cluster read lock
    
    LockedCall->>MutexStore: Unlock(npLockKey)
    Note over MutexStore: Release nodepool lock (via defer)
    
    alt Operation succeeded
        LockedCall-->>Retry: Return nil
        Retry-->>Caller: Return nil
    else IsFailedPreconditionError or IsQuotaError
        LockedCall-->>Retry: Return error
        Note over Retry: Retry operation
        Retry->>LockedCall: LockedCall(npLockKey, f)
    else Other error
        LockedCall-->>Retry: Return error
        Retry-->>Caller: Return error immediately
    end
    
    deactivate LockedCall
    deactivate Retry

Comment on lines +1675 to +1681
// Retries an operation while the canonical error code is FAILED_PRECONDTION or RESOURCE_EXHAUSTED which indicates
// there is an incompatible operation already running on the cluster or there are the number of allowed concurrent
// operations running on the cluster. These errors can be safely retried until the incompatible operation completes,
// and the newly requested operation can begin.
// The cluster lock is held during createOpFunc to make opeation creations sequencial, and cluster read lock is held
// during waitOpFunc to allow concurrency.
func retryWhileIncompatibleOperation(timeout time.Duration, lockKey string, clusterLockKey string, createOpFunc func() (*container.Operation, error), waitOpFunc func(*container.Operation) error) error {

Suggested change
// Retries an operation while the canonical error code is FAILED_PRECONDTION or RESOURCE_EXHAUSTED which indicates
// there is an incompatible operation already running on the cluster or there are the number of allowed concurrent
// operations running on the cluster. These errors can be safely retried until the incompatible operation completes,
// and the newly requested operation can begin.
// The cluster lock is held during createOpFunc to make opeation creations sequencial, and cluster read lock is held
// during waitOpFunc to allow concurrency.
func retryWhileIncompatibleOperation(timeout time.Duration, lockKey string, clusterLockKey string, createOpFunc func() (*container.Operation, error), waitOpFunc func(*container.Operation) error) error {
// Retries an operation while the canonical error code is FAILED_PRECONDTION or RESOURCE_EXHAUSTED which indicates
// there is an incompatible operation already running on the cluster or there are the number of allowed concurrent
// operations running on the cluster. These errors can be safely retried until the incompatible operation completes,
// and the newly requested operation can begin.
// The npLockKey is held during createOpFunc to make operations sequential within the node pool, and
// clusterLockKey is held during waitOpFunc to allow concurrency on a cluster.
func retryWhileIncompatibleOperation(timeout time.Duration, npLockKey string, clusterLockKey string, createOpFunc func() (*container.Operation, error), waitOpFunc func(*container.Operation) error) error {

