Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retry obtaining KUBECONFIG file lock #3782

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
24 changes: 18 additions & 6 deletions pkg/cluster/internal/kubeconfig/internal/kubeconfig/lock.go
Original file line number Diff line number Diff line change
Expand Up @@ -19,9 +19,12 @@ package kubeconfig
import (
"os"
"path/filepath"
"time"
)

// these are from
const lockFileRetryAttemps = 5

// these are based on
// https://github.com/kubernetes/client-go/blob/611184f7c43ae2d520727f01d49620c7ed33412d/tools/clientcmd/loader.go#L439-L440

func lockFile(filename string) error {
Expand All @@ -32,12 +35,21 @@ func lockFile(filename string) error {
return err
}
}
f, err := os.OpenFile(lockName(filename), os.O_CREATE|os.O_EXCL, 0)
if err != nil {
return err

// Retry obtaining the file lock a few times to accommodate concurrent operations.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is safe, if there is a lock we're racing with another process anyhow? (which may logically conflict) and we're deviating in behavior from client-go

I think we probably need to retry the entire kubeconfig edit call instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this should be the safe route. Once we obtain the lock, there shouldn't be any concerns about other client-go (or other libraries) that also use file locks to update the KUBECONFIG. All of the update operations happen between that window of obtaining the file lock and removing it.

When we update the contents, we do both read and write within this locked section.

Same with removing an entry. The file lock is obtained first, and only then are updates made.

The only difference from client-go is it will immediately fail if another process already has that lock. We will be a little more robust and just retry, where typical operations should be complete within milliseconds.

So I believe the only risk if there is another process that is not using the same file locking mechanism to control access to KUBECONFIG. As long is it operates under the same principle of getting the lock before reading and updating, then the only difference is we might be able to wait until the other process is done before completely failing the operation.

var lastErr error
for i := 0; i < lockFileRetryAttemps; i++ {
f, err := os.OpenFile(lockName(filename), os.O_CREATE|os.O_EXCL, 0)
if err == nil {
f.Close()
return nil
}

lastErr = err
time.Sleep(100 * time.Millisecond)
}
f.Close()
return nil

return lastErr
}

func unlockFile(filename string) error {
Expand Down
Loading