-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd/openshift-install/gather: Recognize "connection refused" #3615
cmd/openshift-install/gather: Recognize "connection refused" #3615
Conversation
cmd/openshift-install/gather.go
Outdated
if errno, ok := errors.Unwrap(err).(syscall.Errno); ok && errno == syscall.ECONNREFUSED { | ||
return errors.Wrap(err, "failed to connect to the bootstrap machine") | ||
} else if len(gatherBootstrapOpts.sshKeys) == 0 { | ||
return errors.Wrap(err, "failed to create SSH client, ensure the proper ssh key is in your keyring or specify with --key") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think it's better use to the Is
function introduced in go1.13, see Overview
in https://golang.org/pkg/errors/#Unwrap
github.com/pkg/errors
which we use in installer is transparent replacement for std lib errors
.
107b9bc
to
4e0f4cf
Compare
@@ -145,8 +146,14 @@ func logGatherBootstrap(bootstrap string, port int, masters []string, directory | |||
if err != nil && strings.Contains(err.Error(), "ssh: handshake failed: ssh: unable to authenticate") { | |||
return errors.Wrap(err, "failed to create SSH client, ensure the private key is added to your authentication agent (ssh-agent) or specified with the --key parameter") | |||
} else if err != nil { | |||
if errors.Is(err, syscall.ECONNREFUSED) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this entire block https://github.com/openshift/installer/pull/3615/files#diff-be4790dcfa83ca14e7942b4c3b50c5f9R146-R156 has multi if else for err, can we standardize to using errors.Is for connectionrefused and unable to authenitcate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried checking for the root error type for the unable to authenticate scenario but the root error is fmt.Errorf which is a basic string error. Not sure if errors.Is will work for that scenario. I checked their test files for the function and the strings.Contains function is being used. Will work to make the code a little cleaner.
Picking this as a victim PR to |
4e0f4cf
to
e51cdd7
Compare
cmd/openshift-install/gather.go
Outdated
if strings.Contains(err.Error(), "ssh: handshake failed: ssh: unable to authenticate") { | ||
return errors.Wrap(err, "failed to create SSH client, ensure the private key is added to your authentication agent (ssh-agent) or specified with the --key parameter") | ||
} else if errors.Is(err, syscall.ECONNREFUSED) { | ||
return errors.Wrap(err, "failed to connect to the bootstrap machine") | ||
} else if len(gatherBootstrapOpts.sshKeys) == 0 { | ||
return errors.Wrap(err, "failed to create SSH client, ensure the proper ssh key is in your keyring or specify with --key") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
I think it doesn't have to be if else if, it can be written as blocks of if that return.
-
https://github.com/golang/crypto/blob/master/ssh/client_auth.go#L75 , https://github.com/golang/crypto/blob/master/ssh/client.go#L85 is where the authentication error is coming from and it doesn't look like it supports Unwrap so strings contains will have to do for now.
-
for unauthenticated error there are 2 cases,
- the env was set and https://github.com/openshift/installer/blob/master/pkg/gather/ssh/agent.go#L19
here maybe we can say failed to authenticate with the pre-existing ssh agent, add key to agent - the keys were used
here we can say failed to authenticate using keys, add you should provide keys using --keys
or something like that.
-
len(gatherBootstrapOpts.sshKeys) == 0
I don't know what this case helps us with. -
https://github.com/openshift/installer/blob/master/pkg/gather/ssh/agent.go#L19
uses ssh agent when no keys are provided, but that doesn't match the doc in https://github.com/openshift/installer/blob/master/docs/user/troubleshootingbootstrap.md#authenticating-to-bootstrap-host-for-ipi
also b91707e#diff-be4790dcfa83ca14e7942b4c3b50c5f9 made it so that there is always one key passed to this list making it this ssh agent not used at all.. that would be bad.. maybe we fix that too, drop thelen(keys) == 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
len(gatherBootstrapOpts.sshKeys) == 0
I don't know what this case helps us with
I added that case in 00307d1 and I think it is a good idea to remove it in this PR. The generated SSH key solves the problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for unauthenticated error there are 2 cases,
Determining which case happened seems to be a little hard based off of only the error message. https://github.com/openshift/installer/blob/master/pkg/gather/ssh/agent.go#L19 checks if the env is set and if it fails to dial, it goes with the other way of pulling the ssh keys from local. Any error message returned after that is in a general way and does not give details as to what case happened. I may be wrong.
The way I see it, I need to check if the env is set and if it dials right again like the above link, in this place here
installer/cmd/openshift-install/gather.go
Line 145 in e2367c4
if err != nil && strings.Contains(err.Error(), "ssh: handshake failed: ssh: unable to authenticate") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the function that has the information what was used to create the SSH Agent for authentication is the ssh.NewClient
and i think we should look at how that function that provide us the context on which one was used so that we can decide the next steps when we fail to authenticate.
maybe we try that?
Fixes #3188 |
e51cdd7
to
b84c448
Compare
pkg/gather/ssh/ssh.go
Outdated
@@ -36,6 +37,12 @@ func NewClient(user, address string, keys []string) (*ssh.Client, error) { | |||
HostKeyCallback: ssh.InsecureIgnoreHostKey(), | |||
}) | |||
if err != nil { | |||
if strings.Contains(err.Error(), "ssh: handshake failed: ssh: unable to authenticate") { | |||
if agentType == "agent" { | |||
return nil, errors.Wrap(err, "failed to use pre-existing agent, try adding keys using the --keys option") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return nil, errors.Wrap(err, "failed to use pre-existing agent, try adding keys using the --keys option") | |
return nil, errors.Wrap(err, "failed to use pre-existing agent, make sure the appropriate keys exist in the agent for authentication") |
pkg/gather/ssh/ssh.go
Outdated
if agentType == "agent" { | ||
return nil, errors.Wrap(err, "failed to use pre-existing agent, try adding keys using the --keys option") | ||
} | ||
return nil, errors.Wrap(err, "failed to use keys, ensure the private key is added to your authentication agent (ssh-agent) or specified with the --key parameter") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return nil, errors.Wrap(err, "failed to use keys, ensure the private key is added to your authentication agent (ssh-agent) or specified with the --key parameter") | |
return nil, errors.Wrap(err, "failed to use the provided keys for authentication") |
Before this commit, bootstrap machines that failed to come up would look like [1]: level=info msg="Waiting up to 30m0s for the Kubernetes API at https://api.ci-op-6266tp8r-77109.origin-ci-int-aws.dev.rhcloud.com:6443..." level=error msg="Attempted to gather ClusterOperator status after installation failure: listing ClusterOperator objects: Get https://api.ci-op-6266tp8r-77109.origin-ci-int-aws.dev.rhcloud.com:6443/apis/config.openshift.io/v1/clusteroperators: dial tcp 3.221.214.197:6443: connect: connection refused" level=info msg="Pulling debug logs from the bootstrap machine" level=error msg="Attempted to gather debug logs after installation failure: failed to create SSH client, ensure the proper ssh key is in your keyring or specify with --key: dial tcp 3.84.188.207:22: connect: connection refused" level=fatal msg="Bootstrap failed to complete: waiting for Kubernetes API: context deadline exceeded" With this commit, that last error will look like: level=error msg="Attempted to gather debug logs after installation failure: failed to connect to the bootstrap machine: dial tcp 3.84.188.207:22: connect: connection refused" without the unrelated (to this failure mode) distraction about SSH keys. [1]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/12076 Updated the commit to match with the latest changes.
b84c448
to
97bb8a8
Compare
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: abhinavdahiya The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/lgtm |
/retest Please review the full test history for this PR and help us cut down flakes. |
4 similar comments
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest Please review the full test history for this PR and help us cut down flakes. |
3 similar comments
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest Please review the full test history for this PR and help us cut down flakes. |
the failing test is an e2e, known broken https://coreos.slack.com/archives/CNHC2DK2M/p1591025728087100 /override ci/prow/e2e-aws |
@abhinavdahiya: Overrode contexts on behalf of abhinavdahiya: ci/prow/e2e-aws In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@rna-afk: The following tests failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
Before this commit, bootstrap machines that failed to come up would
look like 1:
With this commit, that last error will look like:
without the unrelated (to this failure mode) distraction about SSH
keys.
Updated the commit to match with the latest changes.
Clone of #2810