Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: fix TestAuthenticationAndHBARules stress tests SSL issue #132729

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

souravcrl
Copy link
Contributor

@souravcrl souravcrl commented Oct 16, 2024

sql: fix TestAuthenticationAndHBARules stress tests SSL issue

informs #131532
informs #131110
informs #130253
informs #127745
Epic CRDB-41958

TestAuthenticationAndHBARules fails for special_cases, hba_default_equivalence, empty_hba data driven tests for secure mode. The failures occur when root user is trying to authenticate with cert-password auth method and sslmode is set to verify-ca with sslcert being empty. The expected behavior is root authentication defaults to password method and fails as no password is set for root, but instead we get:

pq: SSL is not enabled on the server

Since the failures are there only under stress, it might be because db server shutdown or paused before responding to request for upgrade connection to SSL from lib/pq client from here
https://github.com/lib/pq/blob/3d613208bca2e74f2a20e04126ed30bcb5c4cc27/conn.go#L1116-L1130. Retrying connection establishment when this specific error is obtained might fix the problem as this logic seems faulty(it checks for absence of 'S' in server response whereas the correct check should be for 'N' in response).


pgwire: add test build logs for upgrade secure conn

Additional test build logs are added to verify the step for which
maybeUpgradeToSecureConn fails.

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@souravcrl souravcrl added backport-23.1.x Flags PRs that need to be backported to 23.1 backport-23.2.x Flags PRs that need to be backported to 23.2. backport-24.1.x Flags PRs that need to be backported to 24.1. backport-24.2.x Flags PRs that need to be backported to 24.2 labels Oct 16, 2024
@souravcrl souravcrl marked this pull request as ready for review October 16, 2024 15:21
@souravcrl souravcrl requested review from a team as code owners October 16, 2024 15:21
Copy link
Collaborator

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @pritesh-lahoti and @souravcrl)


pkg/sql/pgwire/auth_test.go line 575 at r1 (raw file):

					}
					if err != nil {
						if !strings.Contains(err.Error(), "SSL is not enabled on the server") {

let's use errors.Is(err, pq.ErrSSLNotSupported) rather than a string match

https://github.com/lib/pq/blob/3d613208bca2e74f2a20e04126ed30bcb5c4cc27/conn.go#L34


pkg/sql/pgwire/auth_test.go line 579 at r1 (raw file):

						}
						// Retry: server might fail to respond to upgrade conn under stress
						dbSQL, err = gosql.Open("postgres", dsn)

hmm i'd like to understand a little bit more about how this might help before merging. let me dive into this a little bit first.

Copy link
Contributor Author

@souravcrl souravcrl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @pritesh-lahoti and @rafiss)


pkg/sql/pgwire/auth_test.go line 575 at r1 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

let's use errors.Is(err, pq.ErrSSLNotSupported) rather than a string match

https://github.com/lib/pq/blob/3d613208bca2e74f2a20e04126ed30bcb5c4cc27/conn.go#L34

done


pkg/sql/pgwire/auth_test.go line 579 at r1 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

hmm i'd like to understand a little bit more about how this might help before merging. let me dive into this a little bit first.

have taken a note of this. will await your reply.

Copy link
Collaborator

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @pritesh-lahoti and @souravcrl)


pkg/sql/pgwire/auth_test.go line 579 at r1 (raw file):

Previously, souravcrl wrote…

have taken a note of this. will await your reply.

i think we should try to gather more info on the failure before going with a blind retry. (unless, have you already done some of this investigation?)

i think a patch like this will help us see exactly which step of upgrading the secure conn is not working:

diff --git a/pkg/sql/pgwire/pre_serve.go b/pkg/sql/pgwire/pre_serve.go
index 66d1d81f86c..d0bca6747c1 100644
--- a/pkg/sql/pgwire/pre_serve.go
+++ b/pkg/sql/pgwire/pre_serve.go
@@ -23,6 +23,7 @@ import (
 	"github.com/cockroachdb/cockroach/pkg/sql/pgwire/pgerror"
 	"github.com/cockroachdb/cockroach/pkg/sql/pgwire/pgwirebase"
 	"github.com/cockroachdb/cockroach/pkg/sql/pgwire/pgwirecancel"
+	"github.com/cockroachdb/cockroach/pkg/util/buildutil"
 	"github.com/cockroachdb/cockroach/pkg/util/log"
 	"github.com/cockroachdb/cockroach/pkg/util/metric"
 	"github.com/cockroachdb/cockroach/pkg/util/mon"
@@ -446,6 +447,10 @@ func (s *PreServeConnHandler) maybeUpgradeToSecureConn(
 		// Secure mode: disallow if TCP and the user did not opt into
 		// non-TLS SQL conns.
 		if !s.cfg.AcceptSQLWithoutTLS && connType != hba.ConnLocal && connType != hba.ConnInternalLoopback {
+			// Extra logs under test to debug TestAuthenticationAndHBARules.
+			if buildutil.CrdbTestBuild {
+				log.Warningf(ctx, "client cannot connect since version=%d AcceptSQLWithoutTLS=false and connType=%s", version, connType)
+			}
 			clientErr = pgerror.New(pgcode.ProtocolViolation, ErrSSLRequired)
 		}
 		return
@@ -458,6 +463,10 @@ func (s *PreServeConnHandler) maybeUpgradeToSecureConn(
 		// we don't want it.
 		clientErr = pgerror.New(pgcode.ProtocolViolation,
 			"cannot use SSL/TLS over local connections")
+		// Extra logs under test to debug TestAuthenticationAndHBARules.
+		if buildutil.CrdbTestBuild {
+			log.Warningf(ctx, "client cannot connect since version=%d and connType=%s", version, connType)
+		}
 		return
 	}
 
@@ -479,6 +488,10 @@ func (s *PreServeConnHandler) maybeUpgradeToSecureConn(
 	if tlsConfig == nil {
 		// We don't have a TLS configuration available, so we can't honor
 		// the client's request.
+		// Extra logs under test to debug TestAuthenticationAndHBARules.
+		if buildutil.CrdbTestBuild {
+			log.Infof(ctx, "sending sslUnsupported message to client")
+		}
 		n, serverErr = conn.Write(sslUnsupported)
 		if serverErr != nil {
 			return

we can merge it, and then next time it fails we should have a helpful clue. what do you think about this?

@souravcrl souravcrl removed the request for review from a team October 19, 2024 14:38
informs cockroachdb#131532
informs cockroachdb#131110
informs cockroachdb#130253
informs cockroachdb#127745
Epic CRDB-41958

TestAuthenticationAndHBARules fails for `special_cases`,
`hba_default_equivalence`, `empty_hba` data driven tests for secure mode. The
failures occur when root user is trying to authenticate with cert-password auth
method and `sslmode` is set to `verify-ca` with `sslcert` being empty. The
expected behavior is root authentication defaults to password method and fails
as no password is set for root, but instead we get:
```
SSL is not enabled on the server
```
Since the failures are there only under stress, it might be because db server
shutdown or paused before responding to request for upgrade connection to SSL
from lib/pq client from here
https://github.com/lib/pq/blob/3d613208bca2e74f2a20e04126ed30bcb5c4cc27/conn.go#L1116-L1130.
Retrying connection establishment when this specific error is obtained might fix
the problem as this logic seems faulty(it checks for absence of 'S' in server
response whereas the correct check should be for 'N' in response).

Release note: None
Epic CRDB-41958

Additional test build logs are added to verify the step for which
`maybeUpgradeToSecureConn` fails for `TestAuthenticationAndHBARules`.

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-23.1.x Flags PRs that need to be backported to 23.1 backport-23.2.x Flags PRs that need to be backported to 23.2. backport-24.1.x Flags PRs that need to be backported to 24.1. backport-24.2.x Flags PRs that need to be backported to 24.2
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants