Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SNOW-1493353: Connector does not work inside of Snowpark Containers #978

Closed
akamor opened this issue Jun 20, 2024 · 6 comments
Closed

SNOW-1493353: Connector does not work inside of Snowpark Containers #978

akamor opened this issue Jun 20, 2024 · 6 comments
Assignees
Labels
question Issue is a usage/other question rather than a bug status-triage_done Initial triage done, will be further handled by the driver team

Comments

@akamor
Copy link

akamor commented Jun 20, 2024

Please answer these questions before submitting your issue.
In order to accurately debug the issue this information is required. Thanks!

  1. What version of .NET driver are you using?
    3.1.0

  2. What operating system and processor architecture are you using?
    Ubuntu

  3. What version of .NET framework are you using?
    net8

  4. What did you do?
    The snowflake data connector does not work inside a dotnet container running in SPCS. We are unable to connect to snowflake from the container. Using analogous code from Python works perfectly fine.

Here is a quick repo:

 var host = Environment.GetEnvironmentVariable("SNOWFLAKE_HOST")!;
        var account = Environment.GetEnvironmentVariable("SNOWFLAKE_ACCOUNT")!;
        var authenticator = "oauth";
        var token = File.ReadAllText("/snowflake/session/token").Trim();


        var builder = new SnowflakeDbConnectionStringBuilder
        {
            { "account", account },
            { "host", host },
            { "token", token },
            { "authenticator", authenticator },
        };

        var connection = new SnowflakeDbConnection(builder.ConnectionString);
        connection.Open();

We hang on the Open() call. The stack trace is:

[2024-06-20T20:20:02+00:00 ERR Microsoft.AspNetCore.Server.Kestrel] Connection id "0HN4HCMAM5FQQ", Request id "0HN4HCMAM5FQQ:00000001": An unhandled exception was thrown by the application.
Snowflake.Data.Client.SnowflakeDbException (0x80004005): Error: Snowflake Internal Error: Unable to connect. One or more errors occurred. (A task was canceled.) SqlState: 08006, VendorCode: 270001, QueryId: 
 ---> System.AggregateException: One or more errors occurred. (A task was canceled.)
 ---> System.Threading.Tasks.TaskCanceledException: A task was canceled.
   at System.Threading.Tasks.Task.GetExceptions(Boolean includeTaskCanceledExceptions)
   at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
   at System.Threading.Tasks.Task`1.GetResultCore(Boolean waitCompletionNotification)
   at Snowflake.Data.Core.RestRequester.Post[T](IRestRequest request)
   at Snowflake.Data.Core.Authenticator.BaseAuthenticator.Login()
   at Snowflake.Data.Core.Authenticator.OAuthAuthenticator.Authenticate()
   at Snowflake.Data.Core.SFSession.Open()
   at Snowflake.Data.Core.Session.SessionPool.NewSession(String connectionString, SecureString password)
   at Snowflake.Data.Core.Session.SessionPool.GetSession(String connStr, SecureString password)
   at Snowflake.Data.Core.Session.ConnectionCacheManager.GetSession(String connectionString, SecureString password)
   at Snowflake.Data.Client.SnowflakeDbConnectionPool.GetSession(String connectionString, SecureString password)
   at Snowflake.Data.Client.SnowflakeDbConnection.Open()

Let me re-iterate that the same code, using the identical connection string works fine in a Python container constructed in SPCS in an identical way.

@akamor akamor added the bug label Jun 20, 2024
@github-actions github-actions bot changed the title Connector does not work inside of Snowpark Containers SNOW-1493353: Connector does not work inside of Snowpark Containers Jun 20, 2024
@akamor
Copy link
Author

akamor commented Jun 20, 2024

Sorry, that stack trace only happens once I finally cancel the request (hence the TaskCanceledException). If I don't manually intervene then it'll keep trying to connect for a very long time (10+min as of last count).

I've noticed though that whenever I cancel the stack trace is the same and we are always in the OAuthAuthenticator.Authenticate() call so I suspect the issue is there.

@sfc-gh-dszmolka sfc-gh-dszmolka self-assigned this Jun 21, 2024
@sfc-gh-dszmolka sfc-gh-dszmolka added the status-triage Issue is under initial triage label Jun 21, 2024
@sfc-gh-dszmolka
Copy link
Contributor

hi - thanks for raising this. right now checking if there's any difference when same app is run from the container outside of SPCS and inside it. thank you for bearing with me while troubleshooting!

@sfc-gh-dszmolka
Copy link
Contributor

looks to be working for me. Steps I followed:

  1. using example from https://github.com/Snowflake-Labs/sf-samples/tree/main/samples/spcs/sf-connection/dotnet with following modifications:
    a., TRACE loglevel in log4net.config
    b., minimally modified App.cs to be:
using System;
using System.Data;
using System.Data.Common;
using System.IO;

using Snowflake.Data.Core;
using Snowflake.Data.Client;

class App
{
    const string TOKENPATH = "/snowflake/session/token";

    static string getConnectionString(){
        string? account = Environment.GetEnvironmentVariable("SNOWFLAKE_ACCOUNT");
        string? database = Environment.GetEnvironmentVariable("SNOWFLAKE_DATABASE");
        string? schema = Environment.GetEnvironmentVariable("SNOWFLAKE_SCHEMA");
        string? host = Environment.GetEnvironmentVariable("SNOWFLAKE_HOST");
	Console.WriteLine("Variables auto-populated from environment. account: " + account + ", host: " + host + ", database: " + database + ", schema: " + schema + ".");
        if (File.Exists(TOKENPATH)) {
            // automatically set by env
            string token = File.ReadAllText(TOKENPATH);
            Console.WriteLine("/snowflake/session/token mounted, reading token: " + token);
            Console.WriteLine("Attempting authenticating with this oauth token");
            return $"account={account};authenticator=oauth;token={token};db={database};schema={schema};host={host};insecuremode=true";
        } else {
            // basic auth, variables must be set by user
            Console.WriteLine("Uhh, we dont seem to have a OAuth token from the environment here! Attempting authenticating with user/pass.");
            string? user = Environment.GetEnvironmentVariable("SNOWFLAKE_USER");
            string? password = Environment.GetEnvironmentVariable("SNOWFLAKE_PASSWORD");
            return $"account={account};user={user};password={password};db={database};schema={schema};host={host};insecuremode=true";
        }
    }

    static int Main()
    {
        try
        {
            using (IDbConnection conn = new SnowflakeDbConnection())
            {
                conn.ConnectionString = getConnectionString();
                conn.Open();
                using (IDbCommand cmd = conn.CreateCommand())
                {
                    cmd.CommandText = "SELECT current_user() AS user";
                    IDataReader reader = cmd.ExecuteReader();
                    while (reader.Read())
                    {
                      Console.WriteLine(reader.GetString(0));
                    }
		    Console.WriteLine("SUCCESS");
                }
		conn.Close();
		return 0;
            }
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex.ToString());
        }
	return 1;
    }
}

c., modified the csproj to include driver version 3.1.0 instead of default 2.1.4
2. build the container, tag it, push it to your SPCS image repository
3. run it as a Job

DROP SERVICE IF EXISTS dotnet_in_spcs_test;
EXECUTE JOB SERVICE
  IN COMPUTE POOL tutorial_compute_pool
  NAME='dotnet_in_spcs_test'
  FROM SPECIFICATION $$
    spec:
      containers:
      - name: main
        image: /net978/public/imagerepo/sf-example-connect-dotnet:3.1.0
    $$ --did not populate containers.[0].env with SNOWFLAKE_USER and SNOWFLAKE_PASSWORD as it is not required for oauth
;
  1. get the container logs after Job runs to completion, with SELECT SYSTEM$GET_SERVICE_LOGS('dotnet_in_spcs_test', 0, 'main');
Variables auto-populated from environment. account: MYACCOUNT, host: snowflake.eu-central-1.snowflakecomputing.com, database: NET978, schema: PUBLIC.
/snowflake/session/token mounted, reading token: ver:1-hint:63..4-ETM..token..K
Attempting authenticating with this oauth token
2024-06-21 10:42:15,154 [1] DEBUG Snowflake.Data.Client.SnowflakeDbConnection [(null)] - Open Connection.
2024-06-21 10:42:15,164 [1] DEBUG Snowflake.Data.Client.SnowflakeDbConnectionPool [(null)] - SnowflakeDbConnectionPool::GetSession
2024-06-21 10:42:15,164 [1] DEBUG Snowflake.Data.Core.Session.SessionPool [(null)] - SessionPool::GetSession
2024-06-21 10:42:15,165 [1] DEBUG Snowflake.Data.Core.Session.SessionPool [(null)] - SessionPool::GetIdleSession
2024-06-21 10:42:15,165 [1] DEBUG Snowflake.Data.Core.Session.SessionPool [(null)] - SessionPool::NewSession
2024-06-21 10:42:15,169 [1] INFO  Snowflake.Data.Core.SFSessionProperties [(null)] - Start parsing connection string.
2024-06-21 10:42:15,178 [1] DEBUG Snowflake.Data.Core.SFSessionProperties [(null)] - Sesssion property PORT set to default value: 443
2024-06-21 10:42:15,178 [1] DEBUG Snowflake.Data.Core.SFSessionProperties [(null)] - Sesssion property SCHEME set to default value: https
2024-06-21 10:42:15,179 [1] DEBUG Snowflake.Data.Core.SFSessionProperties [(null)] - Sesssion property USER set to default value: 
2024-06-21 10:42:15,179 [1] DEBUG Snowflake.Data.Core.SFSessionProperties [(null)] - Sesssion property CONNECTION_TIMEOUT set to default value: 300
2024-06-21 10:42:15,179 [1] DEBUG Snowflake.Data.Core.SFSessionProperties [(null)] - Sesssion property VALIDATE_DEFAULT_PARAMETERS set to default value: true
2024-06-21 10:42:15,179 [1] DEBUG Snowflake.Data.Core.SFSessionProperties [(null)] - Sesssion property USEPROXY set to default value: false
2024-06-21 10:42:15,180 [1] DEBUG Snowflake.Data.Core.SFSessionProperties [(null)] - Sesssion property DISABLERETRY set to default value: false
2024-06-21 10:42:15,180 [1] DEBUG Snowflake.Data.Core.SFSessionProperties [(null)] - Sesssion property FORCERETRYON404 set to default value: false
2024-06-21 10:42:15,180 [1] DEBUG Snowflake.Data.Core.SFSessionProperties [(null)] - Sesssion property CLIENT_SESSION_KEEP_ALIVE set to default value: false
2024-06-21 10:42:15,180 [1] DEBUG Snowflake.Data.Core.SFSessionProperties [(null)] - Sesssion property FORCEPARSEERROR set to default value: false
2024-06-21 10:42:15,180 [1] DEBUG Snowflake.Data.Core.SFSessionProperties [(null)] - Sesssion property BROWSER_RESPONSE_TIMEOUT set to default value: 120
2024-06-21 10:42:15,180 [1] DEBUG Snowflake.Data.Core.SFSessionProperties [(null)] - Sesssion property RETRY_TIMEOUT set to default value: 300
2024-06-21 10:42:15,180 [1] DEBUG Snowflake.Data.Core.SFSessionProperties [(null)] - Sesssion property MAXHTTPRETRIES set to default value: 7
2024-06-21 10:42:15,181 [1] DEBUG Snowflake.Data.Core.SFSessionProperties [(null)] - Sesssion property INCLUDERETRYREASON set to default value: true
2024-06-21 10:42:15,181 [1] DEBUG Snowflake.Data.Core.SFSessionProperties [(null)] - Sesssion property DISABLEQUERYCONTEXTCACHE set to default value: false
2024-06-21 10:42:15,181 [1] DEBUG Snowflake.Data.Core.SFSessionProperties [(null)] - Sesssion property DISABLE_CONSOLE_LOGIN set to default value: true
2024-06-21 10:42:15,181 [1] DEBUG Snowflake.Data.Core.SFSessionProperties [(null)] - Sesssion property ALLOWUNDERSCORESINHOST set to default value: false
2024-06-21 10:42:15,186 [1] DEBUG Snowflake.Data.Core.HttpUtil [(null)] - Http client not registered. Adding.
2024-06-21 10:42:15,190 [1] INFO  Snowflake.Data.Core.EasyLoggingStarter [(null)] - Attempting to enable easy logging without a config file specified from connection string
2024-06-21 10:42:15,191 [1] DEBUG Snowflake.Data.Core.SFSession [(null)] - Open Session
2024-06-21 10:42:15,345 [.NET TP Worker] DEBUG Snowflake.Data.Core.RestRequester [(null)] - Executing:  POST https://snowflake.eu-central-1.snowflakecomputing.com/session/v1/login-request?warehouse=&databaseName=NET978&schemaName=PUBLIC&roleName=&requestId=7f3..ab4d&request_guid=5b..27 HTTP/1.1
2024-06-21 10:42:15,348 [.NET TP Worker] DEBUG Snowflake.Data.Core.HttpUtil+RetryHandler [(null)] - Http request timeout : 00:00:16
2024-06-21 10:42:15,348 [.NET TP Worker] DEBUG Snowflake.Data.Core.HttpUtil+RetryHandler [(null)] - Rest request timeout : 00:05:00
2024-06-21 10:42:15,878 [.NET TP Worker] DEBUG Snowflake.Data.Core.HttpUtil+RetryHandler [(null)] - Success Response: StatusCode: 200, ReasonPhrase: 'OK'
2024-06-21 10:42:15,879 [.NET TP Worker] DEBUG Snowflake.Data.Core.RestRequester [(null)] - Succeeded Response:  POST https://snowflake.eu-central-1.snowflakecomputing.com/session/v1/login-request?warehouse=&databaseName=NET978&schemaName=PUBLIC&roleName=&requestId=7f3..ab4d&request_guid=5b..27
2024-06-21 10:42:15,919 [1] DEBUG Snowflake.Data.Core.SFSession [(null)] - Update parameter map
2024-06-21 10:42:15,921 [1] DEBUG Snowflake.Data.Core.SFSession [(null)] - Session opened: 63..0
2024-06-21 10:42:15,922 [1] DEBUG Snowflake.Data.Client.SnowflakeDbConnection [(null)] - Connection open with pooled session: 63..0
2024-06-21 10:42:15,923 [1] DEBUG Snowflake.Data.Client.SnowflakeDbCommand [(null)] - Constructing SnowflakeDbCommand class
2024-06-21 10:42:15,924 [1] DEBUG Snowflake.Data.Client.SnowflakeDbCommand [(null)] - ExecuteDbDataReader
2024-06-21 10:42:15,925 [1] DEBUG Snowflake.Data.Core.SFStatement [(null)] - Trimmed query : SELECT current_user() AS user
2024-06-21 10:42:15,933 [.NET TP Worker] DEBUG Snowflake.Data.Core.RestRequester [(null)] - Executing: 63..0 POST https://snowflake.eu-central-1.snowflakecomputing.com/queries/v1/query-request?requestId=10..6&request_guid=5..4&clientStartTime=1718.. HTTP/1.1
2024-06-21 10:42:15,933 [.NET TP Worker] DEBUG Snowflake.Data.Core.HttpUtil+RetryHandler [(null)] - Http request timeout : -00:00:00.0010000
2024-06-21 10:42:15,933 [.NET TP Worker] DEBUG Snowflake.Data.Core.HttpUtil+RetryHandler [(null)] - Rest request timeout : -00:00:00.0010000
2024-06-21 10:42:16,067 [.NET TP Worker] DEBUG Snowflake.Data.Core.HttpUtil+RetryHandler [(null)] - Success Response: StatusCode: 200, ReasonPhrase: 'OK'
2024-06-21 10:42:16,067 [.NET TP Worker] DEBUG Snowflake.Data.Core.RestRequester [(null)] - Succeeded Response: 63..0 POST https://snowflake.eu-central-1.snowflakecomputing.com/queries/v1/query-request?requestId=10..66&request_guid=5a..4&clientStartTime=17..7
2024-06-21 10:42:16,088 [1] DEBUG Snowflake.Data.Core.SFSession [(null)] - Update parameter map
2024-06-21 10:42:16,089 [1] DEBUG Snowflake.Data.Core.QueryContextCache [(null)] - Adding new QCC item with either id nor priority found in cache id = 0, priority = 0
2024-06-21 10:42:16,090 [1] DEBUG Snowflake.Data.Core.QueryContextCache [(null)] - syncPriorityMap called priorityMap size = 0, newPrioirtyMap size = 1
2024-06-21 10:42:16,090 [1] DEBUG Snowflake.Data.Core.QueryContextCache [(null)] - checkCacheCapacity() called. cacheSet size 1 cache capacity 5
2024-06-21 10:42:16,090 [1] DEBUG Snowflake.Data.Core.QueryContextCache [(null)] - checkCacheCapacity() returns. cacheSet size 1 cache capacity 5
2024-06-21 10:42:16,091 [1] DEBUG Snowflake.Data.Core.QueryContextCache [(null)] - Cache Entry: id: 0 readTimestamp: 17..4 priority: 0
SF$SERVICE$w..g
SUCCESS
2024-06-21 10:42:16,097 [1] DEBUG Snowflake.Data.Client.SnowflakeDbConnection [(null)] - Close Connection.
2024-06-21 10:42:16,098 [1] DEBUG Snowflake.Data.Client.SnowflakeDbConnectionPool [(null)] - SnowflakeDbConnectionPool::GetPooling
2024-06-21 10:42:16,098 [1] DEBUG Snowflake.Data.Client.SnowflakeDbConnectionPool [(null)] - SnowflakeDbConnectionPool::GetPooling
2024-06-21 10:42:16,098 [1] DEBUG Snowflake.Data.Client.SnowflakeDbConnectionPool [(null)] - SnowflakeDbConnectionPool::AddSession
2024-06-21 10:42:16,099 [1] DEBUG Snowflake.Data.Core.Session.SessionPool [(null)] - SessionPool::AddSession
2024-06-21 10:42:16,099 [1] DEBUG Snowflake.Data.Core.Session.SessionPool [(null)] - pool connection with sid 63..0
2024-06-21 10:42:16,099 [1] DEBUG Snowflake.Data.Client.SnowflakeDbConnection [(null)] - Session pooled: 63..0
2024-06-21 10:42:16,099 [1] DEBUG Snowflake.Data.Client.SnowflakeDbConnection [(null)] - Close Connection.

Works.

Can you please check and see if it works for you too and what might be the differences in the working and non-working setup ?

First i would recommend adding insecuremode=true to your connection string without changing anything else, and seeing if this change alone helps or not. If not, I recommend trying with the test application above (with TRACE loglevel)

@sfc-gh-dszmolka sfc-gh-dszmolka added status-triage_done Initial triage done, will be further handled by the driver team and removed status-triage Issue is under initial triage labels Jun 21, 2024
@akamor
Copy link
Author

akamor commented Jun 21, 2024

The issue seems to be insecuremode=true. When that is added to the connection string things work fine, otherwise the error continues.

Do you have an idea as to why that would happen?

@sfc-gh-dszmolka
Copy link
Contributor

okay, this is good news, now you should be unblocked. thank you for testing.
we'll need to dig further why exactly the cert validation fails inside SPCS; perhaps has to do something with the unique way .NET driver does it (CRL, still vs. OCSP the rest of Snowflake drivers incl. PythonConnector)

@sfc-gh-dszmolka sfc-gh-dszmolka added status-triage Issue is under initial triage and removed status-triage_done Initial triage done, will be further handled by the driver team labels Jun 22, 2024
@sfc-gh-dszmolka
Copy link
Contributor

sfc-gh-dszmolka commented Jun 24, 2024

yes it was the CRL

.NET driver is different from all the other Snowflake drivers in terms of how it validates the certificates it encounters while setting up the TLS channel. Still using CRL, while all the others are using OCSP.

edited for posterity to include error pattern too to help future readers.
So with TRACE loglevel issue is clearly seen from driver logs:

 ---> System.Security.Authentication.AuthenticationException: The remote certificate is invalid because of errors in the certificate chain: RevocationStatusUnknown, OfflineRevocation

Per Microsoft documentation for certificate status flags:

RevocationStatusUnknown:
Specifies that it is not possible to determine whether the certificate has been revoked. This can be due to the certificate revocation list (CRL) being offline or unavailable.

OfflineRevocation:
Specifies that the online certificate revocation list (CRL) the X509 chain relies on is currently offline.

so CRL is unavailable, thus fails verification, and this is very likely the issue.

So did the following to make it work:

  1. retrieved all the certificates from the chain which is offered during TLS setup from the LB endpoint when connecting to myaccount.eu-central-1.snowflakcomputing.com
  2. enumerated all the CRL endpoints with something like
$ for i in {1..4}; do openssl x509 -in cert$i.pem  -text -noout | grep -A4 "X509v3 CRL Distribution Points"; done
            X509v3 CRL Distribution Points: 

                Full Name:
                  URI:http://crl.r2m03.amazontrust.com/r2m03.crl

            X509v3 CRL Distribution Points: 

                Full Name:
                  URI:http://crl.rootca1.amazontrust.com/rootca1.crl

            X509v3 CRL Distribution Points: 

                Full Name:
                  URI:http://crl.rootg2.amazontrust.com/rootg2.crl

            X509v3 CRL Distribution Points: 

                Full Name:
                  URI:http://s.ss2.us/r.crl
  1. created a network rule which allows the above CRL endpoints, and external access integration (EAI) referencing the network rule
CREATE NETWORK RULE allow_crls_eucentral1
  TYPE = HOST_PORT
  MODE = EGRESS
  VALUE_LIST = ('crl.r2m03.amazontrust.com:80', 'crl.rootca1.amazontrust.com:80', 'crl.rootg2.amazontrust.com:80', 's.ss2.us:80');

CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION allow_crls_eucentral1_int
  ALLOWED_NETWORK_RULES = (allow_crls_eucentral1)
  ENABLED = true;
  1. created the SPCS service with an extra field I did not use previously; specifying the EAI which allows the Service to talk to the CRL endpoints:
drop service if exists dotnet_in_spcs_test;
EXECUTE JOB SERVICE
  IN COMPUTE POOL tutorial_compute_pool
  NAME='dotnet_in_spcs_test'
  EXTERNAL_ACCESS_INTEGRATIONS = ( allow_crls_eucentral1_int ) --> this wasn't here previously
  FROM SPECIFICATION $$
    spec:
      containers:
      - name: main
        image: /net978/public/imagerepo/sf-example-connect-dotnet:3.1.0
        env:
          SNOWFLAKE_INSECUREMODE: false
    $$ --did not populate containers[0].env with SNOWFLAKE_USER and SNOWFLAKE_PASSWORD as it is not required for 
oauth
--there's a logic inside App.cs which either uses insecuremode=true or insecuremode=false based on the value of externally-supplied envvar SNOWFLAKE_INSECUREMODE
;

the connection is now successful even with insecuremode=false (which is the default, if one does not speficy insecuremode in the connection string) and Job runs to completion after successfully connecting to Snowflake.

Makes sorta-kinda sense, since as we document, external network access (== access to the CRL servers) is forbidden by default unless explicitly allowed by EAI attached to Service. Same issue happens even outside of SPCS if your network policies / firewall block access to the above CRL endpoints.

I don't think this is a driver issue or bug per se; just a matter of configuration. For the future, we do have plans to implement the other kind of certificate validation (OCSP) in the .NET driver too, making it on par with the other drivers.
Marking this one as closed as it has been resolved but do comment please if you need further assistance and I'll keep looking.

@sfc-gh-dszmolka sfc-gh-dszmolka added question Issue is a usage/other question rather than a bug status-triage_done Initial triage done, will be further handled by the driver team and removed bug status-triage Issue is under initial triage labels Jun 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Issue is a usage/other question rather than a bug status-triage_done Initial triage done, will be further handled by the driver team
Projects
None yet
Development

No branches or pull requests

2 participants