SNOW-1896153: Non deterministic errors with concurrent context aware queries in v1.12.1 #1292

Open
niger-prequel opened this issue Jan 24, 2025 · 1 comment
Labels: bug (Erroneous or unexpected behaviour), status-triage (Issue is under initial triage)

Comments

@niger-prequel

  1. What version of the Go driver are you using?

1.12.1 and 1.12.0

  2. What operating system and processor architecture are you using?

Debian Bullseye x86

  3. What version of Go are you using?

go1.23.3

  4. Server version (e.g. 1.90.1)?

9.1.0

  5. What did you do?

We are loading data into Snowflake via the Go driver. Our strategy is to use `PUT ...` SQL commands to upload Parquet files to an internal stage, then use a `COPY INTO ...` statement to publish the data. We use the `database/sql` Go abstraction, so our code generally looks like this:

```go
package example

import (
	"context"
	"database/sql"
	"fmt"
	"log"
	"sync"
	"time"

	_ "github.com/snowflakedb/gosnowflake" // Snowflake driver
)

func UploadFilesAndCopy(db *sql.DB, stageName, tableName string, files []string) error {
	// Create a context that will be used for all operations
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
	defer cancel()

	// A channel that will hold the file paths we want to upload
	fileCh := make(chan string)

	// We'll use a WaitGroup to ensure all goroutines finish
	var wg sync.WaitGroup

	// Start a fixed number of worker goroutines to process the file uploads
	workerCount := 5
	for i := 0; i < workerCount; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for file := range fileCh {
				putQuery := fmt.Sprintf(
					"PUT file://%s @%s AUTO_COMPRESS=TRUE OVERWRITE=TRUE",
					file, stageName,
				)
				if _, err := db.ExecContext(ctx, putQuery); err != nil {
					log.Printf("failed to PUT file %q to stage %q: %v", file, stageName, err)
				}
			}
		}()
	}

	// Send file names into the channel for workers to pick up
	go func() {
		defer close(fileCh) // Close once we're done sending
		for _, file := range files {
			fileCh <- file
		}
	}()

	// Wait for all workers to finish uploading
	wg.Wait()

	// Now that all files are in the stage, run the COPY INTO command
	copyQuery := fmt.Sprintf(`COPY INTO %s FROM @%s FILE_FORMAT = (TYPE = PARQUET)`, tableName, stageName)
	if _, err := db.ExecContext(ctx, copyQuery); err != nil {
		return fmt.Errorf("failed to COPY INTO %s: %w", tableName, err)
	}

	return nil
}
```

We're experiencing issues with both 1.12.0 and 1.12.1. On 1.12.0, when a context is canceled, other running queries fail with:

```text
level=error msg="error: 000605: Identified SQL statement is not currently executing." func="gosnowflake.(*snowflakeConn).queryContextInternal" file="connection.go:410"
```

On 1.12.1 this goes away and we get the correct `context canceled` error message. However, we now see non-deterministic failures where PUT commands sometimes return `error: 000605: Identified SQL statement is not currently executing.` even though no context has been canceled.

Going through the release notes and the PRs that changed, it seems like #1248 may have introduced a data race into the driver. We were able to stop these errors across our fleet by no longer sharing one context between the goroutines and instead creating a new child context for each spawned worker.

  6. What did you expect to see?

I expect concurrent PUT queries to work as they did on 1.12.0, and the context-cancellation error message to match the behavior of 1.12.1.

  7. Can you set logging to DEBUG and collect the logs?

Not right now; this happens in production environments where it's against our policy to collect these logs.

@niger-prequel niger-prequel added the bug Erroneous or unexpected behaviour label Jan 24, 2025
@github-actions github-actions bot changed the title Non deterministic errors with concurrent context aware queries in v1.12.1 SNOW-1896153: Non deterministic errors with concurrent context aware queries in v1.12.1 Jan 24, 2025
@sfc-gh-dszmolka sfc-gh-dszmolka self-assigned this Jan 27, 2025
@sfc-gh-dszmolka sfc-gh-dszmolka added the status-triage Issue is under initial triage label Jan 27, 2025
@sfc-gh-dszmolka
Contributor

Hi, thank you for letting us know about this issue and for the example. Will look into it.
