Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SNOW-1771421: Rollback transaction doesn't work if transaction is opened with a context #1233

Open
nearsyh opened this issue Oct 29, 2024 · 8 comments
Assignees
Labels
bug Erroneous or unexpected behaviour status-blocked Progressing the issue is blocked by something, probably dependencies. status-triage_done Initial triage done, will be further handled by the driver team

Comments

@nearsyh
Copy link

nearsyh commented Oct 29, 2024

Please answer these questions before submitting your issue.
In order to accurately debug the issue this information is required. Thanks!

  1. What version of GO driver are you using?
    v1.9.0

  2. What operating system and processor architecture are you using?
    MacOS 15.0.1 (24A348), M2 Max

  3. What version of GO are you using?
    go version go1.23.2 darwin/arm64

4.Server version:* E.g. 1.90.1
8.40.1

  1. What did you do?
func Test_TransactionIsNotRollbackWithContextCancelled(t *testing.T) {
        sf := OpenSnowflakeDB()

	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	txIDCh := make(chan int, 1)
	cancelledCh := make(chan struct{}, 1)
	doneCh := make(chan struct{}, 1)

	go func() {
		tx, err := sf.BeginTx(ctx, nil) // Using sf.Begin() is fine
		if err != nil {
			t.Fatal(err)
		}
		defer func() {
			err := tx.Rollback()
			if err != nil {
				fmt.Printf("Rollback failed: %v\n", err)
			}
			fmt.Printf("transaction is rollbacked\n")
			doneCh <- struct{}{}
		}()
		var txID int
		if err := tx.QueryRowContext(ctx, "SELECT CURRENT_TRANSACTION();").Scan(&txID); err != nil {
			t.Fatal(err)
		}
		txIDCh <- txID
		<-cancelledCh
	}()

	txID := <-txIDCh
	cancel()
	cancelledCh <- struct{}{}
	<-doneCh

	if runningTxsCount(t, sf.DB) != 0 { // Use "SHOW TRANSACTIONS;"
		t.Error("transaction is still running")
		_, err := sf.Exec("SELECT SYSTEM$ABORT_TRANSACTION(?);", txID)
		if err != nil {
			t.Fatal(err)
		}
	}
}
  1. What did you expect to see?

    The transaction should be aborted.

@nearsyh nearsyh added the bug Erroneous or unexpected behaviour label Oct 29, 2024
@github-actions github-actions bot changed the title Rollback transaction doesn't work if transaction is opened with a context SNOW-1771421: Rollback transaction doesn't work if transaction is opened with a context Oct 29, 2024
@sfc-gh-dszmolka sfc-gh-dszmolka self-assigned this Oct 30, 2024
@sfc-gh-dszmolka sfc-gh-dszmolka added the status-triage Issue is under initial triage label Oct 30, 2024
@sfc-gh-dszmolka
Copy link
Contributor

thank you for raising this and for the details, i'll take a look int the coming days. appreciate sharing the reproduction approach!

@sfc-gh-dszmolka
Copy link
Contributor

i tried to figure out what's in OpenSnowflakeDB and runningTxsCount :) and to create a standalone runnable program as reproduction. Ended up with:

package main

import (
  "context"
  "database/sql"
  "fmt"
  "log"
  "strconv"

  sf "github.com/snowflakedb/gosnowflake"
)

func runningTxsCount(conn *sql.Conn, ctx context.Context) int {
  rows, err := conn.QueryContext(ctx, "SHOW /* gosnowflake 1233 */ TRANSACTIONS;")
  if err != nil {
     log.Fatal("error in runningTxsCount: ", err)
  }
  defer rows.Close()

  counter := 0
  for rows.Next() {
    counter++
  }
  return counter
}

func main() {

  cfg, err := sf.GetConfigFromEnv([]*sf.ConfigParam{
    {Name: "Account", EnvName: "SNOWFLAKE_TEST_ACCOUNT", FailOnMissing: true},
    {Name: "User", EnvName: "SNOWFLAKE_TEST_USER", FailOnMissing: true},
    {Name: "Password", EnvName: "SNOWFLAKE_TEST_PASSWORD", FailOnMissing: true},
  })
  if err != nil {
    log.Fatalf("failed to create Config, err: %v", err)
  }
  //cfg.Tracing = "TRACE"
  dsn, err := sf.DSN(cfg)
  if err != nil {
    log.Fatalf("failed to create DSN from Config: %v, err: %v", cfg, err)
  }

  db, err := sql.Open("snowflake", dsn)
  if err != nil {
    log.Fatalf("failed to connect. %v, err: %v", dsn, err)
  }
  defer db.Close()

  ctx, cancel := context.WithCancel(context.Background())
  defer cancel()

  conn, err := db.Conn(ctx)
  if err != nil {
    log.Fatalf("Failed to acquire connection. err: %v", err)
  }
  defer conn.Close()

  txIDCh := make(chan int, 1)
  cancelledCh := make(chan struct{}, 1)
  doneCh := make(chan struct{}, 1)

  go func() {
    tx, err := conn.BeginTx(ctx, nil) // Using sf.Begin() is fine
    if err != nil {
      log.Fatal(err)
    }
    defer func() {
      err := tx.Rollback()
      if err != nil {
        fmt.Printf("Rollback failed: %v\n", err)
      }
      fmt.Printf("transaction is rollbacked\n")
      doneCh <- struct{}{}
    }()
    var txID int
    if err := tx.QueryRowContext(ctx, "SELECT /* gosnowflake 1233 */ CURRENT_TRANSACTION();").Scan(&txID); err != nil {
      log.Fatal(err)
    }
    log.Print("--> txID in goroutine: " + strconv.Itoa(txID))
    txIDCh <- txID
    <-cancelledCh
  }()

  txID := <-txIDCh
  log.Print("--> txID outside goroutine: " + strconv.Itoa(txID))
  cancel()
  cancelledCh <- struct{}{}
  <-doneCh

    if runningTxsCount(conn, ctx) != 0 { // Use "SHOW TRANSACTIONS;"
      log.Print("transaction is still running")
      _, err := conn.ExecContext(ctx, "SELECT /* gosnowflake 1233 */ SYSTEM$ABORT_TRANSACTION(?);", txID)
      if err != nil {
        log.Fatal(err)
      }
    }
}

using the latest gosnowflake of 1.12.0
Result when running the program:

# go run path/to.main.go
2024/10/31 08:16:34 --> txID in goroutine: 1730362593932000000
2024/10/31 08:16:34 --> txID outside goroutine: 1730362593932000000
Rollback failed: 000605: Identified SQL statement is not currently executing.
transaction is rollbacked
ERRO[0002]connection.go:410 gosnowflake.(*snowflakeConn).queryContextInternal error: 000605: Identified SQL statement is not currently executing. 
2024/10/31 08:16:35 error in runningTxsCount: 000605: Identified SQL statement is not currently executing.
exit status 1

in Query History in Snowflake, only 2 statements are seen, in reverse order:

SELECT /* gosnowflake 1233 */ CURRENT_TRANSACTION();
BEGIN

transaction is rollbacked and SHOW TRANSACTIONS not even executed suggests to me that the context is canceled (thus the transaction too) and the code never gets so far.

It is entirely possible of course, that I'm doing something wrong, so would it be please possible for you to

  • try to reproduce the issue on 1.12.0
  • if it reproduces on 1.12.0, could you please send a runnable full, minimal reproduction which when executed, exhibits th issue you're seeing? Thank you in advance !

@sfc-gh-dszmolka sfc-gh-dszmolka added the status-information_needed Additional information is required from the reporter label Oct 31, 2024
@nearsyh
Copy link
Author

nearsyh commented Nov 2, 2024

transaction is rollbacked and SHOW TRANSACTIONS not even executed suggests to me that the context is canceled (thus the transaction too) and the code never gets so far.

You are right, the context is cancelled. That's why runningTxsCount should not use the same context to get the number of running transactions.

The problem I met here is that, if the context is cancelled, the transaction is still in running status in Snowflake. Locks hold by this transaction won't be released, and other queries requiring this lock will be blocked. What I expect is that, if the context used by BeginTx is cancelled, the transaction should not be included in SHOW TRANSACTIONS; output.

package main

import (
	"context"
	"database/sql"
	"fmt"
	"log"

	sf "github.com/snowflakedb/gosnowflake"
)

func main() {
	cfg, err := sf.GetConfigFromEnv([]*sf.ConfigParam{
		{Name: "Account", EnvName: "SNOWFLAKE_TEST_ACCOUNT", FailOnMissing: true},
		{Name: "User", EnvName: "SNOWFLAKE_TEST_USER", FailOnMissing: true},
		{Name: "Password", EnvName: "SNOWFLAKE_TEST_PASSWORD", FailOnMissing: true},
	})
	if err != nil {
		log.Fatalf("failed to create Config, err: %v", err)
	}
	//cfg.Tracing = "TRACE"
	dsn, err := sf.DSN(cfg)
	if err != nil {
		log.Fatalf("failed to create DSN from Config: %v, err: %v", cfg, err)
	}

	db, err := sql.Open("snowflake", dsn)
	if err != nil {
		log.Fatalf("%v\n", err)
	}
	defer db.Close()

	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	txIDCh := make(chan int, 1)
	cancelledCh := make(chan struct{}, 1)
	doneCh := make(chan struct{}, 1)

	go func() {
		tx, err := db.BeginTx(ctx, nil) // Using db.Begin() is fine
		if err != nil {
			log.Fatal(err)
		}
		defer func() {
			err := tx.Rollback()
			if err != nil {
				fmt.Printf("Rollback failed: %v\n", err)
			}
			fmt.Printf("transaction is rollbacked\n")
			doneCh <- struct{}{}
		}()
		var txID int
		if err := tx.QueryRowContext(ctx, "SELECT CURRENT_TRANSACTION();").Scan(&txID); err != nil {
			log.Fatal(err)
		}
		txIDCh <- txID
		<-cancelledCh
	}()

	txID := <-txIDCh
	cancel()
	cancelledCh <- struct{}{}
	<-doneCh

	if runningTxsCount(db) != 0 { // Use "SHOW TRANSACTIONS;"
		log.Print("transaction is still running")
		_, err := db.Exec("SELECT SYSTEM$ABORT_TRANSACTION(?);", txID)
		if err != nil {
			log.Fatal(err)
		}
	}
}

func runningTxsCount(db *sql.DB) int {
	rows, err := db.Query("SHOW TRANSACTIONS;")
	if err != nil {
		log.Fatal("error in runningTxsCount: ", err)
	}
	defer rows.Close()

	counter := 0
	for rows.Next() {
		counter++
	}
	return counter
}

@sfc-gh-dszmolka
Copy link
Contributor

thank you for the additional info ! we'll take a look at this issue.

@sfc-gh-dszmolka sfc-gh-dszmolka added status-triage_done Initial triage done, will be further handled by the driver team and removed status-triage Issue is under initial triage status-information_needed Additional information is required from the reporter labels Nov 4, 2024
@sfc-gh-pfus
Copy link
Collaborator

sfc-gh-pfus commented Nov 4, 2024

Hello! I took a look and my findings are:

  1. You're code snippet seems to be mostly right and your expectations are valid.
  2. Having that said, when cancel is invoked, it ends up calling abort-request to Snowflake backend. It is the backend that should be responsible for rolling back a transaction during query abort, not the driver.

We will create an issue for one of our backend teams to take a look at this topic.

@nearsyh
Copy link
Author

nearsyh commented Nov 5, 2024

Thanks for the update. My guess is that the driver is reusing the already-cancelled context to send the abort request, and thus the request is not sent.

@sfc-gh-pfus
Copy link
Collaborator

Actually the abort request is sent correctly (source), but it doesn't rollback transaction on the backend side.

@sfc-gh-pfus
Copy link
Collaborator

Giving it another thought, we agreed that it's a bit more complicated. We see a case in which cancelling context should rollback transaction, but:

  1. It would be a breaking change, so even if we implement anything it would be hidden behind additional context or something, not enabled by default.
  2. Implementing it requires additional information returned from the backend, so it takes longer.

@sfc-gh-dszmolka sfc-gh-dszmolka added blocked Progress cannot be made to this issue due to an outside blocking factor. status-blocked Progressing the issue is blocked by something, probably dependencies. and removed blocked Progress cannot be made to this issue due to an outside blocking factor. labels Nov 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Erroneous or unexpected behaviour status-blocked Progressing the issue is blocked by something, probably dependencies. status-triage_done Initial triage done, will be further handled by the driver team
Projects
None yet
Development

No branches or pull requests

4 participants