Skip to content
This repository has been archived by the owner on Mar 29, 2023. It is now read-only.

Inefficient Filter logic in Row-level Hash validations #154

Open
wt1309 opened this issue Mar 12, 2023 · 0 comments
Open

Inefficient Filter logic in Row-level Hash validations #154

wt1309 opened this issue Mar 12, 2023 · 0 comments

Comments

@wt1309
Copy link

wt1309 commented Mar 12, 2023

When performing row-level hash validations, I run with the --verbose option to see the query that is generated and executed on the source and target. The query that gets executed performs the filter on the outermost part of he Query. Ideally, for better performance on BigQuery and many other DBs, the filter should be performed on the inner-most subquery.

Here is an example of the Query that gets executed for Row-level Hash comparison on BQ:

SELECT t0.hash__all, t0.pk_column
FROM (
 SELECT *, TO_HEX(SHA256(concat__all)) AS hash__all
 FROM (
  SELECT *,
   ARRAY_TO_STRING([upper__rstrip__ifnull__cast__col_1, upper__rstrip__ifnull__cast__col_2], '') AS concat__all
  FROM (
   SELECT *,
    upper(rstrip__ifnull__cast__col_1) AS upper__rstrip__ifnull__cast__col_1,
    upper(rstrip__ifnull__cast__col_2) AS upper__rstrip__ifnull__cast__col_2
   FROM (
    SELECT *,
     rtrim(ifnull__cast__col_1) AS rstrip__ifnull__cast__col_1,
     rtrim(ifnull__cast__col_2) AS rstrip__ifnull__cast__col_2
    FROM (
     SELECT *,
      IFNULL(cast__col_1, 'DEFAULT_REPLACEMENT_STRING') AS ifnull__cast__col_1,
       IFNULL(cast__col_2, 'DEFAULT_REPLACEMENT_STRING') AS ifnull__cast__col_2
     FROM (
      SELECT *, CAST(col_1 AS STRING) AS cast__col_1,
        CAST(col_2 AS STRING) AS cast__col_2
      FROM bq-project-id.datasets.table
     ) t6
    ) t5
    ) t4
  ) t3
 ) t2
) t0
WHERE DATE(transaction_date) = "2023-01-01"

This filter should be inside t6 in order to have the best performance. I have also tried to manually add a filter within within t6 in the YAML config file, but that does not seem to be possible.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant