You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
{{ message }}
This repository has been archived by the owner on Mar 29, 2023. It is now read-only.
When performing row-level hash validations, I run with the --verbose option to see the query that is generated and executed on the source and target. The query that gets executed performs the filter on the outermost part of he Query. Ideally, for better performance on BigQuery and many other DBs, the filter should be performed on the inner-most subquery.
Here is an example of the Query that gets executed for Row-level Hash comparison on BQ:
SELECT t0.hash__all, t0.pk_column
FROM (
SELECT *, TO_HEX(SHA256(concat__all)) AS hash__all
FROM (
SELECT *,
ARRAY_TO_STRING([upper__rstrip__ifnull__cast__col_1, upper__rstrip__ifnull__cast__col_2], '') AS concat__all
FROM (
SELECT *,
upper(rstrip__ifnull__cast__col_1) AS upper__rstrip__ifnull__cast__col_1,
upper(rstrip__ifnull__cast__col_2) AS upper__rstrip__ifnull__cast__col_2
FROM (
SELECT *,
rtrim(ifnull__cast__col_1) AS rstrip__ifnull__cast__col_1,
rtrim(ifnull__cast__col_2) AS rstrip__ifnull__cast__col_2
FROM (
SELECT *,
IFNULL(cast__col_1, 'DEFAULT_REPLACEMENT_STRING') AS ifnull__cast__col_1,
IFNULL(cast__col_2, 'DEFAULT_REPLACEMENT_STRING') AS ifnull__cast__col_2
FROM (
SELECT *, CAST(col_1 AS STRING) AS cast__col_1,
CAST(col_2 AS STRING) AS cast__col_2
FROM bq-project-id.datasets.table
) t6
) t5
) t4
) t3
) t2
) t0 WHERE DATE(transaction_date) = "2023-01-01"
This filter should be inside t6 in order to have the best performance. I have also tried to manually add a filter within within t6 in the YAML config file, but that does not seem to be possible.
The text was updated successfully, but these errors were encountered:
Sign up for freeto subscribe to this conversation on GitHub.
Already have an account?
Sign in.
When performing row-level hash validations, I run with the --verbose option to see the query that is generated and executed on the source and target. The query that gets executed performs the filter on the outermost part of he Query. Ideally, for better performance on BigQuery and many other DBs, the filter should be performed on the inner-most subquery.
Here is an example of the Query that gets executed for Row-level Hash comparison on BQ:
SELECT t0.hash__all, t0.pk_column
FROM (
SELECT *, TO_HEX(SHA256(concat__all)) AS hash__all
FROM (
SELECT *,
ARRAY_TO_STRING([upper__rstrip__ifnull__cast__col_1, upper__rstrip__ifnull__cast__col_2], '') AS concat__all
FROM (
SELECT *,
upper(rstrip__ifnull__cast__col_1) AS upper__rstrip__ifnull__cast__col_1,
upper(rstrip__ifnull__cast__col_2) AS upper__rstrip__ifnull__cast__col_2
FROM (
SELECT *,
rtrim(ifnull__cast__col_1) AS rstrip__ifnull__cast__col_1,
rtrim(ifnull__cast__col_2) AS rstrip__ifnull__cast__col_2
FROM (
SELECT *,
IFNULL(cast__col_1, 'DEFAULT_REPLACEMENT_STRING') AS ifnull__cast__col_1,
IFNULL(cast__col_2, 'DEFAULT_REPLACEMENT_STRING') AS ifnull__cast__col_2
FROM (
SELECT *, CAST(col_1 AS STRING) AS cast__col_1,
CAST(col_2 AS STRING) AS cast__col_2
FROM bq-project-id.datasets.table
) t6
) t5
) t4
) t3
) t2
) t0
WHERE DATE(transaction_date) = "2023-01-01"
This filter should be inside t6 in order to have the best performance. I have also tried to manually add a filter within within t6 in the YAML config file, but that does not seem to be possible.
The text was updated successfully, but these errors were encountered: