Feat: Update partitioning by DATE, DATETIME, TIMESTAMP, _PARTITIONDATE #1113

chalmerlowe · 2024-09-11T12:17:57Z

This adds functionality to properly handle partitioning on columns with the following data types:

DATE
TIMESTAMP
DATETIME
_PARTITIONDATE

Where appropriate, it ensures the following functions can be used with or without a TimePartitioningType (HOUR, DAY, MONTH, YEAR):

DATE_TRUNC()
TIMESTAMP_TRUNC()
DATETIME_TRUNC()
DATE()

This is a nearly complete fix for #1072.
NOTE: This PR does not handle _PARTITIONTIME.
The table from #1072 is included here:

| Column Data Type | HOUR | DAY | MONTH | YEAR |
| --- | --- | --- | --- | --- |
| DATE | N/A | Fixed by #1057 | via DATE_TRUNC | via DATE_TRUNC |
| DATETIME | Incorrectly implemented via DATE_TRUNC. TODO: use DATETIME_TRUNC | Incorrectly implemented via DATE_TRUNC. TODO: use DATETIME_TRUNC | Incorrectly implemented via DATE_TRUNC. TODO: use DATETIME_TRUNC | Incorrectly implemented via DATE_TRUNC. TODO: use DATETIME_TRUNC |
| TIMESTAMP | via TIMESTAMP_TRUNC | via TIMESTAMP_TRUNC | via TIMESTAMP_TRUNC | via TIMESTAMP_TRUNC |
| _PARTITIONDATE | N/A | via DATE_TRUNC | via DATE_TRUNC | via DATE_TRUNC |
| _PARTITIONTIME | Not currently implemented. TODO: use TIMESTAMP_TRUNC | Not currently implemented. TODO: use TIMESTAMP_TRUNC | Not currently implemented. TODO: use TIMESTAMP_TRUNC | Not currently implemented. TODO: use TIMESTAMP_TRUNC |
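
As a rough illustration of the matrix above, the target mapping can be written as two lookup tables. This is only a sketch: `TRUNC_FN_BY_TYPE`, `ALLOWED_GRANULARITIES`, and `trunc_fn_for` are hypothetical names, not part of sqlalchemy-bigquery. The DATETIME row follows the TODO (DATETIME_TRUNC), and _PARTITIONTIME is left out because this PR does not handle it. The DAY cell for DATE only says "Fixed by #1057", so the sketch treats that combination as allowed without claiming which expression the library actually emits for it.

```python
# Hypothetical lookup tables summarising the matrix; not the library's API.
TRUNC_FN_BY_TYPE = {
    "DATE": "DATE_TRUNC",
    "DATETIME": "DATETIME_TRUNC",  # per the TODO in the DATETIME row
    "TIMESTAMP": "TIMESTAMP_TRUNC",
    "_PARTITIONDATE": "DATE_TRUNC",
}

# HOUR is marked N/A for the date-valued rows (DATE and _PARTITIONDATE).
ALLOWED_GRANULARITIES = {
    "DATE": {"DAY", "MONTH", "YEAR"},
    "DATETIME": {"HOUR", "DAY", "MONTH", "YEAR"},
    "TIMESTAMP": {"HOUR", "DAY", "MONTH", "YEAR"},
    "_PARTITIONDATE": {"DAY", "MONTH", "YEAR"},
}


def trunc_fn_for(column_type: str, granularity: str) -> str:
    """Return the truncation function name for a supported combination."""
    if column_type not in TRUNC_FN_BY_TYPE:
        raise ValueError(f"Unsupported column type: {column_type}")
    if granularity not in ALLOWED_GRANULARITIES[column_type]:
        raise ValueError(
            f"{granularity} partitioning is not applicable to {column_type}"
        )
    return TRUNC_FN_BY_TYPE[column_type]
```

For example, `trunc_fn_for("DATETIME", "HOUR")` returns `"DATETIME_TRUNC"`, while `trunc_fn_for("DATE", "HOUR")` raises `ValueError`.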

conventional-commit-lint-gcf · 2024-09-11T12:18:01Z

🤖 I detect that the PR title and the commit message differ and there's only one commit. To use the PR title for the commit history, you can use Github's automerge feature with squashing, or use automerge label. Good luck human!

-- conventional-commit-lint bot
https://conventionalcommits.org/

suzmue · 2024-09-13T17:31:10Z

sqlalchemy_bigquery/base.py

 field = "_PARTITIONDATE"
 trunc_fn = "DATE_TRUNC"

+ # Format used with _PARTITIONDATE which can only be used for
+ # DAY / MONTH / YEAR
+ if time_partitioning.field is None and field == "_PARTITIONDATE":


field == "_PARTITIONDATE" is always true

suzmue · 2024-09-17T22:24:09Z

sqlalchemy_bigquery/base.py

+ # DAY / MONTH / YEAR
+ if time_partitioning.field is None and field == "_PARTITIONDATE":
+ if time_partitioning.type_ in {"DAY", "MONTH", "YEAR"}:
+ return f"PARTITION BY {trunc_fn}({field})"


I'm a little confused, the type isn't passed to the trunc_fn, should it be?
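
To make the question concrete, here is a minimal sketch of the two strings being compared. `render_partition_clause` is a hypothetical helper written for this illustration, not a function in `base.py`. The first call mirrors the f-string in the diff; the second shows what passing the partitioning type through would produce. Which form BigQuery accepts for a given column type is not asserted here.

```python
def render_partition_clause(trunc_fn, field, granularity=None):
    """Render a PARTITION BY clause, optionally passing the granularity."""
    if granularity is None:
        # Shape of the string built in the diff under review.
        return f"PARTITION BY {trunc_fn}({field})"
    # Shape of the string if the partitioning type were passed through.
    return f"PARTITION BY {trunc_fn}({field}, {granularity})"


print(render_partition_clause("DATE_TRUNC", "_PARTITIONDATE"))
# PARTITION BY DATE_TRUNC(_PARTITIONDATE)
print(render_partition_clause("DATE_TRUNC", "_PARTITIONDATE", "MONTH"))
# PARTITION BY DATE_TRUNC(_PARTITIONDATE, MONTH)
```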

suzmue · 2024-09-17T22:35:11Z

sqlalchemy_bigquery/base.py

 field = "_PARTITIONDATE"
 trunc_fn = "DATE_TRUNC"

+ # Format used with _PARTITIONDATE which can only be used for


There are a lot of cases here that make this function hard to parse. In particular, I think it could be improved by deleting the first special-case block for time_partitioning.field is None and moving the if time_partitioning.type_ in ... check below the if time_partitioning.field is not None block:

field = "_PARTITIONDATE"
trunc_fn = "DATE_TRUNC"

if time_partitioning.field is not None:
    field = time_partitioning.field
    if isinstance(
        table.columns[field].type,
        (sqlalchemy.sql.sqltypes.TIMESTAMP),
    ):
        trunc_fn = "TIMESTAMP_TRUNC"
    elif isinstance(
        table.columns[field].type,
        sqlalchemy.sql.sqltypes.DATETIME,
    ):
        trunc_fn = "DATETIME_TRUNC"

if trunc_fn == "DATE_TRUNC" and time_partitioning.type_ not in {"DAY", "MONTH", "YEAR"}:
    raise ValueError(
        f"DATE_TRUNC can only be used with TimePartitioningTypes {{DAY, MONTH, YEAR}} received {time_partitioning.type_}"
    )

# Format used generically with DATE, TIMESTAMP, DATETIME, DATE_TRUNC
return f"PARTITION BY {trunc_fn}({field}, {time_partitioning.type_})"

I had removed the last if isinstance(table.columns[field].type, check because I thought it was part of the if/elif chain; I see now that it's separate. If that's the case, it could be the first if condition so that it takes priority, instead of setting the value back after the fact (pattern matching would be nice here when we support Python >= 3.10).
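
The restructuring suggested in this comment can be sketched as a standalone function. This is an illustration under stated assumptions, not the code in `base.py`: `partition_clause` and `column_types` are hypothetical, column types are represented as plain strings rather than SQLAlchemy type objects, and a dict lookup stands in for the `isinstance` chain so the block runs without SQLAlchemy installed.

```python
from types import SimpleNamespace

DATE_GRANULARITIES = {"DAY", "MONTH", "YEAR"}

# Hypothetical stand-in for the isinstance checks on table.columns[field].type.
TRUNC_FN_BY_TYPE = {
    "TIMESTAMP": "TIMESTAMP_TRUNC",
    "DATETIME": "DATETIME_TRUNC",
}


def partition_clause(column_types, time_partitioning):
    """Build a PARTITION BY clause, validating the granularity before rendering.

    column_types maps column name -> type name (an assumption made for this
    sketch; the real code inspects SQLAlchemy column types).
    """
    # Defaults apply when no partitioning column is named.
    field = "_PARTITIONDATE"
    trunc_fn = "DATE_TRUNC"

    if time_partitioning.field is not None:
        field = time_partitioning.field
        trunc_fn = TRUNC_FN_BY_TYPE.get(column_types[field], "DATE_TRUNC")

    # Single validation point, placed after the function has been chosen.
    if trunc_fn == "DATE_TRUNC" and time_partitioning.type_ not in DATE_GRANULARITIES:
        raise ValueError(
            "DATE_TRUNC can only be used with TimePartitioningTypes "
            f"{{DAY, MONTH, YEAR}} received {time_partitioning.type_}"
        )

    return f"PARTITION BY {trunc_fn}({field}, {time_partitioning.type_})"


# Usage: SimpleNamespace stands in for a TimePartitioning object.
hourly = SimpleNamespace(field="created_at", type_="HOUR")
print(partition_clause({"created_at": "TIMESTAMP"}, hourly))
# PARTITION BY TIMESTAMP_TRUNC(created_at, HOUR)
```

Keeping the defaults at the top and the validation in one place means each case is decided once, which is the readability gain the comment is after.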

adds additional functionality to cover more partitioning capability

42f9778

chalmerlowe assigned suzmue and Linchin Sep 11, 2024

product-auto-label bot added the size: m and api: bigquery labels Sep 11, 2024

suzmue reviewed Sep 17, 2024


bnaul mentioned this pull request Sep 19, 2024

Time partitioning on Date column fails with type_="DAY" #1115

Open
