Add batch context object to microbatch jinja context #11031
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@            Coverage Diff             @@
##             main   #11031      +/-   ##
==========================================
- Coverage   89.12%   89.11%   -0.02%
==========================================
  Files         183      183
  Lines       23767    23783      +16
==========================================
+ Hits        21183    21194      +11
- Misses       2584     2589       +5

Flags with carried forward coverage won't be shown.
Force-pushed from 3e23199 to f3748a8
Add "runtime only" property `batch_context` to `ModelNode`. By "runtime only" we mean that the property doesn't exist on the artifact and thus won't be written out to the manifest artifact.
Begin populating `batch_context` during materialization execution for microbatch batches.
Ensure MicrobatchModelRunner doesn't double compile batches. We were compiling the node for each batch _twice_. Aside from making microbatch models more expensive than they needed to be, the double compilation wasn't causing issues on its own. However, the first compilation happened _before_ we had added the batch context information to the model node for the batch, which caused models that try to access the `batch_context` information on the model to blow up. As such, we now skip the first compilation, similar to how SavedQuery nodes skip compilation.
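A rough sketch of the ordering fix (the function and object names below are hypothetical stand-ins, not dbt-core's actual MicrobatchModelRunner API): the point is simply that the batch information must be attached to the node before the single remaining compilation.

```python
# Hypothetical illustration only; `runner`, `node`, and `batch` stand in for
# dbt-core's real objects and this is not the actual MicrobatchModelRunner code.
def run_batch(runner, node, batch):
    node.batch = batch                # batch id / event_time_start / event_time_end now visible to jinja
    compiled = runner.compile(node)   # a single compilation, with the batch info available
    return runner.execute(compiled)
```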
Add `__post_serialize__` method to `BatchContext` to ensure correct dict shape. This is weird, but necessary, I apologize. Mashumaro handles the dictification of this class via a compile-time-generated `to_dict` method based off of the _typing_ of the class. By default, `datetime` types are converted to strings. We don't want that; we want them to stay datetimes.
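A minimal, self-contained sketch of that idea, using plain mashumaro rather than dbt's own mixin (the field names follow the PR description, but this is illustrative, not the exact dbt-core class):

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict

from mashumaro import DataClassDictMixin


@dataclass
class BatchContext(DataClassDictMixin):
    id: str
    event_time_start: datetime
    event_time_end: datetime

    def __post_serialize__(self, data: Dict[str, Any]) -> Dict[str, Any]:
        # Mashumaro's generated to_dict() turns datetimes into ISO-8601 strings.
        # Put the original datetime objects back so consumers of the dict (e.g. the
        # jinja context for a batch) receive real datetimes rather than strings.
        data["event_time_start"] = self.event_time_start
        data["event_time_end"] = self.event_time_end
        return data
```

With this hook in place, `BatchContext(...).to_dict()` carries datetime objects for the two event-time fields instead of their string representations.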
Force-pushed from f3748a8 to c27c494
Stop testing for batchless compiled code for microbatch models. In 45daec7 we stopped an extra compilation that was happening per batch prior to the batch_context being loaded. Stopping this extra compilation means that compiled SQL for the microbatch model without the event time filter / batch context is no longer produced. We have discussed this and _believe_ it is okay given that this is a new node type that has not hit GA yet.
Rename `build_batch_context` to `build_jinja_context_for_batch`. The name `build_batch_context` was confusing because 1) we have a `BatchContext` object, which the method was not building, and 2) the method builds the jinja context for the batch. As such, it felt appropriate to rename the method to more accurately communicate what it does.
Rename test macro `invalid_batch_context_macro_sql` to `invalid_batch_jinja_context_macro_sql`. This rename makes it clearer that the jinja context for a batch is being checked, as a batch_context has a slightly different connotation.
Changed the title: "batch_id to microbatch jinja context" → "batch context object to microbatch jinja context"
Kicked off dbt-snowflake tests, since Snowflake's microbatch implementation depends on the now-legacy (but still supported) model.config attributes: https://github.com/dbt-labs/dbt-snowflake/actions/runs/12057315923 Update: all green ✅
* Add `batch_id` to jinja context of microbatch batches
* Add changie doc
* Update `format_batch_start` to assume `batch_start` is always provided
* Add "runtime only" property `batch_context` to `ModelNode`. By "runtime only" we mean that it doesn't exist on the artifact and thus won't be written out to the manifest artifact.
* Begin populating `batch_context` during materialization execution for microbatch batches
* Fix circular import
* Fixup MicrobatchBuilder.batch_id property method
* Ensure MicrobatchModelRunner doesn't double compile batches. We were compiling the node for each batch _twice_. Aside from making microbatch models more expensive than they needed to be, the double compilation wasn't causing issues on its own. However, the first compilation happened _before_ we had added the batch context information to the model node for the batch, which caused models that try to access the `batch_context` information on the model to blow up. As such, we now skip the first compilation, similar to how SavedQuery nodes skip compilation.
* Add `__post_serialize__` method to `BatchContext` to ensure correct dict shape. This is weird, but necessary, I apologize. Mashumaro handles the dictification of this class via a compile-time-generated `to_dict` method based off of the _typing_ of the class. By default, `datetime` types are converted to strings. We don't want that; we want them to stay datetimes.
* Update tests to check for `batch_context`
* Update `resolve_event_time_filter` to use new `batch_context`
* Stop testing for batchless compiled code for microbatch models. In 45daec7 we stopped an extra compilation that was happening per batch prior to the batch_context being loaded. Stopping this extra compilation means that compiled SQL for the microbatch model without the event time filter / batch context is no longer produced. We have discussed this and _believe_ it is okay given that this is a new node type that has not hit GA yet.
* Rename `ModelNode.batch_context` to `ModelNode.batch`
* Rename `build_batch_context` to `build_jinja_context_for_batch`. The name `build_batch_context` was confusing because 1) we have a `BatchContext` object, which the method was not building, and 2) the method builds the jinja context for the batch. As such, it felt appropriate to rename the method to more accurately communicate what it does.
* Rename test macro `invalid_batch_context_macro_sql` to `invalid_batch_jinja_context_macro_sql`. This rename makes it clearer that the jinja context for a batch is being checked, as a batch_context has a slightly different connotation.
* Update changie doc

(cherry picked from commit c3d87b8)

Co-authored-by: Quigley Malcolm <[email protected]>
Resolves #11025
Problem
We need to get an `id` for a batch into the jinja context for batches of models.

Solution
Add a `BatchContext` object under the `batch` key for a `ModelNode`. The `BatchContext` contains an `id`, `event_time_start`, and `event_time_end`. The `id` field addresses the original problem we set out to solve. The `event_time_start` and `event_time_end` fields formalize and supersede what we previously provided via `__dbt_internal_event_time_start` and `__dbt_internal_event_time_end`.
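As a hedged illustration of the resulting access pattern (the `FakeModel` wrapper, the sample SQL, and the literal batch values below are made up for the example; only the `model.batch.id` / `model.batch.event_time_start` / `model.batch.event_time_end` shape comes from this PR):

```python
from dataclasses import dataclass
from datetime import datetime

from jinja2 import Template


@dataclass
class BatchContext:
    id: str
    event_time_start: datetime
    event_time_end: datetime


@dataclass
class FakeModel:
    # Stand-in for the model object exposed to templates; only `batch` matters here.
    batch: BatchContext


rendered = Template(
    "select * from events "
    "where event_time >= '{{ model.batch.event_time_start }}' "
    "and event_time < '{{ model.batch.event_time_end }}'  -- batch {{ model.batch.id }}"
).render(
    model=FakeModel(
        batch=BatchContext(
            id="20250101",
            event_time_start=datetime(2025, 1, 1),
            event_time_end=datetime(2025, 1, 2),
        )
    )
)
print(rendered)
```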
As part of this work, we also stopped double compiling each batch. Previously we were:
(1) Compiling for the batch prior to instantiating the jinja context for the batch
(2) Compiling for the batch after instantiating the jinja context for the batch
We have stopped doing (1). As a side effect, the compiled `.sql` file for the microbatch model that had no event time filtering will no longer be output. Each batch-specific compiled `.sql` file will still be produced. This should be safe, as `microbatch` is a new kind of model node being added in 1.9 and 1.9 GA has not been released yet. As such, nobody should be depending on its existence. If it does prove to be problematic, we've already spiked out some work on how to bring it back.

Checklist