Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build from fct table #153

Open
dgitis opened this issue Mar 14, 2023 · 2 comments
Open

Build from fct table #153

dgitis opened this issue Mar 14, 2023 · 2 comments

Comments

@dgitis
Copy link
Collaborator

dgitis commented Mar 14, 2023

While running basically the same model in parallel to smooth the transition between models, I noticed that building the models from a fct_ga4__event_page_view model that I created was 40 times more efficient than from stg_ga4__event_page_view.

build-stg-vs-fct

We build from stg_ga4__event_page_view in a number of places which are all opportunities for a performance improvement. This doesn't take into account caching, but I haven't noticed much of a cache effect from model to model (cache clearly helps when re-using fields within a single model).

However, I only have anecdotal evidence of the little cache re-use between models, so I'd like to see evidence in either respect.

I see us having two options here:

  1. We make fct_ga4__event_page_view a core package model
  2. We conditionally build from fct_ga4__event_page_view if it is present; otherwise falling back to stg_ga4__event_page_view

The first option seems simpler. It has the advantage that enabling and disabling package models is a pattern that we've used elsewhere, so it's not a big leap to expect users to disable the fct_ga4__event_page_view model. However, I find myself customizing this model a lot for each client so we'd be creating a model that almost always gets disabled.

The second option requires greater complexity in the package models. I suspect many people will create fct_ga4__event_page_view models that don't error without consulting any documentation and the errors messages should make it fairly clear what is missing from any fct_ga4__event_page_view model created by users.

Is this something that we should pursue?

Is caching more effective than I give it credit?

Do you have any preferences for either of these methods?

@willbryant
Copy link
Contributor

I think that our perspective on a lot of these things will change when materialized view support finally lands in dbt-bigquery (currently in the experimental plugin, but now being looked at for the core support plugin).

Right now we are kind of hamstrung by the half-implemented idea of supporting streaming data, right?

@adamribaudo-velir
Copy link
Collaborator

@dgitis sorry, I'm not following. Can you elaborate on your design of fct_ga4__event_page_view? Is it just a table materialization of stg_ga4__event_page_view?

And what was the SQL you were running where you saw an improvement? Without seeing that, it's hard to tell whether the improvement was due to the SQL itself (use of select *, etc) vs. the source of data.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

3 participants