This repository has been archived by the owner on Dec 18, 2023. It is now read-only.

DBT-presto Glue support #41

Open
oleksandrkovalenko opened this issue May 14, 2021 · 2 comments
Comments

@oleksandrkovalenko

I have a dbt project configured to use Hive via Presto. We are using AWS EMR and the AWS Glue Data Catalog. I have added the recommended configuration for Presto:

hive.metastore-cache-ttl=0s
hive.metastore-refresh-interval = 5s
hive.allow-drop-table=true
hive.allow-rename-table=true

When I run dbt run, I get this error:

PrestoUserError(type=USER_ERROR, name=NOT_SUPPORTED, message="Table rename is not yet supported by Glue service")

Is there a way to configure dbt or dbt-presto to run different queries instead of renaming tables?

@jtcohen6
Contributor

jtcohen6 commented May 25, 2021

@oleksandrkovalenko This is really interesting. It is possible to reimplement/override the basic materializations, by copying-pasting-editing from dbt-presto into your own project. In particular, this is the offending bit of logic:

https://github.com/fishtown-analytics/dbt-presto/blob/81efb93ac809cd4874713825833cf63c6350e94e/dbt/include/presto/macros/materializations/table.sql#L41-L43

There's a larger question here that goes beyond table rename. We've always known that Presto's functionality varies tremendously based on the connector being used: transactions, atomic DML, metadata availability, etc. How should we think about structuring a dbt plugin for Presto, given the functional variance? Does it make sense to have a wide array of plugins, each for use with a different flavor of Presto/Trino/etc?

@friendofasquid

We have had success reimplementing the table materialisation in our project:

{% materialization table, adapter='presto' -%}
  {%- set identifier = model['alias'] -%}

  {%- set old_relation = adapter.get_relation(database=database, schema=schema, identifier=identifier) -%}
  {%- set target_relation = api.Relation.create(identifier=identifier,
                                                schema=schema,
                                                database=database,
                                                type='table') -%}

  {{ run_hooks(pre_hooks) }}

  {%- if old_relation is not none -%}
      {{ adapter.drop_relation(old_relation) }}
  {%- endif -%}

  -- build model
  {% call statement('main') -%}
    {{ create_table_as(False, target_relation, sql) }}
  {% endcall -%}

  {{ run_hooks(post_hooks) }}

  {% do persist_docs(target_relation, model) %}

  {{ return({'relations': [target_relation]}) }}
{%- endmaterialization -%}

IIRC, we took this from the dbt-athena connector. It works fine, except for the downtime: the table does not exist between the drop and the completion of the create-table-as statement. We'll be looking to fix that with some view gymnastics in the next month or so.
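
For anyone wondering what a view-based fix could look like, here is a minimal sketch of one possible approach. It is an assumption, not the commenter's actual plan. Every schema, table, and view name is hypothetical, and it has not been verified that views behave this way through Presto on a Glue catalog. The idea is to build each run into a uniquely named table and expose it through a stable view, so no table rename is ever issued:

```sql
-- All schema, table, and view names below are hypothetical.

-- 1. Build the new version under a fresh name; readers still see the old data.
CREATE TABLE analytics.orders__20210601 AS
SELECT * FROM raw.orders;  -- stands in for the model's compiled SQL

-- 2. Repoint the public name. Only the view definition changes,
--    so no ALTER TABLE ... RENAME is needed.
CREATE OR REPLACE VIEW analytics.orders AS
SELECT * FROM analytics.orders__20210601;

-- 3. Remove the previous version once nothing reads from it.
DROP TABLE IF EXISTS analytics.orders__20210531;
```

A table and a view cannot share a name, so moving an existing model to this pattern would need a one-time drop of the current table before the view is first created.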
