Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fill in model/column descriptions when available in database metadata #160

Open
jenna-jordan opened this issue Feb 25, 2024 · 6 comments
Open
Labels
enhancement New feature or request triage

Comments

@jenna-jordan
Copy link

Describe the feature

Model and column descriptions can be stored in the database using the persist_docs feature. It is sometimes useful to "regenerate" the YAML files for models when tweaks have been made. However, the regenerated YAML has empty descriptions, meaning you need to manually copy and paste the old descriptions. If these descriptions have been stored in the database (persist_docs is set to true), then that could be used as the source to fill in already written descriptions.

Describe alternatives you've considered

Manually copy and paste descriptions from the old yaml into the new yaml.

Additional context

Different databases may store these descriptions differently, but you should be able to reference how persist_docs handles it.

Who will this benefit?

Anybody using codegen to automate the creation of source/model YAML files with descriptions already written.

Are you interested in contributing this feature?

I'm not currently able to

@jenna-jordan jenna-jordan added enhancement New feature or request triage labels Feb 25, 2024
@gwenwindflower
Copy link
Contributor

gwenwindflower commented Feb 29, 2024

Yep, makes sense, let me ask our Core DX embed how persist_docs works and see how feasible this is, but if it's relatively doable this totally makes sense, especially now that we're pulling descriptions from sources as well as of yesterday, it makes sense that sources+models should be able to pull from the db comments or metadata if it's not too gnarly to pull off.

@jgooly
Copy link

jgooly commented Mar 21, 2024

+1

In our case, we have several teams that consume Snowflake tables built by other teams (not using dbt) that have column descriptions as comments. We'd like to extract those comments as column descriptions when create the source yml files.

Copy link

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

@github-actions github-actions bot added the Stale label Sep 18, 2024
@jenna-jordan
Copy link
Author

@dave-connors-3 any hope of this happening?

@github-actions github-actions bot removed the Stale label Sep 19, 2024
@Luiscri
Copy link

Luiscri commented Oct 7, 2024

This would be so helpful for us when defining sources from external projects with a huge number of columns. Actually, I started using codegen thinking this was already implemented, as it would save my a considerable amount of time

@dpguthrie
Copy link

I continue to get asked for this and came up with a hacky workaround below for the generate_source macro. I've only tested this on Snowflake though, so it's not yet adapter agnostic:

First, an internal macro that grabs the comment instead of calling directly adapter.get_columns_in_relation:

{% macro _internal_get_columns_in_relation(relation) -%}
  {%- set sql -%}
    describe table {{ relation }}
  {%- endset -%}
  {%- set result = run_query(sql) -%}

  {% set maximum = 10000 %}
  {% if (result | length) >= maximum %}
    {% set msg %}
      Too many columns in relation {{ relation }}! dbt can only get
      information about relations with fewer than {{ maximum }} columns.
    {% endset %}
    {% do exceptions.raise_compiler_error(msg) %}
  {% endif %}

  {% set columns = [] %}
  {% for row in result %}
    {% set comment = row['comment'] or '' %}
    {% do columns.append((api.Column.from_description(row['name'], row['type']), comment)) %}
  {% endfor %}
  {% do return(columns) %}
{% endmacro %}

Then, in the generate_source macro, the column metadata is retrieved via that internal macro and then the comment is included if include_descriptions is true.

{% macro generate_source(schema_name, database_name=target.database, generate_columns=False, include_descriptions=False, include_data_types=True, table_pattern='%', exclude='', name=schema_name, table_names=None, include_database=False, include_schema=False) %}

{% set sources_yaml=[] %}
{% do sources_yaml.append('version: 2') %}
{% do sources_yaml.append('') %}
{% do sources_yaml.append('sources:') %}
{% do sources_yaml.append('  - name: ' ~ name | lower) %}

{% if include_descriptions %}
    {% do sources_yaml.append('    description: ""' ) %}
{% endif %}

{% if database_name != target.database or include_database %}
{% do sources_yaml.append('    database: ' ~ database_name | lower) %}
{% endif %}

{% if schema_name != name or include_schema %}
{% do sources_yaml.append('    schema: ' ~ schema_name | lower) %}
{% endif %}

{% do sources_yaml.append('    tables:') %}

{% if table_names is none %}
{% set tables=codegen.get_tables_in_schema(schema_name, database_name, table_pattern, exclude) %}
{% else %}
{% set tables = table_names %}
{% endif %}

{% for table in tables %}
    {% do sources_yaml.append('      - name: ' ~ table | lower ) %}
    {% if include_descriptions %}
        {% do sources_yaml.append('        description: ""' ) %}
    {% endif %}
    {% if generate_columns %}
    {% do sources_yaml.append('        columns:') %}

        {% set table_relation=api.Relation.create(
            database=database_name,
            schema=schema_name,
            identifier=table
        ) %}

        {% set columns=_internal_get_columns_in_relation(table_relation) %}
        {% for column, comment in columns %}
            {% do sources_yaml.append('          - name: ' ~ column.name | lower ) %}
            {% if include_data_types %}
                {% do sources_yaml.append('            data_type: ' ~ codegen.data_type_format_source(column)) %}
            {% endif %}
            {% if include_descriptions %}
                {% do sources_yaml.append('            description: "' ~ comment ~ '"' ) %}
            {% endif %}
        {% endfor %}
            {% do sources_yaml.append('') %}

    {% endif %}

{% endfor %}

{% if execute %}

    {% set joined = sources_yaml | join ('\n') %}
    {{ log(joined, info=True) }}
    {% do return(joined) %}

{% endif %}

{% endmacro %}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request triage
Projects
None yet
Development

No branches or pull requests

5 participants