Trino ingestion with profiling enabled is not working as expected with table filter enabled #11792

Open
anilreddygollapalli opened this issue Nov 5, 2024 · 3 comments
Labels
bug Bug report

Comments

@anilreddygollapalli

Describe the bug
When running Trino ingestion with a specific, fully qualified table name set under filter --> tables --> table name, the pipeline ignores the specified filter.

To Reproduce
Steps to reproduce the behavior:

  1. Create a new Trino ingestion pipeline with some profiling parameters.
  2. Specify the schema and table name under filter.
  3. Run the ingestion pipeline.
  4. Check the ingestion logs: the table name does not appear in the WHERE condition of the queries that pull profiling data.

Expected behavior
The pipeline should query only the table specified in the filter section.

Screenshots
NA

Desktop (please complete the following information):

  • OS: Windows
  • Browser: Chrome


@anilreddygollapalli anilreddygollapalli added the bug Bug report label Nov 5, 2024
@jjoyce0510
Collaborator

Hi there - we use the profilingPattern to include and exclude assets for profiling.

Can you confirm you are using this field?

[Screenshot 2024-11-05 at 12:31:14 PM]

@anilreddygollapalli
Author

We will test this and let you know.

@anilreddygollapalli
Author

anilreddygollapalli commented Nov 13, 2024

Hello Tamas,

Even after adding the parameters below to filter for a specific table, the filter is not taken into consideration.

Reference : https://datahubproject.io/docs/generated/ingestion/sources/trino

    profile_pattern:
        allow:
            - test_tbl
    schema_pattern:
        allow:
            - test_db
    table_pattern:
        allow:
            - test_tbl
    profiling:
        enabled: true
        profile_table_level_only: false
        include_field_distinct_count: true
        include_field_distinct_value_frequencies: true
        include_field_histogram: true
        include_field_max_value: true
        include_field_mean_value: true
        include_field_median_value: true
        include_field_min_value: true
        include_field_null_count: true
        include_field_quantiles: true
        include_field_sample_values: true
        include_field_stddev_value: true

The record below appears in the ingestion logs and shows view_pattern instead of table_pattern. When we checked the Analytics tab, the table had not been ingested with profiling info.

[2024-11-13 17:12:27,452] INFO {datahub.ingestion.source.sql.sql_config:113} - Applying table_pattern {'allow': ['test_tbl']} to view_pattern.

Please look into this issue.
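
One thing that may be worth ruling out, offered as an assumption and not a confirmed diagnosis: if the allow patterns are treated as regexes matched from the start of the fully qualified name (catalog.schema.table for Trino), a bare `test_tbl` would never match. The sketch below uses only Python's standard `re` module and is not DataHub's code; the helper `is_allowed` and the catalog name `my_catalog` are hypothetical.

```python
import re


def is_allowed(name, allow, deny=()):
    """Return True if `name` matches at least one allow regex and no deny regex.

    re.match anchors at the start of the string only, so a pattern must
    account for everything that precedes the table name.
    """
    if any(re.match(p, name, re.IGNORECASE) for p in deny):
        return False
    return any(re.match(p, name, re.IGNORECASE) for p in allow)


# Hypothetical fully qualified name; "my_catalog" is a placeholder catalog.
fq_table = "my_catalog.test_db.test_tbl"

# Bare table name: cannot match, because the string starts with the catalog.
bare = is_allowed(fq_table, allow=["test_tbl"])

# Fully qualified pattern with escaped dots.
qualified = is_allowed(fq_table, allow=[r"my_catalog\.test_db\.test_tbl"])

# Leading wildcard that tolerates any catalog and schema prefix.
suffix = is_allowed(fq_table, allow=[r".*\.test_tbl"])

print(bare, qualified, suffix)
```

If this assumption holds for the Trino source, writing the table_pattern and profile_pattern entries in fully qualified form (or with a leading `.*`) would be a quick experiment to try.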
