[metadata-ingestion][dbt] DBTColumn dataclass does not ensure data types #11825

Open
igorvoltaic opened this issue Nov 10, 2024 · 0 comments
`@dataclass` does not enforce field types at runtime, so a `DBTColumn` field can hold a value that does not match its annotation.

def resolve_trino_modified_type(type_string: str) -> Any:

As a result, a non-string value reaches `resolve_trino_modified_type`, and its `re.match` call fails:

[2024-10-26 08:00:09,013] ERROR    {datahub.entrypoints:205} - Command failed: expected string or bytes-like object
Traceback (most recent call last):
  File "/tmp/site-packages/datahub/entrypoints.py", line 192, in main
    sys.exit(datahub(standalone_mode=False, **kwargs))
  File "/tmp/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/tmp/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/tmp/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/tmp/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/tmp/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/tmp/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/tmp/site-packages/datahub/telemetry/telemetry.py", line 454, in wrapper
    raise e
  File "/tmp/site-packages/datahub/telemetry/telemetry.py", line 403, in wrapper
    res = func(*args, **kwargs)
  File "/tmp/site-packages/datahub/cli/ingest_cli.py", line 201, in run
    ret = loop.run_until_complete(run_ingestion_and_check_upgrade())
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/tmp/site-packages/datahub/cli/ingest_cli.py", line 185, in run_ingestion_and_check_upgrade
    ret = await ingestion_future
  File "/tmp/site-packages/datahub/cli/ingest_cli.py", line 139, in run_pipeline_to_completion
    raise e
  File "/tmp/site-packages/datahub/cli/ingest_cli.py", line 131, in run_pipeline_to_completion
    pipeline.run()
  File "/tmp/site-packages/datahub/ingestion/run/pipeline.py", line 407, in run
    for wu in itertools.islice(
  File "/tmp/site-packages/datahub/ingestion/api/source_helpers.py", line 160, in auto_stale_entity_removal
    for wu in stream:
  File "/tmp/site-packages/datahub/ingestion/api/source_helpers.py", line 184, in auto_workunit_reporter
    for wu in stream:
  File "/tmp/site-packages/datahub/ingestion/api/source_helpers.py", line 277, in auto_browse_path_v2
    for urn, batch in _batch_workunits_by_urn(stream):
  File "/tmp/site-packages/datahub/ingestion/api/source_helpers.py", line 444, in _batch_workunits_by_urn
    for wu in stream:
  File "/tmp/site-packages/datahub/ingestion/api/source_helpers.py", line 201, in auto_materialize_referenced_tags_terms
    for wu in stream:
  File "/tmp/site-packages/datahub/ingestion/api/source_helpers.py", line 104, in auto_status_aspect
    for wu in stream:
  File "/tmp/site-packages/datahub/ingestion/source/dbt/dbt_common.py", line 970, in get_workunits_internal
    yield from self.create_dbt_platform_mces(
  File "/tmp/site-packages/datahub/ingestion/source/dbt/dbt_common.py", line 1254, in create_dbt_platform_mces
    aspects = self._generate_base_dbt_aspects(
  File "/tmp/site-packages/datahub/ingestion/source/dbt/dbt_common.py", line 1563, in _generate_base_dbt_aspects
    schema_metadata = self.get_schema_metadata(self.report, node, mce_platform)
  File "/tmp/site-packages/datahub/ingestion/source/dbt/dbt_common.py", line 1626, in get_schema_metadata
    or get_column_type(
  File "/tmp/site-packages/datahub/ingestion/source/dbt/dbt_common.py", line 808, in get_column_type
    TypeClass = resolve_trino_modified_type(column_type)
  File "/tmp/site-packages/datahub/ingestion/source/sql/sql_types.py", line 232, in resolve_trino_modified_type
    match = re.match(r"([a-zA-Z]+)\(.+\)", type_string)
  File "/usr/local/lib/python3.10/re.py", line 190, in match
    return _compile(pattern, flags).match(string)
TypeError: expected string or bytes-like object
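
A minimal sketch of the failure mode, independent of DataHub. `Column`, `CheckedColumn`, and `match_modified_type` are hypothetical stand-ins, not the real `DBTColumn` or `resolve_trino_modified_type`; only the regex is copied from the traceback. The offending value is assumed to be `None` here, since the traceback does not show what was actually passed. The `__post_init__` coercion is one possible guard, not necessarily the fix the maintainers would choose.

```python
import re
from dataclasses import dataclass


@dataclass
class Column:  # hypothetical stand-in for DBTColumn
    name: str
    data_type: str


def match_modified_type(type_string):
    # Same pattern as the re.match call shown in the traceback.
    return re.match(r"([a-zA-Z]+)\(.+\)", type_string)


# The `str` annotation is not checked, so a non-string value is accepted silently.
col = Column(name="amount", data_type=None)  # None is an assumed example value

try:
    match_modified_type(col.data_type)
    error = None
except TypeError as exc:
    error = str(exc)  # re.match rejects anything that is not str or bytes-like


@dataclass
class CheckedColumn:  # hypothetical variant with a runtime guard
    name: str
    data_type: str

    def __post_init__(self):
        # Coerce non-string values so downstream regex calls always get a str.
        if not isinstance(self.data_type, str):
            self.data_type = "" if self.data_type is None else str(self.data_type)


safe = CheckedColumn(name="amount", data_type=None)
result = match_modified_type(safe.data_type)  # no exception; the pattern simply does not match
```

An alternative would be to check `isinstance(type_string, str)` inside the resolver itself before calling `re.match`, which protects every caller rather than a single dataclass.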