fix(ingest/teradata): Teradata speed up changes #9059
Thanks for getting this out so fast. I'd like to hold off on merging this because it's a bit hacky -- perhaps we should refactor our SQL common connector into overridable methods instead. But great to have this
setattr(  # noqa: B010
    inspector,
    "get_table_names",
    lambda schema: [
        i.name
        for i in filter(
            lambda t: t.object_type != "View", self._tables_cache[schema]
        )
    ],
)
yield from super().loop_tables(inspector, schema, sql_config)
This seems kinda hacky lol, do we need to make our SQL common source more extensible?
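For illustration, here is a minimal sketch of what the "overridable methods" idea could look like in place of the `setattr` patch. It is not DataHub's actual API: `SQLSourceBase`, `list_table_names`, `TableEntry`, and `CachedTeradataSource` are hypothetical names, and only the filter (skip entries whose `object_type` is "View") is taken from the snippet above.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class TableEntry:
    # Hypothetical cache record; the diff only shows `name` and `object_type`.
    name: str
    object_type: str


class SQLSourceBase:
    """Hypothetical stand-in for a common SQL source exposing an explicit hook."""

    def list_table_names(self, schema: str) -> List[str]:
        # A real base class would query the inspector here; stubbed in this sketch.
        raise NotImplementedError

    def loop_tables(self, schema: str) -> Iterable[str]:
        # The shared loop calls the hook, so subclasses never patch the inspector.
        for name in self.list_table_names(schema):
            yield f"{schema}.{name}"


class CachedTeradataSource(SQLSourceBase):
    def __init__(self, tables_cache: Dict[str, List[TableEntry]]) -> None:
        self._tables_cache = tables_cache

    def list_table_names(self, schema: str) -> List[str]:
        # Same filter as the lambda in the diff, written as a normal override.
        # Assumption: an unknown schema yields no tables instead of a KeyError.
        return [
            t.name
            for t in self._tables_cache.get(schema, [])
            if t.object_type != "View"
        ]


source = CachedTeradataSource(
    {"sales": [TableEntry("orders", "Table"), TableEntry("v_orders", "View")]}
)
table_names = list(source.loop_tables("sales"))
```

With a hook like this, the cached behaviour lives in the subclass and the base loop stays untouched, which is roughly the extensibility the comment asks about.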
WHERE DatabaseName NOT IN ('All', 'Crashdumps', 'DBC', 'dbcmngr',
'Default', 'External_AP', 'EXTUSER', 'LockLogShredder', 'PUBLIC',
'Sys_Calendar', 'SysAdmin', 'SYSBAR', 'SYSJDBC', 'SYSLIB',
'SystemFe', 'SYSUDTLIB', 'SYSUIF', 'TD_SERVER_DB', 'TDStats',
'TD_SYSGPL', 'TD_SYSXML', 'TDMaps', 'TDPUSER', 'TDQCD',
'tdwm', 'SQLJ', 'TD_SYSFNLIB', 'SYSSPATIAL')
Would be nice to reuse this list
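A possible way to reuse the list is to hoist it into a module-level constant and build the clause from it. This is only a sketch: `EXCLUDED_DATABASES` and `excluded_databases_clause` are hypothetical names, and the entries are copied from the WHERE clause quoted above.

```python
from typing import Sequence

# Database names copied from the WHERE clause in the diff.
EXCLUDED_DATABASES: Sequence[str] = (
    "All", "Crashdumps", "DBC", "dbcmngr",
    "Default", "External_AP", "EXTUSER", "LockLogShredder", "PUBLIC",
    "Sys_Calendar", "SysAdmin", "SYSBAR", "SYSJDBC", "SYSLIB",
    "SystemFe", "SYSUDTLIB", "SYSUIF", "TD_SERVER_DB", "TDStats",
    "TD_SYSGPL", "TD_SYSXML", "TDMaps", "TDPUSER", "TDQCD",
    "tdwm", "SQLJ", "TD_SYSFNLIB", "SYSSPATIAL",
)


def excluded_databases_clause(column: str = "DatabaseName") -> str:
    """Render a `<column> NOT IN (...)` fragment from the shared list.

    String interpolation is acceptable here only because the names are
    hard-coded constants, never user input.
    """
    quoted = ", ".join(f"'{name}'" for name in EXCLUDED_DATABASES)
    return f"{column} NOT IN ({quoted})"


where_clause = excluded_databases_clause()
```

Each query that needs the filter could then embed `excluded_databases_clause()` instead of repeating the literal list.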
use_cached_metadata: bool = Field(
    default=True,
    description="Whether to use cached metadata. This reduce the number of queries to the database but requires to have cached metadata.",
Suggested change:
- description="Whether to use cached metadata. This reduce the number of queries to the database but requires to have cached metadata.",
+ description="Whether to use cached metadata. This reduces the number of queries to the database but requires storing all tables in memory.",
# self.loop_tables = self.cached_loop_tables
# self.loop_views = self.cached_loop_views
# self.get_table_properties = self.cached_get_table_properties
Suggested change:
- # self.loop_tables = self.cached_loop_tables
- # self.loop_views = self.cached_loop_views
- # self.get_table_properties = self.cached_get_table_properties
# self._get_columns = lambda dataset_name, inspector, schema, table: []
# self._get_foreign_keys = lambda dataset_name, inspector, schema, table: []
Suggested change:
- # self._get_columns = lambda dataset_name, inspector, schema, table: []
- # self._get_foreign_keys = lambda dataset_name, inspector, schema, table: []
# url = self.config.get_sql_alchemy_url(current_db=db)
# with create_engine(url, **self.config.options).connect() as conn:
#     inspector = inspect(conn)
Suggested change:
- # url = self.config.get_sql_alchemy_url(current_db=db)
- # with create_engine(url, **self.config.options).connect() as conn:
- #     inspector = inspect(conn)
self.report.num_queries_parsed += 1
if self.report.num_queries_parsed % 1000 == 0:
    logger.info(f"Parsed {self.report.num_queries_parsed} queries")

print(entry.query)
You left this print here, I don't know if it was intentional
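If the query text is still useful while debugging, one option is to route it through the logger at debug level instead of `print`. This is a minimal sketch, assuming a module-level `logger` like the one used in the snippet; `log_parsed_query` is a hypothetical helper, not part of the PR.

```python
import logging

logger = logging.getLogger(__name__)


def log_parsed_query(query: str, num_parsed: int) -> None:
    # Debug-level stand-in for the stray print(entry.query); hidden unless
    # debug logging is enabled for this logger.
    logger.debug("Parsing query: %s", query)
    # Progress message mirroring the diff: emitted once every 1000 queries.
    if num_parsed % 1000 == 0:
        logger.info("Parsed %d queries", num_parsed)
```

At the default INFO level this keeps the periodic progress line and drops the per-query output.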
0b2ce01 to ef454ee
Fixing empty lineage
ef454ee to 4fb4d2a
if self.config.use_cached_metadata:
    if self.config.include_tables or self.config.include_views:
Do we not make any of these queries if include_tables and include_views are false?
) -> Iterable[MetadataWorkUnit]:
    result = sqlglot_lineage(
-       sql=query,
+       # With this clever hack we can make the query parser to not fail on queries with CASESPECIFIC
+       sql=query.replace("(NOT CASESPECIFIC)", ""),
I actually have a whole host of query parsing hacks to add, somewhere, lol
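One place such hacks could live is a small list of rewrite rules applied before the query reaches the parser. The sketch below is an assumption about structure only: `QUERY_REWRITES` and `preprocess_query` are hypothetical names, and the single rule shown is the `(NOT CASESPECIFIC)` removal from the diff above.

```python
import re
from typing import List, Pattern, Tuple

# Each entry is (compiled pattern, replacement). Only this first rule comes
# from the diff; further dialect workarounds would be appended to the list.
QUERY_REWRITES: List[Tuple[Pattern[str], str]] = [
    (re.compile(re.escape("(NOT CASESPECIFIC)")), ""),
]


def preprocess_query(query: str) -> str:
    """Apply every rewrite rule in order and return the cleaned query."""
    for pattern, replacement in QUERY_REWRITES:
        query = pattern.sub(replacement, query)
    return query


# Illustrative input only; not taken from the PR.
cleaned = preprocess_query("SELECT name (NOT CASESPECIFIC) FROM t")
```

The call site would then pass `preprocess_query(query)` to the parser, so new workarounds are added in one place instead of being chained inline.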
Checklist