Store code migration progress in multiworkspace history log and visualize in dashboard #3112

Draft
wants to merge 86 commits into
base: main
f4fe2be
Assign TODO to JCZuurmond
JCZuurmond Nov 1, 2024
ecda74d
Write out direct filesystem access
JCZuurmond Nov 1, 2024
bf03110
Add id attributes to DirectFsAccess
JCZuurmond Nov 1, 2024
f20a5c0
Add TODO about assessment attributes on SourceInfo
JCZuurmond Nov 1, 2024
54fa23a
Add TODO about source timestamp
JCZuurmond Nov 1, 2024
e7f5cc1
Add test for direct fs access history encoder
JCZuurmond Nov 1, 2024
84b6285
Add DirectFsAccessProgressEncoder
JCZuurmond Nov 1, 2024
ae4162e
Format
JCZuurmond Nov 1, 2024
126e08e
Add TODO on moving DirectFsAccess
JCZuurmond Nov 1, 2024
6ba60b5
Remove unused import
JCZuurmond Nov 1, 2024
5dc0af3
Add dfsas fixture
JCZuurmond Nov 1, 2024
5efd8d8
Add direct filesystem access progress encoder to context
JCZuurmond Nov 1, 2024
870f3a0
Append directfsaccess to history in migration progress workflow
JCZuurmond Nov 1, 2024
88f7eb6
Fix reference to direct filesystem progress attribute
JCZuurmond Nov 1, 2024
6f25f58
Move progress encoding of direct fs to separate module
JCZuurmond Nov 1, 2024
2af23da
Add id attributes to query problem
JCZuurmond Nov 1, 2024
4be2bd8
Add QueryProblemProgressEncoder
JCZuurmond Nov 1, 2024
56377b9
Test QueryProblemProgressEncoder
JCZuurmond Nov 1, 2024
f9e948c
Add query problem progress encoder to runtime context
JCZuurmond Nov 1, 2024
f5a6122
Format
JCZuurmond Nov 1, 2024
7e0ccda
Introduce QueryProblemOwnership
JCZuurmond Nov 1, 2024
5570df8
Motivate the LegacyQueryOwnershipMixin better
JCZuurmond Nov 1, 2024
3bf45d7
Assign TODO to @JCZuurmond
JCZuurmond Nov 1, 2024
5159d0f
Move TableOwnership and TableMigrationOwnership
JCZuurmond Nov 1, 2024
b69869a
Move UsedTableOwnership out of TableOwnership
JCZuurmond Nov 4, 2024
883d08a
Move UsedTableOwnership to source_code.used_table
JCZuurmond Nov 4, 2024
c26bf87
Fix test missing named parameter
JCZuurmond Nov 4, 2024
94914cb
Overwrite dfsa on dump all
JCZuurmond Nov 4, 2024
4402e29
Overwrite tables on dump all
JCZuurmond Nov 4, 2024
9417594
Move QueryLinter backend and inventory database to constructor
JCZuurmond Nov 4, 2024
1a8cd49
Add from_dict to QueryProblem
JCZuurmond Nov 4, 2024
9fe4e9a
Rewrite dumps
JCZuurmond Nov 4, 2024
d3d9deb
Add try fetch problems
JCZuurmond Nov 4, 2024
501df79
Remove refresh from snapshots
JCZuurmond Nov 4, 2024
82d8c9e
Use direct fs progress encoder on runtime context
JCZuurmond Nov 4, 2024
b04267c
Add id attributes to UsedTable
JCZuurmond Nov 4, 2024
b520959
Add TableProgressEncoder
JCZuurmond Nov 4, 2024
1f825b4
Remove unused import
JCZuurmond Nov 4, 2024
88f635a
Add UsedTableProgressEncoder to RuntimeContext
JCZuurmond Nov 4, 2024
9a073ba
Write used tables to history
JCZuurmond Nov 4, 2024
10a766f
Format
JCZuurmond Nov 4, 2024
955f771
Fix reference to table
JCZuurmond Nov 4, 2024
48d6de6
Fix rows written for mode
JCZuurmond Nov 4, 2024
4a94ce7
Fix refresh reports mock tests
JCZuurmond Nov 4, 2024
b90640a
Remove TODO
JCZuurmond Nov 4, 2024
bf7e1c4
Store dfsas in the populated catalog
JCZuurmond Nov 4, 2024
d7b2f53
Handle incorrectness in dfsa fixture
JCZuurmond Nov 4, 2024
0712d0f
Add QueryProblem fixture
JCZuurmond Nov 4, 2024
9692c3c
Add fixture for UsedTable
JCZuurmond Nov 4, 2024
79f3df5
Add section for code migration progress to migration progress dashboard
JCZuurmond Nov 4, 2024
360b083
Add table references in code progress percentage counter
JCZuurmond Nov 4, 2024
ceb55b0
Add counter for direct file system access
JCZuurmond Nov 4, 2024
47d1d02
Add query problem counter
JCZuurmond Nov 4, 2024
87f2ebf
Add missing types in failures
JCZuurmond Nov 4, 2024
f1a46e8
Update filter in overall progress
JCZuurmond Nov 4, 2024
0b27f29
Add queries for code migration progress
JCZuurmond Nov 4, 2024
8b4b2a8
Fix white spaces
JCZuurmond Nov 4, 2024
5298769
Persist used table to allow ownership to propagate
JCZuurmond Nov 4, 2024
4a55b0e
Add counter with migrated UsedTables
JCZuurmond Nov 5, 2024
204e465
Add code migration overview
JCZuurmond Nov 5, 2024
b7cf937
Split code migration in data asset references and linting problems
JCZuurmond Nov 5, 2024
1774ce5
Add code compatability issue overview
JCZuurmond Nov 5, 2024
d7f21ec
Minimise columns selected
JCZuurmond Nov 5, 2024
9bef996
Let overview show object type
JCZuurmond Nov 5, 2024
28d301c
Add workspace resources to the dfsa fixture
JCZuurmond Nov 5, 2024
0239124
Change query id
JCZuurmond Nov 5, 2024
f3b5236
Change filter on data asset references
JCZuurmond Nov 5, 2024
523217b
Update pending migration data asset count
JCZuurmond Nov 5, 2024
ef4208f
Test migrated data asset references
JCZuurmond Nov 5, 2024
4821102
Test data asset references pending migration overview
JCZuurmond Nov 5, 2024
f765f46
Add links to code compatability issues
JCZuurmond Nov 5, 2024
6e4d89d
Fix total number of non migrated resources
JCZuurmond Nov 5, 2024
da41ca3
Update link columns in code compatibility issues
JCZuurmond Nov 5, 2024
4d296ef
Add overview for data asset references
JCZuurmond Nov 5, 2024
7139955
Fix link for different objects
JCZuurmond Nov 5, 2024
e080d7b
Use fixture to refer to used Hive metastore table
JCZuurmond Nov 5, 2024
c3e445f
Let link always point to the workspace file
JCZuurmond Nov 5, 2024
197e835
Display used hive metastore table in notebook
JCZuurmond Nov 6, 2024
5dde291
Add comment on reading while marking as is_write
JCZuurmond Nov 6, 2024
cb59ff1
Mock dfsa more accurate
JCZuurmond Nov 6, 2024
2ae66bf
Fix link to queries
JCZuurmond Nov 6, 2024
42793d8
Fix query
JCZuurmond Nov 6, 2024
99d70ea
Test query 03_05_data_asset_references_pending_migration
JCZuurmond Nov 6, 2024
026a5df
Fix mocked workspace linked
JCZuurmond Nov 6, 2024
0b26e97
Add test for and fix code compatibility issue query
JCZuurmond Nov 6, 2024
530a3ee
Format
JCZuurmond Nov 11, 2024
2 changes: 1 addition & 1 deletion src/databricks/labs/ucx/assessment/workflows.py
@@ -189,7 +189,7 @@ def assess_dashboards(self, ctx: RuntimeContext):

         Also, stores direct filesystem accesses for display in the migration dashboard.
         """
-        ctx.query_linter.refresh_report(ctx.sql_backend, ctx.inventory_database)
+        ctx.query_linter.refresh_report()

     @job_task
     def assess_workflows(self, ctx: RuntimeContext):
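
The change to `refresh_report` follows from moving the SQL backend and inventory database into the `QueryLinter` constructor, so callers no longer pass them on every call. A minimal sketch of that constructor-injection idea, assuming hypothetical stand-ins (`FakeSqlBackend`, `SketchQueryLinter`, and the `query_problems` table name are illustrative and not taken from the diff):

```python
class FakeSqlBackend:
    """Hypothetical in-memory backend that only records what would be written."""

    def __init__(self) -> None:
        self.saved = []

    def save_table(self, full_name: str, rows: list) -> None:
        self.saved.append((full_name, list(rows)))


class SketchQueryLinter:
    """Illustrative linter: storage dependencies are injected once, at construction."""

    def __init__(self, sql_backend: FakeSqlBackend, inventory_database: str) -> None:
        self._sql_backend = sql_backend
        self._inventory_database = inventory_database

    def refresh_report(self) -> None:
        # The real linter collects problems from the workspace; here they are hard-coded.
        problems = ["hypothetical problem"]
        self._sql_backend.save_table(f"{self._inventory_database}.query_problems", problems)


backend = FakeSqlBackend()
linter = SketchQueryLinter(backend, "ucx")
linter.refresh_report()  # no storage arguments needed at the call site
```

With the dependencies held by the object, a workflow task only needs the linter itself, which is what the one-line change in `assess_dashboards` reflects.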
26 changes: 20 additions & 6 deletions src/databricks/labs/ucx/contexts/application.py
@@ -44,8 +44,9 @@
)
from databricks.labs.ucx.hive_metastore.mapping import TableMapping
from databricks.labs.ucx.hive_metastore.table_migration_status import TableMigrationIndex
from databricks.labs.ucx.hive_metastore.ownership import TableMigrationOwnership, TableOwnership
from databricks.labs.ucx.hive_metastore.table_ownership import TableOwnership, UsedTableOwnership
from databricks.labs.ucx.hive_metastore.table_migrate import (
TableMigrationOwnership,
TableMigrationStatusRefresher,
TablesMigrator,
)
@@ -63,7 +64,7 @@
NotebookLoader,
)
from databricks.labs.ucx.source_code.path_lookup import PathLookup
from databricks.labs.ucx.source_code.queries import QueryLinter
from databricks.labs.ucx.source_code.queries import QueryLinter, QueryProblemOwnership
from databricks.labs.ucx.source_code.redash import Redash
from databricks.labs.ucx.workspace_access import generic, redash
from databricks.labs.ucx.workspace_access.groups import GroupManager
@@ -262,16 +263,23 @@ def tables_crawler(self) -> TablesCrawler:
return TablesCrawler(self.sql_backend, self.inventory_database, self.config.include_databases)

@cached_property
def table_ownership(self) -> TableOwnership:
return TableOwnership(
def used_table_ownership(self) -> UsedTableOwnership:
return UsedTableOwnership(
self.administrator_locator,
self.grants_crawler,
self.used_tables_crawler_for_paths,
self.used_tables_crawler_for_queries,
self.legacy_query_ownership,
self.workspace_path_ownership,
)

@cached_property
def table_ownership(self) -> TableOwnership:
return TableOwnership(
self.administrator_locator,
self.grants_crawler,
self.used_table_ownership,
)

@cached_property
def workspace_path_ownership(self) -> WorkspacePathOwnership:
return WorkspacePathOwnership(self.administrator_locator, self.workspace_client)
@@ -281,7 +289,11 @@ def legacy_query_ownership(self) -> LegacyQueryOwnership:
return LegacyQueryOwnership(self.administrator_locator, self.workspace_client)

@cached_property
def directfs_access_ownership(self) -> DirectFsAccessOwnership:
def query_problem_ownership(self) -> QueryProblemOwnership:
return QueryProblemOwnership(self.administrator_locator, self.workspace_client)

@cached_property
def direct_filesystem_access_ownership(self) -> DirectFsAccessOwnership:
return DirectFsAccessOwnership(
self.administrator_locator,
self.workspace_path_ownership,
@@ -511,6 +523,8 @@ def workflow_linter(self) -> WorkflowLinter:
def query_linter(self) -> QueryLinter:
return QueryLinter(
self.workspace_client,
self.sql_backend,
self.inventory_database,
TableMigrationIndex([]), # TODO: bring back self.tables_migrator.index()
self.directfs_access_crawler_for_queries,
self.used_tables_crawler_for_queries,
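
The context classes in this file wire services together through `functools.cached_property`, so each service is built on first access and then reused, which lets `table_ownership` and other consumers share one `used_table_ownership` instance. A minimal, self-contained sketch of that pattern, assuming hypothetical stand-in classes rather than the real UCX ones:

```python
from functools import cached_property


class FakeUsedTableOwnership:
    """Hypothetical stand-in for UsedTableOwnership."""


class FakeTableOwnership:
    """Hypothetical stand-in for TableOwnership; it only keeps its dependency."""

    def __init__(self, used_table_ownership: FakeUsedTableOwnership) -> None:
        self.used_table_ownership = used_table_ownership


class SketchContext:
    """Illustrative context: every service is constructed lazily and only once."""

    @cached_property
    def used_table_ownership(self) -> FakeUsedTableOwnership:
        return FakeUsedTableOwnership()

    @cached_property
    def table_ownership(self) -> FakeTableOwnership:
        # Reading self.used_table_ownership reuses the cached instance.
        return FakeTableOwnership(self.used_table_ownership)


ctx = SketchContext()
shared = ctx.table_ownership.used_table_ownership is ctx.used_table_ownership
```

Because the property result is cached on the instance, repeated access to `ctx.table_ownership` returns the same object instead of rebuilding the dependency chain.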
34 changes: 33 additions & 1 deletion src/databricks/labs/ucx/contexts/workflow_task.py
@@ -24,11 +24,13 @@
from databricks.labs.ucx.hive_metastore.tables import FasterTableScanCrawler
from databricks.labs.ucx.hive_metastore.udfs import Udf
from databricks.labs.ucx.installer.logs import TaskRunWarningRecorder
from databricks.labs.ucx.progress.directfs_access import DirectFsAccessProgressEncoder
from databricks.labs.ucx.progress.grants import GrantProgressEncoder
from databricks.labs.ucx.progress.history import ProgressEncoder
from databricks.labs.ucx.progress.jobs import JobsProgressEncoder
from databricks.labs.ucx.progress.tables import TableProgressEncoder
from databricks.labs.ucx.progress.tables import TableProgressEncoder, UsedTableProgressEncoder
from databricks.labs.ucx.progress.workflow_runs import WorkflowRunRecorder
from databricks.labs.ucx.progress.queries import QueryProblemProgressEncoder

# As with GlobalContext, service factories unavoidably have a lot of public methods.
# pylint: disable=too-many-public-methods
@@ -240,3 +242,33 @@ def udfs_progress(self) -> ProgressEncoder[Udf]:
self.workspace_id,
self.config.ucx_catalog,
)

@cached_property
def query_problem_progress(self) -> QueryProblemProgressEncoder:
return QueryProblemProgressEncoder(
self.sql_backend,
self.query_problem_ownership,
self.parent_run_id,
self.workspace_id,
self.config.ucx_catalog,
)

@cached_property
def direct_filesystem_access_progress(self) -> DirectFsAccessProgressEncoder:
return DirectFsAccessProgressEncoder(
self.sql_backend,
self.direct_filesystem_access_ownership,
self.parent_run_id,
self.workspace_id,
self.config.ucx_catalog,
)

@cached_property
def used_table_progress(self) -> UsedTableProgressEncoder:
return UsedTableProgressEncoder(
self.sql_backend,
self.used_table_ownership,
self.parent_run_id,
self.workspace_id,
self.config.ucx_catalog,
)
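
The three new factories share one constructor shape: a SQL backend, an ownership resolver, the parent run id, the workspace id, and the UCX catalog. The diff does not show what the encoders do internally, so the following is only a sketch of the general idea of tagging a record with run, workspace, and owner before it is appended to a history log. All names here (`FakeDirectFsAccess`, `SketchHistoryRow`, `SketchProgressEncoder`) and the row layout are assumptions for illustration, not the real history schema:

```python
from dataclasses import asdict, dataclass
from typing import Callable


@dataclass(frozen=True)
class FakeDirectFsAccess:
    """Hypothetical record; the real DirectFsAccess carries more fields."""

    path: str
    source_id: str


@dataclass(frozen=True)
class SketchHistoryRow:
    """Hypothetical history row; not the real history log schema."""

    workspace_id: int
    run_id: int
    object_type: str
    owner: str
    data: dict


class SketchProgressEncoder:
    """Illustrative encoder that attaches run, workspace, and owner to a record."""

    def __init__(self, owner_of: Callable[[FakeDirectFsAccess], str], run_id: int, workspace_id: int) -> None:
        self._owner_of = owner_of
        self._run_id = run_id
        self._workspace_id = workspace_id

    def encode(self, record: FakeDirectFsAccess) -> SketchHistoryRow:
        return SketchHistoryRow(
            workspace_id=self._workspace_id,
            run_id=self._run_id,
            object_type=type(record).__name__,
            owner=self._owner_of(record),
            # Values are stringified so rows of different record types share one shape.
            data={key: str(value) for key, value in asdict(record).items()},
        )


encoder = SketchProgressEncoder(lambda record: "owner@example.com", run_id=7, workspace_id=42)
row = encoder.encode(FakeDirectFsAccess(path="dbfs:/mnt/raw", source_id="notebook.py"))
```

Keeping the ownership resolver as an injected dependency mirrors how each factory above passes its own ownership object to its encoder.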
31 changes: 25 additions & 6 deletions src/databricks/labs/ucx/framework/owners.py
@@ -12,6 +12,7 @@
from databricks.sdk.service.iam import User, PermissionLevel
from databricks.sdk.service.workspace import ObjectType


logger = logging.getLogger(__name__)


@@ -242,16 +243,34 @@ def _infer_from_first_can_manage(object_permissions):
return None


class LegacyQueryOwnership(Ownership[str]):
def __init__(self, administrator_locator: AdministratorLocator, workspace_client: WorkspaceClient) -> None:
super().__init__(administrator_locator)
self._workspace_client = workspace_client
class LegacyQueryOwnershipMixin:
"""Retrieve ownership of legacy queries.

def _maybe_direct_owner(self, record: str) -> str | None:
This ownership class is different from most ownership implementations as it fetches the query from the workspace,
where most other classes expect an object that contains the ownership thus not requiring workspace access.

A mixin is introduced to get query ownership for both plain query ids and class:QueryProblem, while maintaining
the type hinting and reducing risk by handling the call to the workspace in **one** place (in case another exception
needs to be caught later).
"""

@staticmethod
def _maybe_direct_owner_from_query_id(ws: WorkspaceClient, query_id: str) -> str | None:
try:
legacy_query = self._workspace_client.queries.get(record)
legacy_query = ws.queries.get(query_id)
return legacy_query.owner_user_name
except NotFound:
return None
except InternalError: # redash is very naughty and throws 500s instead of proper 404s
return None


class LegacyQueryOwnership(Ownership[str], LegacyQueryOwnershipMixin):
"""Query ownership given a query id"""

def __init__(self, administrator_locator: AdministratorLocator, workspace_client: WorkspaceClient) -> None:
super().__init__(administrator_locator)
self._workspace_client = workspace_client

def _maybe_direct_owner(self, record: str) -> str | None:
return self._maybe_direct_owner_from_query_id(self._workspace_client, record)
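
The docstring above motivates the mixin: the workspace lookup and its error handling live in one static method, while each ownership class keeps its own record type. A minimal sketch of that arrangement, assuming hypothetical stand-ins (`FakeQueries`, `FakeQueryProblem`, and a local `NotFound` replace the SDK client, the real `QueryProblem`, and the SDK error):

```python
from dataclasses import dataclass
from typing import Optional


class NotFound(Exception):
    """Hypothetical stand-in for the SDK's NotFound error."""


class FakeQueries:
    """Hypothetical in-memory replacement for the workspace queries API."""

    def __init__(self, owners: dict) -> None:
        self._owners = owners

    def get_owner(self, query_id: str) -> str:
        if query_id not in self._owners:
            raise NotFound(query_id)
        return self._owners[query_id]


class QueryOwnerLookupMixin:
    """The only place where the lookup and its error handling are implemented."""

    @staticmethod
    def _owner_from_query_id(queries: FakeQueries, query_id: str) -> Optional[str]:
        try:
            return queries.get_owner(query_id)
        except NotFound:
            return None


class OwnershipByQueryId(QueryOwnerLookupMixin):
    """Resolves the owner from a plain query id."""

    def __init__(self, queries: FakeQueries) -> None:
        self._queries = queries

    def owner_of(self, record: str) -> Optional[str]:
        return self._owner_from_query_id(self._queries, record)


@dataclass(frozen=True)
class FakeQueryProblem:
    """Hypothetical record that carries the id of the query it belongs to."""

    query_id: str
    message: str


class OwnershipByQueryProblem(QueryOwnerLookupMixin):
    """Resolves the owner from a problem record, reusing the same lookup."""

    def __init__(self, queries: FakeQueries) -> None:
        self._queries = queries

    def owner_of(self, record: FakeQueryProblem) -> Optional[str]:
        return self._owner_from_query_id(self._queries, record.query_id)


queries = FakeQueries({"q1": "alice@example.com"})
by_id = OwnershipByQueryId(queries).owner_of("q1")
by_problem = OwnershipByQueryProblem(queries).owner_of(FakeQueryProblem("q1", "uses legacy table"))
missing = OwnershipByQueryId(queries).owner_of("does-not-exist")
```

If another exception needs handling later, only the mixin's static method changes, and both ownership classes pick it up.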
3 changes: 1 addition & 2 deletions src/databricks/labs/ucx/hive_metastore/mapping.py
@@ -13,8 +13,7 @@

 from databricks.labs.ucx.account.workspaces import WorkspaceInfo
 from databricks.labs.ucx.framework.utils import escape_sql_identifier
-from databricks.labs.ucx.hive_metastore import TablesCrawler
-from databricks.labs.ucx.hive_metastore.tables import Table
+from databricks.labs.ucx.hive_metastore.tables import Table, TablesCrawler
 from databricks.labs.ucx.recon.base import TableIdentifier

 logger = logging.getLogger(__name__)
111 changes: 0 additions & 111 deletions src/databricks/labs/ucx/hive_metastore/ownership.py

This file was deleted.

34 changes: 32 additions & 2 deletions src/databricks/labs/ucx/hive_metastore/table_migrate.py
@@ -9,6 +9,7 @@
from databricks.sdk import WorkspaceClient
from databricks.sdk.errors.platform import DatabricksError

from databricks.labs.ucx.framework.owners import Ownership
from databricks.labs.ucx.framework.utils import escape_sql_identifier
from databricks.labs.ucx.hive_metastore import TablesCrawler
from databricks.labs.ucx.hive_metastore.grants import MigrateGrants
@@ -18,8 +19,11 @@
TableMapping,
TableToMigrate,
)

from databricks.labs.ucx.hive_metastore.table_migration_status import TableMigrationStatusRefresher
from databricks.labs.ucx.hive_metastore.table_migration_status import (
TableMigrationStatusRefresher,
TableMigrationStatus,
)
from databricks.labs.ucx.hive_metastore.table_ownership import TableOwnership
from databricks.labs.ucx.hive_metastore.tables import (
MigrationCount,
Table,
@@ -596,3 +600,29 @@ def _sql_alter_from(self, table: Table, target_table_key: str, ws_id: int):
def _is_migrated(self, schema: str, table: str) -> bool:
index = self._migration_status_refresher.index()
return index.is_migrated(schema, table)


class TableMigrationOwnership(Ownership[TableMigrationStatus]):
"""Determine ownership of table migration records in the inventory.

This is the owner of the source table, if (and only if) the source table is present in the inventory.
"""

def __init__(self, tables_crawler: TablesCrawler, table_ownership: TableOwnership) -> None:
super().__init__(table_ownership._administrator_locator) # TODO: Fix this
self._tables_crawler = tables_crawler
self._table_ownership = table_ownership
self._indexed_tables: dict[tuple[str, str], Table] | None = None

def _tables_snapshot_index(self, reindex: bool = False) -> dict[tuple[str, str], Table]:
index = self._indexed_tables
if index is None or reindex:
snapshot = self._tables_crawler.snapshot()
index = {(table.database, table.name): table for table in snapshot}
self._indexed_tables = index
return index

def _maybe_direct_owner(self, record: TableMigrationStatus) -> str | None:
index = self._tables_snapshot_index()
source_table = index.get((record.src_schema, record.src_table), None)
return self._table_ownership.owner_of(source_table) if source_table is not None else None
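
`TableMigrationOwnership` builds its `(database, name)` index lazily from the crawler snapshot and reuses it for later lookups. A minimal sketch of that lazy-index lookup, assuming hypothetical stand-ins (`FakeTable`, `FakeMigrationStatus`, and `FakeTablesCrawler` are illustrative, and the owner is stored directly on the table here instead of being delegated to a `TableOwnership` object):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FakeTable:
    database: str
    name: str
    owner: str  # simplification: the real code asks a TableOwnership object


@dataclass(frozen=True)
class FakeMigrationStatus:
    src_schema: str
    src_table: str


class FakeTablesCrawler:
    """Hypothetical crawler that counts how often a snapshot is requested."""

    def __init__(self, tables: list) -> None:
        self._tables = tables
        self.snapshot_calls = 0

    def snapshot(self) -> list:
        self.snapshot_calls += 1
        return list(self._tables)


class SketchMigrationOwnership:
    def __init__(self, crawler: FakeTablesCrawler) -> None:
        self._crawler = crawler
        self._index: Optional[dict] = None

    def _tables_index(self, reindex: bool = False) -> dict:
        # Build the index on first use, or again when explicitly requested.
        if self._index is None or reindex:
            self._index = {(t.database, t.name): t for t in self._crawler.snapshot()}
        return self._index

    def maybe_owner(self, record: FakeMigrationStatus) -> Optional[str]:
        source = self._tables_index().get((record.src_schema, record.src_table))
        return source.owner if source is not None else None


crawler = FakeTablesCrawler([FakeTable("sales", "orders", "alice@example.com")])
ownership = SketchMigrationOwnership(crawler)
owner = ownership.maybe_owner(FakeMigrationStatus("sales", "orders"))
missing = ownership.maybe_owner(FakeMigrationStatus("sales", "unknown"))
```

Both lookups share one snapshot, and a source table that is absent from the inventory yields `None`, matching the "if (and only if)" wording of the docstring.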