-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update table-migration workflows to also capture updated migration progress into the history log #3239
base: main
Are you sure you want to change the base?
Conversation
…-status index: this is supposed to be handled explicitly.
…ry log when refreshing the migration status at the end.
❌ 49/54 passed, 2 flaky, 5 failed, 4 skipped, 1h57m59s total ❌ test_migration_job_ext_hms[regular]: AssertionError: assert False (5m30.935s)
❌ test_table_migration_job_refreshes_migration_status[hiveserde-migrate-external-tables-ctas]: AssertionError: Workflow failed: assessment (11m1.624s)
❌ test_hiveserde_table_in_place_migration_job[hiveserde]: AssertionError: Workflow failed: assessment (10m47.167s)
❌ test_table_migration_job_publishes_remaining_tables[regular]: AssertionError: Workflow failed: assessment (10m41.403s)
❌ test_hiveserde_table_ctas_migration_job[hiveserde]: AssertionError: Workflow failed: assessment (10m46.183s)
Flaky tests:
Running from acceptance #7424 |
migrate_views, | ||
setup_tacl, | ||
], | ||
job_cluster="tacl", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not main
? we're using py4j there and it doesn't always play well with tacl clusters, where only python is enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try this.
updated_migration_progress = ctx.migration_status_refresher.snapshot(force_refresh=True) | ||
ctx.tables_migrator.check_remaining_tables(updated_migration_progress) | ||
|
||
@job_task(depends_on=[update_migration_status], job_cluster="table_migration") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a task to check prerequisites, otherwise this job will fail:
@job_task(job_cluster="table_migration")
def verify_prerequisites(self, ctx: RuntimeContext) -> None:
"""Verify the prerequisites for running this job on the table migration cluster are fulfilled.
We will wait up to 1 hour for the assessment run to finish if it is running or pending.
"""
ctx.verify_progress_tracking.verify(timeout=dt.timedelta(hours=1))
updated_migration_progress = ctx.migration_status_refresher.snapshot(force_refresh=True) | ||
ctx.tables_migrator.check_remaining_tables(updated_migration_progress) | ||
|
||
@job_task(depends_on=[update_migration_status], job_cluster="table_migration") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add prereqs
ctx.tables_migrator.check_remaining_tables(updated_migration_progress) | ||
|
||
@job_task(depends_on=[update_migration_status], job_cluster="table_migration") | ||
def update_tables_history_log(self, ctx: RuntimeContext) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add prereqs
upgraded_to="ucx_default.db1_dst.dst_table1", | ||
tables = ( | ||
( | ||
Table( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why this change is necessary? looks like a whitespace, but there might be other things hidden here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The was part of factoring out the sequence of tables returned by the snapshot()
mock (and make it immutable) so that the migration-index mock return the correct values. I'm not sure why the extra parentheses were in there, but I've removed them. Other than those it was only whitespace.
It should return all the tables as a snapshot.
…ng the migration index is available during encoding.
… is passed into the table encoder. This has been backed out to reduce clutter on the PR; they have been moved to PR #3270.
…orkflow if the assessment has not completed.
Integration tests currently failing due to a dependency issue on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Changes
The table-migration workflows already contained tasks at the end that log information about tables that still need to be migrated. The primary purpose of this PR is to update these workflows so they also capture updated progress information into the history log.
Other changes include:
Updating the (singleton) encoder for table-history so that initialisation doesn't trigger an implicit refresh of theMoved to Refactor refreshing of migration-status information for tables, eliminate another redundant refresh. #3270.TableMigrationStatus
data. Instead this is controlled at the workflow level, as intended.Linked issues
Conflicts with #3200 (will need rebasing).(Resolved.)Functionality
updated documentation
modified existing workflows:
migrate-tables
migrate-external-hiveserde-tables-in-place-experimental
migrate-external-tables-ctas
scan-tables-in-mounts-experimental
migrate-tables-in-mounts-experimental
Tests