Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Update materialized column management #26414

Merged
merged 31 commits into from
Dec 4, 2024
Merged

Conversation

tkaemming
Copy link
Contributor

@tkaemming tkaemming commented Nov 26, 2024

Problem

  • Coordinator nodes currently are not updated using the management tools which can cause problems for high value customers if easily overlooked manual schema synchronization is not performed
  • It can be challenging with the existing code to determine what tables are being altered on what hosts and why without understanding implicit expectations baked into the logic (see feat: Support marking materialized columns as disabled and dropping #26068 (comment))
  • Slow distributed DDL would be challenging to migrate as written to something that is more resilient to toolbox pod restarts (this doesn't fix the issue, but hopefully nudges in the right direction)
  • Some distributed DDL (e.g. DROP INDEX) doesn't appear to actually be deduplicated when using ON CLUSTER and seems to need to be targeted more granularly at individual hosts to ensure efficient execution
  • Many similar changes needed to be made anyway to support nullable materialized column types anyway without it being shotgun surgery (feat: Support nullable materialized columns using native types #26448, clickhouse-cluster...column-types in particular)

Changes

  • Adds a ClickhouseCluster class that can be used to direct work to either all hosts in the default cluster (including extra hosts, i.e. coordinators - this is the only functional part of this change) or to only one host per shard, with queries executed concurrently across the selected hosts
  • Adds utilities for managing concurrent work with less boilerplate (FuturesMap)
  • Refactors materialized management commands into chunks based on host responsibility (data node, query node, etc.)

Does this work well for both Cloud and self-hosted?

N/A, ee.* only

How did you test this code?

Covered by existing (but could use more)

@tkaemming tkaemming marked this pull request as ready for review November 26, 2024 18:24
@tkaemming tkaemming requested a review from a team November 26, 2024 18:24
Copy link
Member

@fuziontech fuziontech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚢 it

@tkaemming tkaemming merged commit 789e1f6 into master Dec 4, 2024
95 checks passed
@tkaemming tkaemming deleted the clickhouse-cluster branch December 4, 2024 03:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants