Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

source-sqlserver: Update discover logic to be dependent on whether automatic table cleanup is enabled #1950

Open
willdonnelly opened this issue Sep 16, 2024 · 0 comments

Comments

@willdonnelly
Copy link
Member

willdonnelly commented Sep 16, 2024

Creating a new CDC instance in SQL Server requires DBO permissions. In practice this means that there are two configurations that actually work:

  1. The user gives us the sa login or an equivalently permissioned user, and we handle everything for them. This is most common in proof-of-concept tests.
  2. The user provisions a flow_capture user with minimal permissions and handles enabling CDC on the DB and on individual tables themselves. This is more common in production.

Currently what happens is we discover all tables in the database, and if CDC has not already been enabled on a table at validation time we go and try to enable it. In configuration (2) the user sees the initial validation error and goes and enables CDC on the tables in response (or very rarely they read our docs and set it up ahead of time), and then we're happy.

The issue is that if they leave the "automatically add new tables" toggle set, we will discover a new table at some point in the future and be unable to publish any updates (including our own automatic ones for schema evolution and stuff) adding a binding for that table because CDC hasn't been enabled on it.

Part of the solution here is to tie discovery behavior to the "Automatic Capture Instance Management" advanced option. When that's disabled, then the user is responsible for enabling CDC on tables and we should only discover tables once they've done that for us. When it's enabled, we're responsible for enabling CDC on tables and should discover all tables regardless of CDC enablement.

Of course, the problem with that is that the option currently defaults to off, which means the garden path experience for a user who didn't know ahead of time to go set everything up is that they'll run discovery and we'll return an empty result set. This is also not ideal.

To fix this we either need some way of knowing that a discover run is part of "initial capture creation" versus a subsequent refresh, or else we need to do a whole migration and start defaulting the "Automatic Capture Instance Management" toggle to on so that users have to go explicitly disable it when they're not giving us a DBO account. The basic flow then would go:

  1. User runs discovery and we find all tables.
  2. User tries to publish, and either we create CDC instances (in which case we're done) or it fails and:
  3. We present the user with an error message telling them we either need DBO permissions or they need to disable that option in advanced settings.
  4. The user disables the option and we give them another error message (or maybe this is part of the same message) telling them to go enable CDC on the requested tables.
  5. The user enables CDC on the tables, we check and validate that CDC is indeed enabled, and validation passes and we let them publish the new capture.
@willdonnelly willdonnelly changed the title Update discover logic for sql server to be dependent on whether automatic table cleanup is enabled source-sqlserver: Update discover logic to be dependent on whether automatic table cleanup is enabled Sep 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant