Bug Report: cancelling MoveTables errors with 'cannot remove tables since one or more do not exist in the denylist' #15963
Comments
I’m afraid that I will have to mark this as can’t repeat as there’s no test case. Canceling a MoveTables workflow on a large table is something that happens all the time. So to me it would seem that there’s more to it than that. You didn’t provide enough detail to try and determine what happened in your case either. Someone or something dropped the logDataHourlyDeltas table in the middle of the import. As you saw, there was an insert error showing that before you even tried to cancel the workflow, no? We could turn this into a FR / bug about allowing cancel and cleaning up when the table is already gone. That’s a known issue as you saw. Please let me know what you would like to do. Thanks!
I'm positive nobody deleted that table. And I tried cancelling it several times, which may explain the cancel line below that. I can reproduce it, but only about 70% of the time (the
Then:
Then:
This is the vttablet log of only the one
Thank you for the additional info, @wiebeytec ! I'll have to do some investigating and testing (on
Did you see this on all of the tablets, though?
I just performed the operation in a loop 20 times. It failed 20 times, and it was 100% consistent that one shard/tablet did not produce the "table doesn't exist" error upon I also looked with |
I think I see the problem, shown with this patch here:
We're removing the tables and other related artifacts before removing the workflow, so there's a race there that you would see when there's a high write rate in the stream. If you don't mind, would you mind doing the |
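The ordering problem described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToyTarget`, `cancel_tables_first`, and `cancel_workflow_first` are hypothetical names invented for illustration and are not Vitess code. The model simply shows that if a table is dropped while the stream is still applying writes, a write can land in the gap and fail, whereas stopping the stream first avoids that.

```python
class ToyTarget:
    """Hypothetical stand-in for a target tablet (not Vitess code)."""

    def __init__(self, tables):
        self.tables = set(tables)
        self.stream_running = True
        self.errors = []

    def stream_tick(self, table):
        """Simulate one replicated insert arriving from the stream."""
        if not self.stream_running:
            return  # a stopped stream applies nothing
        if table not in self.tables:
            self.errors.append(f"table {table} doesn't exist")

    def drop_table(self, table):
        self.tables.discard(table)

    def stop_stream(self):
        self.stream_running = False


def cancel_tables_first(target, table):
    """Racy order: artifacts are removed before the stream is stopped."""
    target.drop_table(table)
    target.stream_tick(table)  # a write arrives in the gap and fails
    target.stop_stream()


def cancel_workflow_first(target, table):
    """Safer order: stop the stream, then remove the artifacts."""
    target.stop_stream()
    target.stream_tick(table)  # ignored, the stream is already stopped
    target.drop_table(table)


racy = ToyTarget(["logDataHourlyDeltas"])
cancel_tables_first(racy, "logDataHourlyDeltas")

safe = ToyTarget(["logDataHourlyDeltas"])
cancel_workflow_first(safe, "logDataHourlyDeltas")
```

In this toy model, `racy.errors` ends up with one "doesn't exist" entry while `safe.errors` stays empty, which is consistent with the race only showing up when writes are arriving at a high rate.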
Great to read you found something. However, when I try it with
@wiebeytec was that a workflow that you also created with |
It wasn't, but you do have a point: whether I use Currently, this is the list:
But, there are actually two more tables I moved yesterday. And the The four that are there now, were put there with a single I looked in the logs for vttablet on the shards and |
Thank you again, @wiebeytec ! I see what the issue is and I've updated the PR description to reflect the changes that will be made: #15977
👍 Will placing the table in the deny list be bypassed with the routing rules? Otherwise table access is blocked.
The table should only be in the deny list on the keyspace where queries for the table are NOT being routed. The deny list is an additional safety mechanism to prevent queries against the table from being served from the wrong keyspace / side of the workflow if you e.g. use shard targeting in VTGate or query the VTTablet directly.
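A minimal sketch of that idea, assuming a simplified model: routing rules pick the keyspace that serves a table, and the deny list on the other keyspace rejects queries that bypass routing. The functions `route` and `serve_query` and the keyspace names `ks_serving` and `ks_denied` are hypothetical and only illustrate the concept; they do not reflect how VTGate or VTTablet implement it.

```python
def route(table, routing_rules, default_keyspace):
    """Pick the keyspace a routed query for `table` is sent to."""
    return routing_rules.get(table, default_keyspace)


def serve_query(keyspace, table, deny_lists):
    """Serve a query on `keyspace`, unless the table is denied there."""
    if table in deny_lists.get(keyspace, set()):
        raise PermissionError(f"{table} is denied on {keyspace}")
    return f"served {table} from {keyspace}"


# Hypothetical setup: traffic is routed to ks_serving, and the table is
# in the deny list only on the side that should NOT serve it.
routing_rules = {"logDataHourlyDeltas": "ks_serving"}
deny_lists = {"ks_denied": {"logDataHourlyDeltas"}}

# A normally routed query never reaches the denied side.
routed_ks = route("logDataHourlyDeltas", routing_rules, "ks_denied")
routed_result = serve_query(routed_ks, "logDataHourlyDeltas", deny_lists)

# A query that targets the denied side directly is rejected.
try:
    serve_query("ks_denied", "logDataHourlyDeltas", deny_lists)
    direct_rejected = False
except PermissionError:
    direct_rejected = True
```

In this model the deny list does not block routed access, since routed queries go to the keyspace where the table is not denied; it only catches queries that target the wrong side directly.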
Overview of the Issue
(Edit with synopsis: it seems to delete the table while the stream is still running.)
I was trying a MoveTables on a 2 TB table. It had been running for a few hours. Then I cancelled it, which gave this error:
Running vtctldclient MoveTables status --workflow hourlies --target-keyspace sites2023 would say it's in error. I had to remove it using Workflow delete with --keep-data (because otherwise it wouldn't remove because the source table was already gone):
So to be clear, I did not remove any tables manually. There may be some race condition in checking/clearing the deny list and removing the table?
I later saw this in the log:
That suggests it dropped the table while the stream was still running.
Reproduction Steps
Try to cancel a MoveTables of a large table (that may have been running for a while), like:
Binary Version
Operating System and Environment details
Log Fragments