Bug Report: PITR from timestamp or position not working #14765
Comments
I am unable to reproduce, but I think this issue description leaves a lot of implementation freedom. To avoid waiting days for reproduction, I used this script:

#!/bin/bash
vtctldclient BackupShard commerce/0
for f in {1..3}; do
echo "new full backup cycle"
for i in {1..12}; do
echo "new incremental backup cycle"
sleep 30
mysql -e "insert into corder values(rand()*1000000,rand()*1000000,rand()*1000000,rand()*1000000);"
mysql -e "insert into corder values(rand()*1000000,rand()*1000000,rand()*1000000,rand()*1000000);"
sleep 30
vtctldclient BackupShard commerce/0 --incremental-from-pos=auto
done
vtctldclient BackupShard commerce/0
done

Which:

Essentially turns

The runtime totals at about

My tablets are:

$ mysql -e "show vitess_tablets"
+-------+----------+-------+------------+---------+------------------+------------+----------------------+
| Cell | Keyspace | Shard | TabletType | State | Alias | Hostname | PrimaryTermStartTime |
+-------+----------+-------+------------+---------+------------------+------------+----------------------+
| zone1 | commerce | 0 | PRIMARY | SERVING | zone1-0000000100 | <redacted> | 2023-12-17T12:24:32Z |
| zone1 | commerce | 0 | REPLICA | SERVING | zone1-0000000101 | <redacted> | |
| zone1 | commerce | 0 | RDONLY | SERVING | zone1-0000000102 | <redacted> | |
+-------+----------+-------+------------+---------+------------------+------------+----------------------+

I ran the following restores:

vtctldclient RestoreFromBackup --restore-to-timestamp "2023-12-17T12:31:35Z" zone1-0000000101
vtctldclient RestoreFromBackup --restore-to-timestamp "2023-12-17T12:32:00Z" zone1-0000000101
vtctldclient RestoreFromBackup --restore-to-timestamp "2023-12-17T12:35:00Z" zone1-0000000101
vtctldclient RestoreFromBackup --restore-to-timestamp "2023-12-17T12:42:00Z" zone1-0000000101
vtctldclient RestoreFromBackup --restore-to-timestamp "2023-12-17T12:43:00Z" zone1-0000000101
vtctldclient RestoreFromBackup --restore-to-timestamp "2023-12-17T12:44:00Z" zone1-0000000101
vtctldclient RestoreFromBackup --restore-to-timestamp "2023-12-17T12:45:00Z" zone1-0000000101
vtctldclient RestoreFromBackup --restore-to-timestamp "2023-12-17T12:46:00Z" zone1-0000000101
vtctldclient RestoreFromBackup --restore-to-timestamp "2023-12-17T12:47:00Z" zone1-0000000101
vtctldclient RestoreFromBackup --restore-to-timestamp "2023-12-17T12:48:00Z" zone1-0000000101
vtctldclient RestoreFromBackup --restore-to-timestamp "2023-12-17T12:49:00Z" zone1-0000000101
vtctldclient RestoreFromBackup --restore-to-timestamp "2023-12-17T12:50:00Z" zone1-0000000101
vtctldclient RestoreFromBackup --restore-to-timestamp "2023-12-17T12:51:00Z" zone1-0000000101
vtctldclient RestoreFromBackup --restore-to-timestamp "2023-12-17T12:52:00Z" zone1-0000000101
vtctldclient RestoreFromBackup --restore-to-timestamp "2023-12-17T12:53:00Z" zone1-0000000101
vtctldclient RestoreFromBackup --restore-to-timestamp "2023-12-17T12:54:00Z" zone1-0000000101
vtctldclient RestoreFromBackup --restore-to-timestamp "2023-12-17T12:55:00Z" zone1-0000000101
vtctldclient RestoreFromBackup --restore-to-timestamp "2023-12-17T13:04:00Z" zone1-0000000101
vtctldclient RestoreFromBackup --restore-to-timestamp "2023-12-17T13:07:00Z" zone1-0000000101 and all went well and without error. Let's try to focus on the
Can you find anything in the |
Checking in to see whether any of the above makes sense? |
Hi, I will retry everything from the beginning and update with the results. |
Hi @shlomi-noach, I tested again and I still have issues with restoring a PITR backup. I will describe my steps and also paste some gists of the components I define and the logs from the restore attempts.

$ ./pf.sh &
alias vtctldclient="vtctldclient --server=localhost:15999"
alias vtctlclient="vtctlclient --server=localhost:15999"
alias mysql="mysql -h 127.0.0.1 -P 15306 -u user"
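(For reference, pf.sh is just a set of kubectl port-forward commands for the operator-created services; a minimal sketch of it is below, with placeholder service names that you would look up first with kubectl get svc:)

#!/bin/bash
# Hypothetical pf.sh sketch: forward vtctld, vtgate, and vtadmin to localhost.
# Service names are placeholders; the operator generates its own names.
kubectl port-forward svc/example-vtctld 15000 15999 &
kubectl port-forward svc/example-vtgate 15306:3306 &
kubectl port-forward svc/example-vtadmin 14000:15000 &
wait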
$ vtctldclient RestoreFromBackup --restore-to-timestamp "2024-02-15T06:30:00Z" zone1-2469782763
$ vtctldclient RestoreFromBackup --restore-to-timestamp "2024-02-15T06:30:15Z" zone1-0790125915
$ vtctldclient RestoreFromBackup --restore-to-timestamp "2024-02-15T06:00:15Z" zone1-0790125915
$ vtctldclient RestoreFromBackup --restore-to-timestamp "2024-02-15T09:00:14Z" zone1-2469782763
$ vtctldclient RestoreFromBackup --restore-to-timestamp "2024-02-15T09:00:16Z" zone1-2859626137
Here is some info:

$ mysql -e "show vitess_tablets"
+-------+----------+-------+------------+---------+------------------+--------------+----------------------+
| Cell | Keyspace | Shard | TabletType | State | Alias | Hostname | PrimaryTermStartTime |
+-------+----------+-------+------------+---------+------------------+--------------+----------------------+
| zone1 | commerce | - | PRIMARY | SERVING | zone1-2548885007 | REDACTED | 2024-02-14T11:06:07Z |
| zone1 | commerce | - | REPLICA | SERVING | zone1-2469782763 | REDACTED | |
| zone1 | commerce | - | REPLICA | SERVING | zone1-0790125915 | REDACTED | |
| zone1 | customer | -80 | PRIMARY | SERVING | zone1-0120139806 | REDACTED | 2024-02-14T14:30:45Z |
| zone1 | customer | -80 | REPLICA | SERVING | zone1-2859626137 | REDACTED | |
| zone1 | customer | -80 | REPLICA | SERVING | zone1-2289928654 | REDACTED | |
| zone1 | customer | 80- | PRIMARY | SERVING | zone1-4277914223 | REDACTED | 2024-02-14T14:30:04Z |
| zone1 | customer | 80- | REPLICA | SERVING | zone1-0118374573 | REDACTED | |
| zone1 | customer | 80- | REPLICA | SERVING | zone1-2298643297 | REDACTED | |
+-------+----------+-------+------------+---------+------------------+--------------+----------------------+

Here is the list of backups for

$ vtctldclient GetBackups commerce/-
Handling connection for 15999
2024-02-14.144515.zone1-0790125915
2024-02-14.145018.zone1-2469782763
2024-02-14.145518.zone1-0790125915
2024-02-14.150014.zone1-0790125915
2024-02-14.153015.zone1-2469782763
2024-02-14.160015.zone1-2469782763
2024-02-14.163014.zone1-2469782763
2024-02-14.170015.zone1-2469782763
2024-02-14.173014.zone1-0790125915
2024-02-14.180015.zone1-0790125915
2024-02-14.183016.zone1-0790125915
2024-02-14.190015.zone1-2469782763
2024-02-14.193016.zone1-2469782763
2024-02-14.200014.zone1-2469782763
2024-02-14.203016.zone1-2469782763
2024-02-14.210014.zone1-2469782763
2024-02-14.213015.zone1-0790125915
2024-02-14.220014.zone1-0790125915
2024-02-14.223014.zone1-0790125915
2024-02-14.230015.zone1-2469782763
2024-02-14.233016.zone1-0790125915
2024-02-15.000014.zone1-0790125915
2024-02-15.003014.zone1-0790125915
2024-02-15.010015.zone1-0790125915
2024-02-15.013014.zone1-0790125915
2024-02-15.020015.zone1-2469782763
2024-02-15.023014.zone1-0790125915
2024-02-15.030015.zone1-2469782763
2024-02-15.033014.zone1-0790125915
2024-02-15.040015.zone1-0790125915
2024-02-15.043014.zone1-0790125915
2024-02-15.050017.zone1-2469782763
2024-02-15.053014.zone1-0790125915
2024-02-15.060015.zone1-0790125915
2024-02-15.063015.zone1-0790125915
2024-02-15.070014.zone1-0790125915
2024-02-15.073014.zone1-0790125915
2024-02-15.080015.zone1-2469782763
2024-02-15.083014.zone1-2469782763
2024-02-15.090014.zone1-2469782763

Here is the bin-logs dir listing from the vttablet I tried to restore from a timestamp (binlog retention isn't defined, so I'm using the Vitess default):

$ ls vt/vtdataroot/vt_0790125915/bin-logs/
vt-0790125915-bin.000006  vt-0790125915-bin.000007  vt-0790125915-bin.000008  vt-0790125915-bin.index

Here is the listing of /tmp, which was referenced in the error message at the end of the restore attempt logs:

$ ls /tmp
vt-gh-ost
vttablet.ERROR
vttablet.INFO
vttablet.WARNING
vttablet.dev-vitess-vttablet-zone1-0790125915-0d0930ad.vitess.log.ERROR.20240214-112904.1
vttablet.dev-vitess-vttablet-zone1-0790125915-0d0930ad.vitess.log.INFO.20240214-112904.1
vttablet.dev-vitess-vttablet-zone1-0790125915-0d0930ad.vitess.log.WARNING.20240214-112904.1
$ cat /tmp/vttablet.dev-vitess-vttablet-zone1-0790125915-0d0930ad.vitess.log.ERROR.20240214-112904.1
Log file created at: 2024/02/14 11:29:04
Running on machine: dev-vitess-vttablet-zone1-0790125915-0d0930ad
Binary: Built with gc go1.21.4 for linux/amd64
Previous log: <none>
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
E0214 11:29:04.512681 1 syslogger.go:149] can't connect to syslog

INCREMENTAL backups run using this function from the script (I get the label values from the vtorc pods, as there is only one per keyspace per shard, so I use them to get the keyspace/shard values for running BackupShard):

# Function to perform incremental backup
perform_incremental_backup() {
for vtorc_pod in $vtorc_pods; do
echo "Processing vtorc pod: $vtorc_pod"
# Fetch the keyspace and shard labels from the vtorc pod
keyspace=$(kubectl get pod "$vtorc_pod" -o jsonpath='{.metadata.labels.planetscale\.com/keyspace}')
shard=$(kubectl get pod "$vtorc_pod" -o jsonpath='{.metadata.labels.planetscale\.com/shard}')
# Remove 'x' letters from the shard label
shard="${shard//x/}"
echo "=========================== Executing INCREMENTAL BACKUP command on $keyspace / $shard ======================================"
# Run the BackupShard command with keyspace/shard as a single argument
vtctldclient --server="${service_name}:${port}" --alsologtostderr BackupShard --incremental-from-pos=auto "${keyspace}/${shard}"
done
} |
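(For completeness, the variables that function relies on are set earlier in the script; a rough sketch of that setup follows, with placeholder label selector, service name, and port, since the real values are cluster-specific:)

# Hypothetical setup for the variables used by perform_incremental_backup.
vtorc_pods=$(kubectl get pods -l "planetscale.com/component=vtorc" -o jsonpath='{.items[*].metadata.name}')
service_name="vtctld-service"
port="15999"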
Hi @GenLN! Thanks for following up. In trying to narrow things down, were you able to follow my flow on #14765 (comment), and run that exact script on your servers? Did you happen to check the

Realistically, I'm not going to be able to reproduce your specific setup on a k8s cluster, so I'm trying to eliminate what I think are irrelevant aspects of your setup, and narrow down to some basics. |
Let's again focus on the specific error. From your logs above:
Thank you! |
Hi @shlomi-noach, I have also tried your flow for testing PITR. The same error happened.
Where should I check the mysqlbinlog path? My localhost machine, which executes the vtctldclient commands, has mysqlbinlog. As I said, I'm using this script to port-forward vtctld, vtgate, and vtadmin, and I use the vtctldclient tool present on my machine along with the other vt binaries:

local@machine$ ls /usr/local/vitess/bin
mysqlctl topo2topo vtadmin vtbench vtcombo vtctlclient vtctldclient vtgate vttablet zk zkctld
mysqlctld vtaclcheck vtbackup vtclient vtctl vtctld vtexplain vtorc vttestserver zkctl
local@machine$ vtctldclient --version
vtctldclient version Version: 18.0.0 (Git revision 9a6f5262f7707ff80ce85c111d2ff686d85d29cc branch 'HEAD') built on Mon Nov 6 12:16:43 UTC 2023 by runner@fv-az422-64 using go1.21.3 linux/amd64
local@machine$ whereis mysqlbinlog
mysqlbinlog: /usr/bin/mysqlbinlog
local@machine$ mysqlbinlog --version
mysqlbinlog Ver 8.0.36 for Linux on x86_64 (MySQL Community Server - GPL)
local@machine$ echo $PATH
/usr/local/vitess/bin:/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin
These are the cluster component images:
vtctld: vitess/vtctld:v18.0.1
vtadmin: vitess/vtadmin:v18.0.1
vtgate: vitess/lite:v18.0.1
vttablet: vitess/lite:v18.0.1
vtbackup: vitess/lite:v18.0.1
vtorc: vitess/lite:v18.0.1
mysqld:
mysql80Compatible: vitess/lite:v18.0.1
mysqldExporter: prom/mysqld-exporter:v0.14.0

vtctld doesn't have mysqlbinlog |
On the same container that runs your |
This is a k8s environment; mysqld is a container inside the vttablet pod.
mysqld container
vttablet container
|
We have a visibility problem here. We're looking for the

Regardless, if you're able to apply this patch, build, and deploy to your k8s environment, I think we'll have a better idea of the error. |
Related issue: #15277 |
PR: #15278 |
Hi @shlomi-noach I have updates on this issue.
With this, when I try to run the command, I have this output at the end:
The whole log output is available here: gists |
@GenLN the output does not show that the
For example:
Why it's returning that, I cannot say, as I don't have access to your systems. Is it somehow not correct to use that specific binary log file? Is that not the right location? Does the file actually exist? I have no idea. This is not clearly a bug to me at this point and may require a lot of back and forth, so it might be better to move it to the Vitess Slack. |
If this requires a lot of back and forth we can move to Slack -- but I'd say this is a bug. The |
@mattlord Sorry for misunderstanding about this note
@shlomi-noach there isn't any such folder or file in either the mysqld container or the vttablet container.

mysqld container:

$ ls -alh /tmp/
total 8.0K
drwxrwxrwt 1 root root 4.0K Mar 11 10:32 .
drwxr-xr-x 1 root root 4.0K Mar 8 10:19 ..
$ ls -alh /vt/vtdataroot/vt_0790125915/
total 36K
drwxr-sr-x 7 vitess vitess 4.0K Mar 11 10:32 .
drwxrwsr-x 3 root vitess 4.0K Mar 8 10:19 ..
drwxr-sr-x 2 vitess vitess 4.0K Mar 11 10:32 bin-logs
drwxr-s--- 8 vitess vitess 4.0K Mar 11 10:32 data
drwxr-sr-x 4 vitess vitess 4.0K Mar 11 10:32 innodb
-rw-r--r-- 1 vitess vitess 2.9K Mar 11 10:32 my.cnf
-rw-r----- 1 vitess vitess 5 Mar 11 10:32 mysql.pid
drwxr-sr-x 2 vitess vitess 4.0K Mar 11 10:32 relay-logs
-rw-r--r-- 1 vitess vitess 0 Mar 11 10:32 restore_in_progress
drwxr-sr-x 2 vitess vitess 4.0K Mar 11 10:32 tmp
$ ls -alh /vt/vtdataroot/vt_0790125915/tmp/
total 8.0K
drwxr-sr-x 2 vitess vitess 4.0K Mar 11 10:32 .
drwxr-sr-x 7 vitess vitess 4.0K Mar 11 10:32 ..
$ cat /vt/vtdataroot/vt_0790125915/restore_in_progress
vitess@dev-vitess-vttablet-zone1-0790125915-0d0930ad:/$
vttablet:

$ ls -alh /tmp/
total 8.9M
drwxrwxrwt 1 root root 4.0K Mar 11 10:32 .
drwxr-xr-x 1 root root 4.0K Mar 8 10:19 ..
-rw-r--r-- 1 vitess vitess 57 Mar 11 09:58 percona-version-check
-rwxr-xr-x 1 vitess vitess 8.9M Mar 8 10:19 vt-gh-ost
Also |
@GenLN right. By the time you go looking for it,

Looking at the log gist file:
So the restore process was supposed to copy file

There is no error in the restore process to indicate that it was unable to create said file. So we're in the dark again. It feels like perhaps a

I did not ask: are you comfortable building a |
@shlomi-noach sure, just drop the files, which I will pull, and I'll build the Docker image to use for the Vitess cluster.

Does it affect which user is set up as xtrabackup_user? Also, is this normal content for the incremental backups directory? I notice the incremental backup file here is named "0", which I guess references the shard name "0", but in K8s initial shards or single-shard keyspaces have "-" and not "0".
|
As far as I can see, no.
Yes. |
I think I have it. You said:
But the incremental restore does not extract the files onto |
So it means it's the wrong location to download the incremental backup files that should be applied by

Perhaps that location is obviously okay for a Docker deployment (as you confirmed it works as expected in a Docker env), but not for k8s? Maybe I can try to persist the /tmp folder and share it between the containers in the pod; at least I can try it and see whether it works? |
Why don't you try #15440, see if that works? |
The xtrabackupengine uses the

https://dev.mysql.com/doc/refman/8.0/en/temporary-files.html

That defaults to |
@mattlord I think you're referencing an unrelated setting. The path we're using is generated by Vitess, not by MySQL. Vitess reads a backup manifest, then copies over the backup files (binary logs in our case) under said generated path, then requests

Moreover, this is done by the built-in engine, not |
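(As an aside, and only as an illustration of the mechanism rather than the exact Vitess invocation: applying an incremental backup conceptually boils down to replaying the copied binlog files with mysqlbinlog, roughly as sketched below, which is why a usable mysqlbinlog binary needs to exist next to mysqld. The scratch path, socket path, and user here are placeholders.)

# Illustration only -- not the actual Vitess code path.
mysqlbinlog /path/to/restore-scratch-dir/vt-0790125915-bin.000006 \
  | mysql --socket=/vt/vtdataroot/vt_0790125915/mysql.sock -u root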
Ah, I see: vitess/go/vt/mysqlctl/builtinbackupengine.go Lines 1008 to 1013 in 46975b2
That's the only place that the |
So I should build a Vitess image using these changes and run the test? |
Yes please.
Honestly, I think, I'm not sure. This is why I asked whether you're comfortable building a |
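(For anyone following along, one rough way to build a patched lite image is sketched below. It assumes the patch is available as a plain diff and that docker/lite/Dockerfile is the appropriate build file for this deployment -- both of which are assumptions -- and the registry name is a placeholder.)

# Rough sketch, not an official build procedure.
git clone https://github.com/vitessio/vitess.git && cd vitess
git checkout v18.0.1                         # match the running cluster version
git apply /path/to/fix.patch                 # the patch under discussion (placeholder path)
docker build -f docker/lite/Dockerfile -t myregistry/vitess-lite:v18.0.1-patched .
docker push myregistry/vitess-lite:v18.0.1-patched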
I think this solved the issue 👍
|
Keep in mind that this only works if the vitess/lite image (with #15440 included) is used for mysqld. The same bug hits if the official mysql image is used (I will test this with this PR, as I believe the vttablet container starts the Restore procedure, so maybe this fix will work with the official image)... Also, another question: could we expect this to be released as a patch? I don't see any breaking changes here, just a bugfix for incremental restoration in a K8s env. I will also do a few more tests with the image I built from your PR and bring more updates if I encounter any issues. I still need to perform a full PITR, as I didn't have a chance to get to that step. 😄 |
I'm not sure I understand. Could you please explain again?
Yes. |
Alternative solution: #15451 |
Here, I just tried using official
Here I can't take incremental backups; only full backups work. Check the output:
And as you can see the official

bash-4.4$ echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
bash-4.4$ whereis mysqlbinlog
bash: whereis: command not found
bash-4.4$ mysql
mysql mysql_migrate_keyring mysql_upgrade mysqldump
mysql-secret-store-login-path mysql_ssl_rsa_setup mysqladmin mysqlpump
mysql_config mysql_tzinfo_to_sql mysqld mysqlsh
bash-4.4$ mysqlbi

And the reason I tested this is that the Vitess v19.0.0 release docs say

So the conclusion is to use the vitess/lite image, or maybe a custom-built image using the official mysql image as a base and adding |
Thank you for explaining. To be more precise:
This is an altogether different issue; it is about not having |
Would you mind creating a new Issue re: the need to include a |
Yes I agree, let's not expand this issue with other stuff that isn't bound to the same core problem... |
Opened this #15452 |
Overview of the Issue
Backups are taken as incremental (builtin method) and full (xtrabackup method). PITR using a timestamp or position fails. K8s deployment with the operator.
Reproduction Steps
Run a full backup manually the first time, then run a cron with a full backup every 12h and incrementals every hour (a crontab sketch follows these steps).
Add a cron to insert random data into the tables every 30 min, which ensures the incrementals won't fail (they fail if there is no change in the db since the last backup).
Try to PITR from a timestamp:
vtctldclient RestoreFromBackup --restore-to-timestamp "timestamp" tablet-alias
View the error
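A crontab along these lines matches the schedule described above (illustrative only; the keyspace/shard, server address, and connection details are placeholders):

# full backup every 12h, incremental backup every hour, random inserts every 30 min
0 */12 * * *  vtctldclient --server=vtctld:15999 BackupShard commerce/-
15 * * * *    vtctldclient --server=vtctld:15999 BackupShard --incremental-from-pos=auto commerce/-
*/30 * * * *  mysql -h 127.0.0.1 -P 15306 -u user -e "insert into corder values(rand()*1000000,rand()*1000000,rand()*1000000,rand()*1000000)"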
Binary Version
Vitess version v18.0.1, the latest tag for the vitess operator.
Operating System and Environment details
Log Fragments
Here is the log output when running PITR from timestamp:
https://gist.github.com/GenLN/7b9ba323aee7778390a9269379b16c2d