
Add optional progress information and other enhancements for restore-dump command #910

Open · wants to merge 6 commits into main
Conversation

@orware commented Sep 27, 2024

These changes add a few potentially useful capabilities to the restore-dump command.

The first is the `--show-details` flag, which provides a middle ground between the minimal default "Restoring database ..." output and the potentially too noisy `--debug` output (which also prints the large `INSERT` queries being executed to the terminal). This flag outputs the names of the files that will be included from the restore folder and shows progress as the queries within the various data files are processed.

The `--start-from` flag takes a table name, which is then used to skip earlier tables so that the restore begins from the provided table. This can be helpful to avoid having to restart a restore from the very first table.
(Replaced by the `--starting-table` and `--ending-table` options described in a later comment below.)

The `--allow-different-destination` flag addresses #540 and adds extra flexibility: users are no longer required to adjust the database name prefix embedded in the filenames within a dump folder before restoring into a differently named database. Enabling this option allows those files to be used as-is for a database with a different name.

The `--schema-only` and `--data-only` flags address situations where a folder contains both sets of files but only one type is needed. Creating a new folder containing only the needed files is an option, but these flags add that flexibility without the extra step.
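As a rough sketch of how such a schema/data split could be decided per file, a classifier keyed off the filename convention visible in the example output below might look like the following (hypothetical helper name; the PR's actual implementation may structure this differently):

```go
package main

import (
	"fmt"
	"strings"
)

// fileKind classifies a dump file by name, following the naming convention
// shown in the example output: "<db>.<table>-schema.sql" is a schema file
// and "<db>.<table>.<NNNNN>.sql" is a data file. Illustrative sketch only.
func fileKind(name string) string {
	base, ok := strings.CutSuffix(name, ".sql")
	if !ok {
		return "other"
	}
	if strings.HasSuffix(base, "-schema") {
		return "schema"
	}
	return "data"
}

func main() {
	fmt.Println(fileKind("metrics.connections-schema.sql")) // schema
	fmt.Println(fileKind("metrics.connections.00001.sql"))  // data
}
```

With a classifier like this, `--schema-only` simply skips every "data" file and `--data-only` skips every "schema" file.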

Example output:

The following two examples restore into the main branch of an example database named metrics.

The first example shows using the --schema-only flag:

pscale database restore-dump metrics main --dir="<DIRECTORY_NAME>" --org=<ORG> --overwrite-tables --threads=2 --schema-only --show-details

Starting to restore database metrics from folder <DIRECTORY_NAME>
Restoring database ...
The schema only option is enabled for this restore.
Collecting files from folder <DIRECTORY_NAME>
  |- Table file: metrics.connections-schema.sql
    |- Data file: metrics.connections.00001.sql
    |- Data file: metrics.connections.00002.sql
    |- Data file: metrics.connections.00003.sql
    |- Data file: metrics.connections.00004.sql
    |- Data file: metrics.connections.00005.sql
  |- Table file: metrics.placeholder-schema.sql
Dropping Existing Table (if it exists): `metrics`.`connections`
Creating Table: `metrics`.`connections` (Table 1 of 2)
Dropping Existing Table (if it exists): `metrics`.`placeholder`
Creating Table: `metrics`.`placeholder` (Table 2 of 2)
Skipping restoring data files...
Restore is finished! (elapsed time: 1.189019208s)

The second example demonstrates how using --start-from prevents files from the earlier connections table from being included at all, so the restore starts from the provided table name, which in this case is placeholder:

pscale database restore-dump metrics main --dir="<DIRECTORY_NAME>" --org=<ORG> --overwrite-tables --threads=2 --schema-only --show-details --start-from="placeholder"

Starting to restore database metrics from folder <DIRECTORY_NAME>
Restoring database ...
The schema only option is enabled for this restore.
Collecting files from folder <DIRECTORY_NAME>
Skipping files associated with the connections table...
  |- Table file: metrics.placeholder-schema.sql
Starting from placeholder table...
Dropping Existing Table (if it exists): `metrics`.`placeholder`
Creating Table: `metrics`.`placeholder` (Table 1 of 1)
Skipping restoring data files...
Restore is finished! (elapsed time: 562.755959ms)

To demonstrate the usage of --allow-different-destination, first let's see the default behavior when attempting to restore a dump folder created for the metrics database into a database named not-metrics:

pscale database restore-dump not-metrics main --dir="<DIRECTORY_NAME>" --org=<ORG> --overwrite-tables --threads=2 --schema-only --show-details --start-from="placeholder"

Starting to restore database not-metrics from folder <DIRECTORY_NAME>
Restoring database ...
The schema only option is enabled for this restore.
Collecting files from folder <DIRECTORY_NAME>
Skipping files associated with the connections table...
  |- Table file: metrics.placeholder-schema.sql
Starting from placeholder table...
Error: failed to restore database: VT05003: unknown database 'metrics' in vschema (errno 1105) (sqlstate HY000)

As expected, the restore is unable to complete since the filenames within the dump folder embed the metrics name, which doesn't match the not-metrics name being passed to the command.

But if we add the new --allow-different-destination flag, the restore can proceed into the differently named destination database:

pscale database restore-dump not-metrics main --dir="<DIRECTORY_NAME>" --org=<ORG> --overwrite-tables --threads=2 --schema-only --show-details --start-from="placeholder" --allow-different-destination

Starting to restore database not-metrics from folder <DIRECTORY_NAME>
Restoring database ...
The allow different destination option is enabled for this restore.
Files that do not begin with the provided database name of not-metrics will still be processed without having to rename them first.
The schema only option is enabled for this restore.
Collecting files from folder <DIRECTORY_NAME>
Skipping files associated with the connections table...
  |- Table file: metrics.placeholder-schema.sql
Starting from placeholder table...
Dropping Existing Table (if it exists): `not-metrics`.`placeholder`
Creating Table: `not-metrics`.`placeholder` (Table 1 of 1)
Skipping restoring data files...
Restore is finished! (elapsed time: 543.931875ms)
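One way to think about what `--allow-different-destination` changes: instead of requiring the filename's database prefix to match the restore target, the restore only needs to recover the table name from the filename. A minimal sketch under that assumption (hypothetical helper, using the dot-separated naming shown above, not the PR's actual parsing code):

```go
package main

import (
	"fmt"
	"strings"
)

// tableFromSchemaFile extracts the table name from a schema filename of the
// form "<db>.<table>-schema.sql", ignoring whatever database prefix the file
// carries. Illustrative only.
func tableFromSchemaFile(name string) (string, bool) {
	base, ok := strings.CutSuffix(name, "-schema.sql")
	if !ok {
		return "", false
	}
	// Everything after the first dot is the table name, so the embedded
	// database prefix never has to equal the restore target's name.
	_, table, found := strings.Cut(base, ".")
	if !found {
		return "", false
	}
	return table, true
}

func main() {
	table, _ := tableFromSchemaFile("metrics.placeholder-schema.sql")
	fmt.Println(table) // placeholder
}
```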

I didn't want to include a lot of lines showing data files being processed, but below is a snippet showing the extra details that would be included toward the end of processing a file:

  Processing Query 126 out of 134 within metrics.connections.00005.sql in thread 1
  Processing Query 127 out of 134 within metrics.connections.00005.sql in thread 1
  Processing Query 128 out of 134 within metrics.connections.00005.sql in thread 1
  Processing Query 129 out of 134 within metrics.connections.00005.sql in thread 1
  Processing Query 130 out of 134 within metrics.connections.00005.sql in thread 1
  Processing Query 131 out of 134 within metrics.connections.00005.sql in thread 1
  Processing Query 132 out of 134 within metrics.connections.00005.sql in thread 1
  Processing Query 133 out of 134 within metrics.connections.00005.sql in thread 1
  Processing Query 134 out of 134 within metrics.connections.00005.sql in thread 1
Finished Processing Data File: metrics.connections.00005.sql in 2m45.760835334s with 6m20.652604042s elapsed so far (File 5 of 5)

When working with a larger restore using the `restore-dump` command, by default there is only the "Restoring database ..." indicator, which provides minimal insight into how things are progressing.

On the other hand, if you add the `--debug` flag, the amount of information can be too much, as the large `INSERT` queries are also included in the output.

The new `--show-details` flag optionally allows users to more easily see how their restore is progressing.

Additionally, a `--start-from` flag was added that takes a table name. This allows a user to skip earlier tables and start the import from a later point without having to create a separate copy of their dump folder and manually remove those files.

The new `--allow-different-destination` flag primarily simplifies the process of taking a folder that was created for one database (e.g. files prefixed with "first-database") and restoring it into a second database, without having to adjust the database prefix on all of the existing files.

Also incorporated a check to prevent an issue with custom generated files where users might accidentally provide an `INSERT` query larger than 16,777,216 bytes. This provides useful feedback rather than the `pkt` error that was previously returned.

From what I can tell, this made no functional change; it mainly allowed the output from the new `--show-details` flag to show information such as "Thread 1" rather than "Thread 0" without having to manually add one to the value elsewhere.

BuildKite was failing due to the way I was passing the helper used for printing, so I made some adjustments and simplified how that worked.

Afterward I added in two more optional flags: `--schema-only` and `--data-only`.

Using both at the same time is essentially equivalent to a normal restore, where both schema and data are included.

The new `--start-from` parameter can also be used together with these.
@orware orware requested a review from a team as a code owner September 27, 2024 22:06
@orware (Author) commented Sep 28, 2024

I was thinking about this some more, and it makes sense to make this option more comprehensive: rather than only including the tables >= a starting table, it should also support including the tables <= an ending table, as well as providing a more specific inclusive starting/ending table range, so I made some adjustments.

The last commit removes the --start-from option and replaces it with the --starting-table and --ending-table options which can be used separately or together.

Examples:

If the options are not used at all, then all tables are included as they normally would be. If the ending table provided comes before the starting table alphabetically, the restore will short-circuit until that is corrected.
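The inclusive range check described above can be sketched as a simple lexicographic comparison (hypothetical helper name, where an empty string means the bound is unset; the PR's exact code may differ):

```go
package main

import "fmt"

// tableInRange reports whether a table falls within the inclusive
// [start, end] range, where an empty bound means "no limit". A sketch of
// the --starting-table / --ending-table filtering.
func tableInRange(table, start, end string) bool {
	if start != "" && table < start {
		return false
	}
	if end != "" && table > end {
		return false
	}
	return true
}

func main() {
	// Mirrors the example below: only abc and bcd fall in ["abc", "bcd"].
	fmt.Println(tableInRange("abc", "abc", "bcd"))         // true
	fmt.Println(tableInRange("connections", "abc", "bcd")) // false
}
```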

This example skips the last two tables since the table range being provided will only include the first two:

pscale database restore-dump metrics main --dir="<DIRECTORY_NAME>" --org=<ORG> --overwrite-tables --threads=2 --schema-only --show-details --starting-table="abc" --ending-table="bcd"

Starting to restore database metrics from folder <DIRECTORY_NAME>
Restoring database ...
The schema only option is enabled for this restore.
Collecting files from folder <DIRECTORY_NAME>
  |- Table file: metrics.abc-schema.sql
  |- Table file: metrics.bcd-schema.sql
Skipping files associated with the connections table...
Skipping files associated with the placeholder table...
Restore will be starting from the abc table...
Restore will be ending at the bcd table...
Dropping Existing Table (if it exists): `metrics`.`abc`
Creating Table: `metrics`.`abc` (Table 1 of 2)
Dropping Existing Table (if it exists): `metrics`.`bcd`
Creating Table: `metrics`.`bcd` (Table 2 of 2)
Skipping restoring data files...
Restore is finished! (elapsed time: 1.005391166s)

For the case above, running the same command without --starting-table would produce the same output, since the abc table is the only other table before bcd:

pscale database restore-dump metrics main --dir="<DIRECTORY_NAME>" --org=<ORG> --overwrite-tables --threads=2 --schema-only --show-details --ending-table="bcd"

The opposite, however, starting from the bcd table and continuing to the end, will automatically include the remaining tables:

pscale database restore-dump metrics main --dir="<DIRECTORY_NAME>" --org=<ORG> --overwrite-tables --threads=2 --schema-only --show-details --starting-table="bcd"

Starting to restore database metrics from folder <DIRECTORY_NAME>
Restoring database ...
The schema only option is enabled for this restore.
Collecting files from folder <DIRECTORY_NAME>
Skipping files associated with the abc table...
  |- Table file: metrics.bcd-schema.sql
  |- Table file: metrics.connections-schema.sql
    |- Data file: metrics.connections.00001.sql
    |- Data file: metrics.connections.00002.sql
    |- Data file: metrics.connections.00003.sql
    |- Data file: metrics.connections.00004.sql
    |- Data file: metrics.connections.00005.sql
  |- Table file: metrics.placeholder-schema.sql
Restore will be starting from the bcd table...
Dropping Existing Table (if it exists): `metrics`.`bcd`
Creating Table: `metrics`.`bcd` (Table 1 of 3)
Dropping Existing Table (if it exists): `metrics`.`connections`
Creating Table: `metrics`.`connections` (Table 2 of 3)
Dropping Existing Table (if it exists): `metrics`.`placeholder`
Creating Table: `metrics`.`placeholder` (Table 3 of 3)
Skipping restoring data files...
Restore is finished! (elapsed time: 1.321406s)

Should address these two lint findings that were noted:
[2024-09-28T00:50:36Z] internal/cmd/database/restore.go:75:10: error strings should not be capitalized (ST1005)
[2024-09-28T00:50:36Z] internal/cmd/database/restore.go:75:10: error strings should not end with punctuation or newlines (ST1005)
Rather than having the 16,777,216-byte limit be completely hard coded, I thought it would be better to allow it to be adjusted in rare cases, since it may sometimes be configured differently.

Generally though, the 16 MiB limit will be the default one encountered by our users.

Note that this option only affects the quick length check performed prior to running a query; it does not (and cannot) adjust the server's configured message size limit for queries.
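A sketch of the kind of up-front length check described here (hypothetical names; note the lowercase, punctuation-free error string, matching the ST1005 lint findings mentioned earlier):

```go
package main

import "fmt"

// defaultMaxQueryBytes mirrors the common 16 MiB message size limit.
const defaultMaxQueryBytes = 16 * 1024 * 1024 // 16777216 bytes

// checkQuerySize rejects an oversized query before sending it, so the user
// sees a clear error instead of a low-level packet error. Illustrative
// sketch, not the PR's actual implementation.
func checkQuerySize(query string, limit int) error {
	if limit <= 0 {
		limit = defaultMaxQueryBytes
	}
	if len(query) > limit {
		return fmt.Errorf("query length of %d bytes exceeds the %d byte limit", len(query), limit)
	}
	return nil
}

func main() {
	fmt.Println(checkQuerySize("INSERT INTO t VALUES (1)", 0))         // <nil>
	fmt.Println(checkQuerySize("INSERT INTO t VALUES (1)", 10) != nil) // true
}
```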