stage | group | info |
---|---|---|
Systems | Geo | To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/product/ux/technical-writing/#assignments |
This document describes how to:
- Configure daily backups
- Take a backup now (planned)
- Restore a backup (planned)
This document is intended for environments using:
- Linux package (Omnibus) and cloud-native hybrid reference architectures for 3,000 users and up
- Highly-automated deployment tooling such as GitLab Environment Toolkit
- Amazon RDS for PostgreSQL data
- Amazon S3 for object storage
- Object storage to store everything possible, including blobs and Container Registry
The backup command uses `pg_dump`, which is not appropriate for databases over 100 GB. You must choose a PostgreSQL solution which has native, robust backup capabilities.
Object storage (not NFS) is recommended for storing GitLab data, including blobs and the Container Registry.
- Configure AWS Backup to back up both RDS and S3 data. For maximum protection, configure continuous backups as well as snapshot backups.
- Configure AWS Backup to copy backups to a separate region. When AWS takes a backup, the backup can only be restored in the region where it is stored.
- After AWS Backup has run at least one scheduled backup, you can create an on-demand backup as needed.
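As a rough sketch of the last item, an on-demand backup can be started from the AWS CLI with `aws backup start-backup-job`. The wrapper function name and the three values passed to it (backup vault name, resource ARN, and IAM role ARN) are placeholders rather than values from this document, so substitute your own.

```shell
# Hypothetical wrapper around the AWS CLI. All three arguments are
# placeholders that you must supply for your own account.
start_on_demand_backup() {
  local vault_name="$1"    # name of an existing AWS Backup vault
  local resource_arn="$2"  # ARN of the RDS instance or S3 bucket to back up
  local role_arn="$3"      # IAM role that AWS Backup assumes for the job

  aws backup start-backup-job \
    --backup-vault-name "$vault_name" \
    --resource-arn "$resource_arn" \
    --iam-role-arn "$role_arn"
}

# Example call (not run here; the argument values are assumptions):
# start_on_demand_backup "$VAULT_NAME" "$RESOURCE_ARN" "$BACKUP_ROLE_ARN"
```

Wrapping the call in a function keeps the required parameters in one place, which can help when the same job is started for both the RDS and S3 resources.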
NOTE: There is a feature proposal to add the ability to back up repositories directly from Gitaly to object storage. See epic 10077.
- Linux package (Omnibus):

  We will continue to use the backup command to back up Git repositories.

  If utilization is low enough, you can run it from an existing GitLab Rails node. Otherwise, spin up another node.
- Cloud native hybrid:

  The `backup-utility` command in a `toolbox` pod fails when there is a large amount of data. In this case, you must run the backup command to back up Git repositories, and you must run it in a VM running the GitLab Linux package:

  - Spin up a VM with 8 vCPU and 7.2 GB memory. This node will be used to back up Git repositories. Note that a Praefect node cannot be used to back up Git data.
  - Configure the node as another GitLab Rails node as defined in your reference architecture. Use the GitLab Environment Toolkit `gitlab_rails.yml` playbook. As with other GitLab Rails nodes, this node must have access to your main Postgres database as well as to Gitaly Cluster.
The backup node will copy all of the environment's Git data, so ensure that it has enough attached storage. For example, you need at least as much storage as one node in a Gitaly Cluster. Without Gitaly Cluster, you need at least as much storage as all Gitaly nodes. Keep in mind that Git repository backups can be significantly larger than Gitaly storage usage because forks are deduplicated in Gitaly but not in backups.
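Before running a backup, it may be worth checking free space on the backup node. The following is a minimal sketch using standard `df` and `awk`; the function names and the 500 GiB threshold in the example are illustrative assumptions, and the required size depends on your own Gitaly storage usage.

```shell
# Print the space available (in KiB) on the filesystem that holds a path.
available_kib() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed only if the path has at least the required number of KiB free.
has_enough_space() {
  local path="$1" required_kib="$2"
  [ "$(available_kib "$path")" -ge "$required_kib" ]
}

# Example (assumed threshold): require 500 GiB free in the backup directory.
# has_enough_space /var/opt/gitlab/backups $((500 * 1024 * 1024)) \
#   || echo "Not enough space for Git repository backups" >&2
```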
To back up the Git repositories:
- SSH into the GitLab Rails node.
- Configure AWS Backup for the bucket that receives the uploaded backups. Alternatively, use a bucket in the same account and region as your production data object storage buckets, and ensure that this bucket is included in your preexisting AWS Backup.
- Run the backup command, skipping PostgreSQL data:

      sudo gitlab-backup create SKIP=db

  The resulting tar file will include only the Git repositories and some metadata. Blobs such as uploads, artifacts, and LFS do not need to be explicitly skipped, because the command does not back up object storage by default. The tar file will be created in the `/var/opt/gitlab/backups` directory and the filename will end in `_gitlab_backup.tar`.

  Since we configured uploading backups to remote cloud storage, the tar file will be uploaded to the remote region and deleted from disk.
- Note the timestamp of the backup file for the next step. For example, if the backup name is `1493107454_2018_04_25_10.6.4-ce_gitlab_backup.tar`, the timestamp is `1493107454_2018_04_25_10.6.4-ce`.
- Run the backup command again, this time specifying incremental backup of Git repositories, and the timestamp of the source backup file. Using the example timestamp from the previous step, the command is:

      sudo gitlab-backup create SKIP=db INCREMENTAL=yes PREVIOUS_BACKUP=1493107454_2018_04_25_10.6.4-ce
- Check that the incremental backup succeeded and uploaded to object storage.
- Configure cron to make daily backups. Edit the crontab for the `root` user:

      sudo su -
      crontab -e

- There, add the following line to schedule the backup for every day at 2 AM:

      0 2 * * * /opt/gitlab/bin/gitlab-backup create SKIP=db INCREMENTAL=yes PREVIOUS_BACKUP=1493107454_2018_04_25_10.6.4-ce CRON=1
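The timestamp handling in the steps above can be sketched in shell. This is only an illustration: the helper names `backup_timestamp` and `incremental_backup_command` are hypothetical, and the second helper prints the command for review rather than running it.

```shell
# Derive the backup timestamp from a backup file name or path by
# stripping any leading directory and the fixed _gitlab_backup.tar suffix.
backup_timestamp() {
  local name="${1##*/}"
  printf '%s\n' "${name%_gitlab_backup.tar}"
}

# Print (do not run) the incremental backup command for a given
# source backup file, so it can be reviewed before use.
incremental_backup_command() {
  local ts
  ts="$(backup_timestamp "$1")"
  printf 'sudo gitlab-backup create SKIP=db INCREMENTAL=yes PREVIOUS_BACKUP=%s\n' "$ts"
}

# Example using the file name from the steps above:
# incremental_backup_command 1493107454_2018_04_25_10.6.4-ce_gitlab_backup.tar
```

Printing the command instead of executing it is a deliberate choice here, because a mistyped timestamp would otherwise only surface when the backup fails.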
We strongly recommend using rigorous automation tools such as Terraform and Ansible to administer large GitLab environments. GitLab Environment Toolkit is a good example. You may choose to build your own deployment tool and use GitLab Environment Toolkit as a reference.
Following this approach, your configuration files and secrets should already exist in secure, canonical locations outside of the production VMs or pods. This document does not cover backing up that data.
As an example, you can store secrets in AWS Secrets Manager and pull them into your Terraform configuration files. AWS Secrets Manager can be configured to replicate to multiple regions.
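For illustration only, a secret stored this way can be read back with the AWS CLI, for example to confirm that it exists before a deployment run. The function name and the secret ID in the example are hypothetical. Terraform would normally read the secret through its own AWS provider rather than through a shell call like this one.

```shell
# Hypothetical helper: print the string value of a secret stored in
# AWS Secrets Manager. The secret ID is supplied by the caller.
fetch_secret() {
  local secret_id="$1"
  aws secretsmanager get-secret-value \
    --secret-id "$secret_id" \
    --query SecretString \
    --output text
}

# Example call (not run here; the secret ID is an assumption):
# fetch_secret gitlab/production/db-password
```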