---
stage: Systems
group: Geo
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/product/ux/technical-writing/#assignments
---

# Back up and restore large reference architectures (FREE SELF)

This document describes how to configure daily backups of PostgreSQL data, object storage data, Git repositories, and configuration files.

It is intended for large environments using Amazon RDS for PostgreSQL, Amazon S3 for object storage, and Gitaly or Gitaly Cluster for Git repositories.

## Configure daily backups

### Configure backup of PostgreSQL and object storage data

The backup command uses pg_dump, which is not appropriate for databases over 100 GB. You must choose a PostgreSQL solution that has native, robust backup capabilities.

Object storage (not NFS) is recommended for storing GitLab data, including blobs and the container registry.

  1. Configure AWS Backup to back up both RDS and S3 data. For maximum protection, configure continuous backups as well as snapshot backups.
  2. Configure AWS Backup to copy backups to a separate region. A backup can only be restored in the region where it is stored.
  3. After AWS Backup has run at least one scheduled backup, you can create an on-demand backup as needed.
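As a sketch of step 3, the script below assembles an AWS CLI `aws backup start-backup-job` invocation for the RDS instance and prints it for review before you run it. The vault name, account ID, and ARNs are hypothetical placeholders; substitute your own.

```shell
# Assemble an on-demand AWS Backup job command for the RDS instance.
# All names, ARNs, and the account ID below are hypothetical placeholders.
VAULT="gitlab-backup-vault"
RDS_ARN="arn:aws:rds:us-east-1:123456789012:db:gitlab-postgres"
ROLE_ARN="arn:aws:iam::123456789012:role/service-role/AWSBackupDefaultServiceRole"

# Print the command so it can be reviewed, then run manually.
CMD="aws backup start-backup-job --backup-vault-name $VAULT --resource-arn $RDS_ARN --iam-role-arn $ROLE_ARN"
echo "$CMD"
```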

### Configure backup of Git repositories

NOTE: There is a feature proposal to add the ability to back up repositories directly from Gitaly to object storage. See epic 10077.

The backup node copies all of the environment's Git data, so ensure that it has enough attached storage. For example, you need at least as much storage as one node in a Gitaly Cluster. Without Gitaly Cluster, you need at least as much storage as all Gitaly nodes combined. Keep in mind that Git repository backups can be significantly larger than Gitaly's storage usage, because forks are deduplicated in Gitaly but not in backups.
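To sanity-check capacity before the first run, you can compare Gitaly's repository usage with the free space on the backup node. This is a sketch; the paths are the Linux package defaults and may differ in your environment.

```shell
# On a Gitaly node: total size of repository storage.
sudo du -sh /var/opt/gitlab/git-data/repositories

# On the backup node: free space where the backup tar file is staged.
df -h /var/opt/gitlab/backups
```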

To back up the Git repositories:

  1. SSH into the GitLab Rails node.

  2. Configure uploading backups to remote cloud storage.

  3. Configure AWS Backup for this bucket, or use a bucket in the same account and region as your production data object storage buckets, and ensure this bucket is included in your preexisting AWS Backup.

  4. Run the backup command, skipping PostgreSQL data:

     ```shell
     sudo gitlab-backup create SKIP=db
     ```

     The resulting tar file includes only the Git repositories and some metadata. Blobs such as uploads, artifacts, and LFS objects do not need to be explicitly skipped, because the command does not back up object storage by default. The tar file is created in the `/var/opt/gitlab/backups` directory and its filename ends in `_gitlab_backup.tar`.

     Because uploading backups to remote cloud storage is configured, the tar file is uploaded to the remote bucket and then deleted from disk.

  5. Note the timestamp of the backup file for the next step. For example, if the backup name is `1493107454_2018_04_25_10.6.4-ce_gitlab_backup.tar`, the timestamp is `1493107454_2018_04_25_10.6.4-ce`.
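     The timestamp can also be derived in the shell by stripping the fixed `_gitlab_backup.tar` suffix with parameter expansion, a small sketch using the example filename above:

     ```shell
     # Strip the fixed suffix to obtain the PREVIOUS_BACKUP timestamp.
     BACKUP_FILE="1493107454_2018_04_25_10.6.4-ce_gitlab_backup.tar"
     TIMESTAMP="${BACKUP_FILE%_gitlab_backup.tar}"
     echo "$TIMESTAMP"   # 1493107454_2018_04_25_10.6.4-ce
     ```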

  6. Run the backup command again, this time specifying incremental backup of Git repositories, and the timestamp of the source backup file. Using the example timestamp from the previous step, the command is:

     ```shell
     sudo gitlab-backup create SKIP=db INCREMENTAL=yes PREVIOUS_BACKUP=1493107454_2018_04_25_10.6.4-ce
     ```

  7. Check that the incremental backup succeeded and uploaded to object storage.

  8. Configure cron to make daily backups. Edit the crontab for the root user:

     ```shell
     sudo su -
     crontab -e
     ```

  9. There, add the following line to schedule the backup to run every day at 2:00 AM:

     ```shell
     0 2 * * * /opt/gitlab/bin/gitlab-backup create SKIP=db INCREMENTAL=yes PREVIOUS_BACKUP=1493107454_2018_04_25_10.6.4-ce CRON=1
     ```
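
By default, cron mails or discards job output. To keep a record of nightly runs for troubleshooting, one option is to redirect the job's output in the crontab entry. This is a sketch; the log path `/var/log/gitlab-backup.log` is an arbitrary choice.

```shell
# Same schedule as above, with stdout and stderr appended to a log file
# (hypothetical path).
0 2 * * * /opt/gitlab/bin/gitlab-backup create SKIP=db INCREMENTAL=yes PREVIOUS_BACKUP=1493107454_2018_04_25_10.6.4-ce CRON=1 >> /var/log/gitlab-backup.log 2>&1
```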

### Configure backup of configuration files

We strongly recommend using rigorous automation tools such as Terraform and Ansible to administer large GitLab environments. GitLab Environment Toolkit is a good example. You may also choose to build your own deployment tool, using it as a reference.

Following this approach, your configuration files and secrets should already exist in secure, canonical locations outside of the production VMs or pods. This document does not cover backing up that data.

As an example, you can store secrets in AWS Secrets Manager and pull them into your Terraform configuration files. AWS Secrets Manager can be configured to replicate to multiple regions.
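For example, a provisioning script could fetch a replicated secret with the AWS CLI. This is a sketch; the secret name `gitlab/rails-secrets` is a hypothetical placeholder.

```shell
# Retrieve the secret string from AWS Secrets Manager (hypothetical
# secret name); --query extracts just the SecretString field.
aws secretsmanager get-secret-value \
  --secret-id gitlab/rails-secrets \
  --query SecretString \
  --output text
```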