
Create a backup restore cron on staging #812

Merged
merged 39 commits from restore-backup-cron into infra-improvements on Jan 5, 2024

Conversation

euanmillar (Collaborator)

No description provided.

uses: mathieudutour/github-tag-action@…
with:
  github_token: ${{ secrets.GITHUB_TOKEN }}
  tag_prefix: ${{ github.event.repository.name }}-
euanmillar (Collaborator, Author) commented:
Essentially country config repos are tagged like this: opencrvs-farajaland-v1.3.2
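
For context, here is a minimal sketch of how the step above could sit inside a release workflow that produces tags such as opencrvs-farajaland-v1.3.2. The workflow trigger, job layout, checkout step, and the action version are assumptions, not taken from this PR:

name: Release
on:
  push:
    branches: [master]

jobs:
  tag:
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v4
      # Creates tags prefixed with the repository name, e.g. opencrvs-farajaland-v1.3.2
      - uses: mathieudutour/github-tag-action@v6.1 # version is a placeholder assumption
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          tag_prefix: ${{ github.event.repository.name }}-

With tag_prefix set to the repository name, every release tag carries the country config repo's name, as described above.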

@rikukissa force-pushed the infra-improvements branch 4 times, most recently from cb4524e to 1c51bbb on December 18, 2023 at 12:51
@rikukissa merged commit 7fe32bb into infra-improvements on Jan 5, 2024
4 checks passed
rikukissa added a commit that referenced this pull request Jan 24, 2024
…ates (#789)

* fix conflicts

* add amends from cdpi-living-lab repository

* setup pem file

* fix merge conflict

* add libsodium to dev dependencies

* configure provisioning and deployment script so that any user with privilege escalation access can provision the host machine

* compress and encrypt backup directories before sending to backup server

* supply backup password to backup cronjob

* supply backup encryption passphrase from github secrets

* hide openhim-console by default

* hide openhim-api by default

* Modularise playbook tasks, use only one playbook for all deployment sizes (#798)

* split playbooks to different task modules, use only one playbook for all deployment sizes

* update provisioning pipeline

* try initialising the provision pipeline by adding a temporary push trigger

* setup ssh key before trying to provision

* add known hosts file

* do not try to mount cryptfs partition to /data if it's already mounted

* add filebeat so logs can be accessed, monitored by kibana

* fix kibana address

* Setup new alerts: SSH login, error in backup logs, available disk space in data partition

* add ansible task for creating user accounts for maintainers with 2FA login enabled

* add new alerts for log alerts and ssh alerts

* pass initial metabase SQL file to metabase as a config file so deployment doesn't have to touch the /data directory

* temporarily allow root login again until we set up deployment users

* add port to port forwarding container names so multiple ports can be opened from one container

* Changes to environment provisioning script and log file handling

* remove vagrant files

* remove references to sudo password

sudo operations should only be performed by humans, as they grant root-level access. Automated users should have the required permissions set by provisioning playbooks.

* remove VPN mentions for now

* remove elastalert slack alert environment variable as it's not referenced anywhere

* remove extra environment variables from deploy script call

* remove proxy config from backup script

* generate BACKUP_ENCRYPTION_PASSPHRASE for all github environments

* make log files accessible to the application group so SSH_USER can read and write to them

* remove node version matrices from new pipelines

* add separate inventory files for all environments

* make docker manager1 reference dynamic

* Combine country config compose files to base deployment compose files, include replica compose files in environment-specific compose files (#808)

* Production VPN (#809)

* add initial wireguard server setup

* move vpn to QA server

* remove unused HOSTNAME parameter

* fix a bug in environment creator script, make sure secrets are never committed

* add development environment to provisioning scripts

* add development machine to inventory

* remove unnecessary PEM setup step

* always use the same ansible variables

* fix ansible variable reference

* remove global ansible user setting

* add back missing dockerhub username

* disable SSH root login if provisioning is not done as root

* convert inventory files to yml so ssh keys and users can be directly defined in them

* add Tahmid's public key

* fix inventory file reference

* add development to machines that can be deployed to

* fix known hosts mechanism in deployment pipelines

* make environment selection in deploy.sh dynamic

* volume mount metabase init file as docker has a file size limit of 500kb for config files

* copy the whole project directory to the server

* send core compose files to the server

* fix common file paths

* fix environment compose file

* use absolute paths in the compose file

* add debug log

* remove deploy log file temporarily

* remove matrices from deployment pipelines

* add debug log

* debug github action

* fix deploy pipeline syntax

* add variables to debug step

* make debugging an option

* fix pipeline syntax

* just a commit to make pipeline update on github

* more syntax fixes

* more syntax fixes

* more syntax fixes

* only define overlay net in the main deploy docker compose so that it stays attachable (see the compose sketch after this commit message)

* remove files from the target server's infrastructure directory if those files no longer exist in the repo

* fix deploy path

* do a docker login as part of deployment

* only volume link minio admin's config to the container so it won't write anything new to the source code directory

* remove container names as docker swarm does not support them

* fix path for elasticsearch config

* change the clear data script so that it doesn't touch /data directory directly. This helps us restrict deployment user's access to data

* add missing env variables

* do not use interactive shell

* stop debug mode from starting if it's not explicitly enabled

* add development to seed pipeline

* add pipeline for clearing an environment

* rename pipeline

* temporarily add a push trigger to clear environment

* Revert "temporarily adda a push trigger to clear environment"

This reverts commit 882c432.

* fix reset script file reference, reuse clear-environment pipeline in deploy pipeline

* run clearing through ssh

* add missing ssh secrets

* fix pipeline reference in deploy script

* make clear-environment reusable

* debug why no reset

* add migration run to clear-environment pipeline

* remove data clearing from deploy script

* try without conditionals

* try with a true string

* use singlequotes

* update staging server fingerprint

* add output for reset step

* fix syntax

* change staging IP

* fix pexpect reference

* remove pexpect completely

* remove python3-docker module as we do not have any ansible docker commands

* try again with the module as it's needed for logging in to docker

* run provisioning tasks through qa

* add jump host

* update known hosts once more

* add more logging

* update qa fingerprint

* lower timeout limits

* restart ssh as root

* change ssh restart method for ubuntu 23

* make a 1-1 mapping between GitHub environments and deployed environments. Demo should have its own GitHub environment and not use production

* add back docker login

* make it possible to pass SSH args to deploy script

* fix

* make it possible to supply additional ssh parameters for clear script

* updates to create environment script

* configure jump host for production

* update production ssh fingerprint

* make production a 2-server deployment

* add missing jump host definition for docker-workers

* ignore VPN and other allowed addresses in fail2ban

* update staging and prod docker compose files

* fix jinja template

* configure rsync to not change file permissions

* add debug

* remove -a from rsync so it doesn't try to change permissions

* add wireguard data partition, ensure files in deployment directory are owned by application group

* make setting ownership recursive

* set read permissions for others in /opt/opencrvs so docker users can read the files

* increase fail2ban limits

* attach traefik to vpn network

* make ssh user configurable for port-forwarding script

* update wg-easy

* update wg-easy

* fix cert resolver for vpn

* use github container registry and latest version for wg-easy

* pass wireguard password variable through deployment pipeline

* pass all github deployment environment variables to docker swarm deployment

* move environment variables to the right function

* make a separate function that reads and supplies the env variables

* remove KNOWN_HOSTS from env variables

* remove more variables, fix escape

* make sure KNOWN_HOSTS won't leak to deploy step

* remove debug logging

* only set traefik to vpn network on QA where Wireguard server is

* add validation to make sure all environment variables are set

* download core compose files before validating environment variables

* fix curl urls when downloading core compose files

* remove default latest value from country config version

* fix country config version variable not going to docker compose files

* fix compose env file order

* fix environment variable filtering

* add pipeline for resetting user's 2FA

* fix name of the pipeline

* trick github into showing the new pipeline

* fetch repo first

* use jump host

* add debug step

* remove unnecessary matrix definition

* remove debugging code

* use docker config instead of volume mounts where possible

* add read and execute rights for others to the deployment directory as sometimes users inside docker containers do not match the host machine users

* create a jump user for QA, allow defining multiple ssh keys for users

* do not add 2factor for jump users

* use new jump user in inventory files as well

* set infobip environment variables as optional, add missing required environment variables to environment creator script

* add support for 1-infinite replicas

* add missing network

* add missing export to VERSION variable

* remove demo deployment configuration for now

* Create a backup restore cron on staging (#812)

* Create a backup restore cron on staging (see the cron task sketch after this commit message)

* allow a label to be passed to the script for snapshot usage

* Updated release action

* Add approval step to production deploys

* Add Riku's username to prod deploys

* add separate config flag for provisioning, indicating whether the server should back up its data somewhere else or periodically restore data

* add configuration so that QA can allow connections through the provision user to other machines

* create playbook for backup servers and the connection between app servers and backups

* add tags

* add tag to workflow

* add task to ensure ssh dir exists for backup user

* create home directory for backup

* ensure backup task is always applied for root's crontab

* add default value for periodic_restore_from_backup

* make it possible to deploy production with current infrastructure

* Revert "make it possible to deploy production with current infrastructure"

This reverts commit 36edf30.

* fix wait hosts definition for migrations

* make production a qa environment temporarily

* add shell for backup user so rsync works

* explicitly define which user is the one running crontab, ensure that user's key gets to backup server

* ensure .ssh directory exists for crontab user

* get user home directories dynamically

* add missing tags

* add become

* fix file path

* define backup machine in staging config as well

* remove condition from fetch

* always create public key from private key

* use hardcoded file name for the public key

* fix syntax

* make staging a QA environment so it reflects production

* separate backup downloading and restoring into two different scripts, use the production server's encryption key on the machine that restores the backup (staging)

* fix an issue with a running OpenHIM while we restore backup

When I cleared the database and then restored data, the restore process failed if the running OpenHIM process had written new documents in the meantime (see the sketch of one possible mitigation after this commit message).

* restart minio after restoring data

---------

Co-authored-by: Riku Rouvila <[email protected]>

* fix snapshot script restore reference

* remove openhim base config

* remove WIREGUARD_ADMIN_PASSWORD reference from production deployment pipelines

* remove authorized_keys file

* add debug logging for clear all data script

* define REPLICAS variable before validating it

* fix syntax error in clear script

* automate updating branches on release

* switch back to previous traefik port definition

https://github.com/opencrvs/opencrvs-farajaland/pull/789/files/7a034732d3f38cfdb00d919f470bb7e48d587cdd#r1449976486

* rename 2factor to two_factor

* add default true value for two_factor

* [OCRVS-6437] Forward Elastalert emails through country config (#851)

* forward Elastalert emails to the country config's new /email endpoint and send them onward from there

* add NOTIFICATION_TRANSPORT variable to deployments scripts

* fix deployment

* move dotenv to normal deps

* add back removed environment variable

* fix email route definition

* make default route ignore the /email path

* add missing environment variables for dev environment

* [OCRVS-6350] Disable root (#849)

* disable root login completely

* stop users from using 'su'

* only disable root login if ansible user being used is not root

* add history timestamps for user terminal history (#848)

* add playbook for ubuntu to update security patches automatically (#846)

* fix staging + prod key access to backup server

* update prod & staging jump keys

* fix manager hostname reference

* add a mechanism for defining additional SSH public keys that can login to the provisioning user

---------

Co-authored-by: naftis <[email protected]>
Co-authored-by: Riku Rouvila <[email protected]>
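
To make the 'Create a backup restore cron on staging' item above concrete, here is a minimal Ansible sketch of what such a scheduled restore could look like. The script paths, log file, schedule, and user are assumptions, not taken from this PR:

# Hypothetical Ansible task: schedule a nightly job on staging that first downloads the
# latest backup and then restores it. All paths and the schedule below are assumptions.
- name: Schedule nightly backup restore on staging
  ansible.builtin.cron:
    name: "opencrvs nightly backup restore"
    user: root
    minute: "0"
    hour: "3"
    job: >-
      /opt/opencrvs/infrastructure/backups/download.sh
      && /opt/opencrvs/infrastructure/backups/restore.sh
      >> /var/log/opencrvs-restore.log 2>&1

Running the job from root's crontab and splitting download from restore mirrors the commits above ('ensure backup task is always applied for root's crontab', 'separate backup downloading and restoring into two different scripts').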
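
The note about OpenHIM writing new documents mid-restore implies the service has to stay quiet while data is restored. One way such a mitigation could look, sketched as Ansible tasks; the service name, script path, and the scale-down approach itself are assumptions rather than the fix actually used in this PR:

# Hypothetical sketch: pause OpenHIM while the backup is restored so it cannot write new
# documents during the restore, then bring it back up afterwards.
- name: Scale OpenHIM core down before restoring
  ansible.builtin.command: docker service scale opencrvs_openhim-core=0

- name: Restore the downloaded backup
  ansible.builtin.command: /opt/opencrvs/infrastructure/backups/restore.sh

- name: Scale OpenHIM core back up
  ansible.builtin.command: docker service scale opencrvs_openhim-core=1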
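
For the 'only define overlay net in the main deploy docker compose so that it stays attachable' item, a minimal compose sketch; the network and service names and the image tag are assumptions:

# Hypothetical excerpt of the main deploy compose file: the overlay network is declared
# only here, with attachable: true, so other compose files and one-off containers can
# join it instead of re-declaring it.
version: "3.8"

networks:
  opencrvs_overlay_net:
    driver: overlay
    attachable: true

services:
  traefik:
    image: traefik:v2.10
    networks:
      - opencrvs_overlay_net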
@rikukissa deleted the restore-backup-cron branch on May 7, 2024 at 12:01