OCRVS-6000: Infrastructure deployment, monitoring and maintenance updates (#789)

* fix conflicts
* add amends from cdpi-living-lab repository
* setup pem file
* fix merge conflict
* add libsodium to dev dependencies
* configure provisioning and deployment script so that any user with privilege escalation access can provision the host machine
* compress and encrypt backup directories before sending to backup server
* supply backup password to backup cronjob
* supply backup encryption passphrase from github secrets
* hide openhim-console by default
* hide openhim-api by default
* Modularise playbook tasks, use only one playbook for all deployment sizes (#798)
* split playbooks to different task modules, use only one playbook for all deployment sizes
* update provisioning pipeline
* try initialising the provision pipeline by adding a temporary push trigger
* setup ssh key before trying to provision
* add known hosts file
* do not try to mount cryptfs partition to /data if it's already mounted
* add filebeat so logs can be accessed, monitored by kibana
* fix kibana address
* Setup new alerts: SSH login, error in backup logs, available disk space in data partition
* add ansible task for creating user accounts for maintainers with 2FA login enabled
* add new alerts for log alerts and ssh alerts
* pass initial metabase SQL file to metabase as a config file so deployment doesn't have to touch the /data directory
* temporarily allow root login again until we set up deployment users
* add port to port forwarding container names so multiple ports can be opened from one container
* Changes to environment provisioning script and log file handling
* remove vagrant files
* remove references to sudo password

  sudo operations should only be performed by humans, as they give permission to do root-level operations. automated users should have the required permissions set by provisioning playbooks
* remove VPN mentions for now
* remove elastalert slack alert environment variable as it's not referred to anywhere
* remove extra environment variables from deploy script call
* remove proxy config from backup script
* generate BACKUP_ENCRYPTION_PASSPHRASE for all github environments
* make log files accessible by application group so SSH_USER can read and write to them
* remove node version matrices from new pipelines
* add separate inventory files for all environments
* make docker manager1 reference dynamic
* Combine country config compose files to base deployment compose files, include replica compose files in environment-specific compose files (#808)
* Production VPN (#809)
* add initial wireguard server setup
* move vpn to QA server
* remove unused HOSTNAME parameter
* fix a bug in environment creator script, make sure secrets are never committed
* add development environment to provisioning scripts
* add development machine to inventory
* remove unnecessary PEM setup step
* always use the same ansible variables
* fix ansible variable reference
* remove global ansible user setting
* add back missing dockerhub username
* disable SSH login with root login if provisioning is not done as root
* convert inventory files to yml so ssh keys and users can be directly defined in them
* add Tahmid's public key
* fix inventory file reference
* add development to machines that can be deployed to
* fix known hosts mechanism in deployment pipelines
* make environment selection in deploy.sh dynamic
* volume mount metabase init file as docker has a file size limit of 500kb for config files
* copy the whole project directory to the server
* send core compose files to the server
* fix common file paths
* fix environment compose file
* use absolute paths in the compose file
* add debug log
* remove deploy log file temporarily
* remove matrices from deployment pipelines
* add debug log
* debug github action
* fix deploy pipeline syntax
* add variables to debug step
* make debugging an option
* fix pipeline syntax
* just a commit to make pipeline update on github
* more syntax fixes
* more syntax fixes
* more syntax fixes
* only define overlay net in the main deploy docker compose so that it stays attachable
* remove files from target server infrastructure directory if those files do not exist in the repo anymore
* fix deploy path
* do a docker login as part of deployment
* only volume link minio admin's config to the container so it won't write anything new to the source code directory
* remove container names as docker swarm does not support those
* fix path for elasticsearch config
* change the clear data script so that it doesn't touch /data directory directly. This helps us restrict deployment user's access to data
* add missing env variables
* do not use interactive shell
* stop debug mode from starting if it's not explicitly enabled
* add development to seed pipeline
* add pipeline for clearing an environment
* rename pipeline
* temporarily add a push trigger to clear environment
* Revert "temporarily add a push trigger to clear environment"

  This reverts commit 882c432.
* fix reset script file reference, reuse clear-environment pipeline in deploy pipeline
* run clearing through ssh
* add missing ssh secrets
* fix pipeline reference in deploy script
* make clear-environment reusable
* debug why no reset
* add migration run to clear-environment pipeline
* remove data clearing from deploy script
* try without conditionals
* try with a true string
* use single quotes
* update staging server fingerprint
* add output for reset step
* fix syntax
* change staging IP
* fix pexpect reference
* remove pyexpect completely
* remove python3-docker module as we do not have any ansible docker commands
* try again with the module as it's needed for logging in to docker
* run provisioning tasks through qa
* add jump host
* update known hosts once more
* add more logging
* update qa fingerprint
* lower timeout limits
* restart ssh as root
* change ssh restart method for ubuntu 23
* make a 1-1 mapping to github environments and deployed environments. Demo should have its own Github environment and not use production
* add back docker login
* make it possible to pass SSH args to deploy script
* fix
* make it possible to supply additional ssh parameters for clear script
* updates to create environment script
* configure jump host for production
* update production ssh fingerprint
* make production a 2-server deployment
* add missing jump host definition for docker-workers
* ignore VPN and other allowed addresses in fail2ban
* update staging and prod docker compose files
* fix jinja template
* configure rsync to not change file permissions
* add debug
* remove -a from rsync so it doesn't try to change permissions
* add wireguard data partition, ensure files in deployment directory are owned by application group
* make setting ownership recursive
* set read permissions to others in /opt/opencrvs so docker users can read the files
* increase fail2ban limits
* attach traefik to vpn network
* make ssh user configurable for port-forwarding script
* update wg-easy
* update wg-easy
* fix cert resolver for vpn
* use github container registry and latest version for wg-easy
* pass wireguard password variable through deployment pipeline
* pass all github deployment environment variables to docker swarm deployment
* move environment variables to the right function
* make a separate function that reads and supplies the env variables
* remove KNOWN_HOSTS from env variables
* remove more variables, fix escape
* make sure KNOWN_HOSTS won't leak to deploy step
* remove debug logging
* only set traefik to vpn network on QA where Wireguard server is
* add validation to make sure all environment variables are set
* download core compose files before validating environment variables
* fix curl urls when downloading core compose files
* remove default latest value from country config version
* fix country config version variable not going to docker compose files
* fix compose env file order
* fix environment variable filtering
* add pipeline for resetting user's 2FA
* fix name of the pipeline
* trick github into showing the new pipeline
* fetch repo first
* use jump host
* add debug step
* remove unnecessary matrix definition
* remove debugging code
* use docker config instead of volume mounts where possible
* add read and execute rights for others to the deployment directory as sometimes users inside docker containers do not match the host machine users
* create a jump user for QA, allow defining multiple ssh keys for users
* do not add 2factor for jump users
* use new jump user in inventory files as well
* set infobip environment variables as optional, add missing required environment variables to environment creator script
* add support for 1-infinite replicas
* add missing network
* add missing export to VERSION variable
* remove demo deployment configuration for now
* Create a backup restore cron on staging (#812)
* Create a backup restore cron on staging
* allowed label to be passed to script for snapshot usage
* Updated release action
* Add approval step to production deploys
* Add Riku's username to prod deploys
* add separate config flag for provisioning for indicating if the server should backup its data somewhere else or if it should periodically restore data
* make configuration so that qa can allow connections through the provision user to other machines
* create playbook for backup servers and the connection between app servers and backups
* add tags
* add tag to workflow
* add task to ensure ssh dir exists for backup user
* create home directory for backup
* ensure backup task is always applied for root's crontab
* add default value for periodic_restore_from_backup
* make it possible to deploy production with current infrastructure
* Revert "make it possible to deploy production with current infrastructure"

  This reverts commit 36edf30.
* fix wait hosts definition for migrations
* make production a qa environment temporarily
* add shell for backup user so rsync works
* explicitly define which user is the one running crontab, ensure that user's key gets to backup server
* ensure .ssh directory exists for crontab user
* get user home directories dynamically
* add missing tags
* add become
* fix file path
* define backup machine in staging config as well
* remove condition from fetch
* always create public key from private key
* use hardcoded file name for public key
* fix syntax
* make staging a QA environment so it reflects production
* separate backup downloading and restoring to two different scripts, use production server's encryption key on the machine that restores the backup (staging)
* fix an issue with a running OpenHIM while we restore backup

  when I cleared the database and then restored data there, the restore process failed if the running OpenHIM process had written new documents during this period
* restart minio after restoring data

---------

Co-authored-by: Riku Rouvila <[email protected]>

* fix snapshot script restore reference
* remove openhim base config
* remove WIREGUARD_ADMIN_PASSWORD reference from production deployment pipelines
* remove authorized_keys file
* add debug logging for clear all data script
* define REPLICAS variable before validating it
* fix syntax error in clear script
* automate updating branches on release
* switch back to previous traefik port definition

  https://github.com/opencrvs/opencrvs-farajaland/pull/789/files/7a034732d3f38cfdb00d919f470bb7e48d587cdd#r1449976486
* rename 2factor to two_factor
* add default true value for two_factor
* [OCRVS-6437] Forward Elastalert emails through country config (#851)
* forward Elastalert emails first to country config's new /email endpoint and forward from there
* add NOTIFICATION_TRANSPORT variable to deployments scripts
* fix deployment
* move dotenv to normal deps
* add back removed environment variable
* fix email route definition
* make default route ignore the /email path
* add missing environment variables for dev environment
* [OCRVS-6350] Disable root (#849)
* disable root login completely
* stop users from using 'su'
* only disable root login if ansible user being used is not root
* add history timestamps for user terminal history (#848)
* add playbook for ubuntu to update security patches automatically (#846)
* fix staging + prod key access to backup server
* update prod & staging jump keys
* fix manager hostname reference
* add a mechanism for defining additional SSH public keys that can log in to the provisioning user

---------

Co-authored-by: naftis <[email protected]>
Co-authored-by: Riku Rouvila <[email protected]>