- A login for the safecast AWS account (contact mat)
- The safecastapi private SSH key (contact mat)
- AWS CLI & Elastic Beanstalk CLI
  - Run pip install -r requirements.txt under a python virtualenv (direnv makes virtualenvs easy)
- Terraform
  - tfenv is recommended as terraform versions will change over time
- A configured safecast AWS profile (aws configure --profile safecast)
- safecast_deploy prerequisites satisfied
To create a new environment pair (worker & web), you'll need DB connection info first. You can get this by creating a new RDS (ideally via terraform) or using the info from an existing environment. Once you've got that info, run the rake task to create the env.
AWS_EB_DATABASE_HOST=... \
AWS_EB_DATABASE_PASSWORD=... \
AWS_EB_CFG=<dev|prd|...> \
rake elasticbeanstalk:create
The new environment's name gets a numeric suffix based on the existing environments of the same config name (e.g. safecastapi-prd-002).
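The numbering can be illustrated with a short sketch. The rake task's actual logic is not shown here, so this assumes names of the form safecastapi-<config>-NNN (as in safecastapi-prd-002) and that the next name uses the highest existing number plus one. The function name and the three-digit padding are assumptions made for illustration.

```python
import re


def next_environment_name(existing_names, config_name, app_prefix="safecastapi"):
    """Return the next numbered environment name for a config.

    Assumes names look like '<app_prefix>-<config_name>-NNN' and that the
    next environment takes the highest existing number plus one. Both are
    guesses based on names such as 'safecastapi-prd-002'.
    """
    pattern = re.compile(
        rf"{re.escape(app_prefix)}-{re.escape(config_name)}-(\d+)"
    )
    # fullmatch keeps 'prd' from also counting 'prd-wrk' environments.
    numbers = [
        int(match.group(1))
        for name in existing_names
        if (match := pattern.fullmatch(name))
    ]
    next_number = max(numbers, default=0) + 1
    return f"{app_prefix}-{config_name}-{next_number:03d}"


existing = ["safecastapi-prd-002", "safecastapi-prd-wrk-005", "safecastapi-dev-001"]
print(next_environment_name(existing, "prd"))      # safecastapi-prd-003
print(next_environment_name(existing, "prd-wrk"))  # safecastapi-prd-wrk-006
```

Checking the expected name this way before running the rake task can help when you need to reference the new environment in later commands.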
If you're working against an empty DB, you can run db:structure:load to load structure.sql into the new DB.
You will see a few errors regarding extensions and grants, but those can be ignored.
If you need to run any rails commands to debug anything you can do this:
rake ssh_dev # or prd
sudo -u webapp bash -lc 'cd /var/app/current; rails c' # or whatever command you need, or use /var/app/ondeck for any failed deployment assets
You can also run rails c as root, but be careful about writing files, or the app might fail to start due to permissions problems.
eb config get prd && eb config get prd-wrk
vi .elasticbeanstalk/saved_configs/prd*.cfg.yml
eb config put prd && eb config put prd-wrk
aws elasticbeanstalk update-environment --application-name api --environment-name safecastapi-prd-002 --template-name prd
aws elasticbeanstalk update-environment --application-name api --environment-name safecastapi-prd-wrk-005 --template-name prd-wrk
To save these changes back to the configuration templates, use safecast_deploy's save_configs task.
As of May 2019, api.safecast.org is somewhat prone to problems due to database overloading.
This is due to its large data set combined with a flexible query API. It's quite easy for users to generate queries which cause table scans of the measurements table.
These take several minutes and burn lots of I/O, so if too many happen at once the DB can become overloaded. Since all UI requests rely on the DB, this overload will lead to 5xx errors when trying to load the site.
If you suspect trouble, the API Overview dashboard can be a good place to start.
If the RDS CPU or I/O stats are quite high, it can be good to clear in-flight queries.
This will fail some requests in flight, but most everything can be retried and it should be easier to find out what's causing trouble once the query load is more under control.
First SSH into an EC2 instance (e.g., rake ssh_prd_wrk). Then run the psql command to get a DB console.
Then you can use this query to see what's in flight and for how long:
select pid, age(clock_timestamp(), query_start), query
from pg_stat_activity
where state != 'idle' and query not like '%pg_stat_activity%'
order by query_start desc;
If you have many long-running queries, you can terminate the older ones. For example, this terminates any query that has been running longer than 5 minutes:
select pg_terminate_backend(pid)
from pg_stat_activity
where state != 'idle' and query_start < now() - interval '5 minutes';
Cron jobs are handled via Elastic Beanstalk's aws-sqsd, whose queue can also back up if the DB is slow for an extended period of time.
If this happens, the number of messages in flight will be high (more than 1 or 2). Clearing the SQS queue via the AWS console can be a good idea to ensure we're not re-computing any jobs.
The production queue should be awseb-e-aaw6am7e2x-stack-AWSEBWorkerQueue-3WZCP00RYHUX, which can be verified by the Name tag on the queue.
The instructions below were tested on the ingest database; as of September 2019, they have not been tested on the api database.
This process is for major version upgrades, e.g. from Postgres 9.5 to 9.6 or from 10 to 11, but not from 11.4 to 11.5. Minor version upgrades, like from 11.4 to 11.5, are automatically scheduled and performed by AWS. Read about the PostgreSQL versioning model.
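As a quick illustration of the major/minor distinction, here is a minimal sketch. It relies on the PostgreSQL convention that the first two numbers form the major version before version 10 (9.5, 9.6) and only the first number does from 10 onward. The function names are made up for this example.

```python
def major_version(version):
    """Return the major-version part of a PostgreSQL version string.

    Before PostgreSQL 10 the first two numbers form the major version
    (9.5, 9.6); from 10 onward only the first number does (10, 11).
    """
    parts = version.split(".")
    if int(parts[0]) >= 10:
        return parts[0]
    return ".".join(parts[:2])


def is_major_upgrade(current, target):
    """True when moving from current to target changes the major version."""
    return major_version(current) != major_version(target)


print(is_major_upgrade("9.5", "9.6"))    # True
print(is_major_upgrade("10", "11"))      # True
print(is_major_upgrade("11.4", "11.5"))  # False
```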
Terraform has been configured for ingest in a way that should make it easy to perform major upgrades. If Amazon automatically performs a minor version upgrade, this will not break Terraform. In infrastructure/terraform/ingest/main.tf, the engine_version variable is set to 11 rather than 11.4 to ensure this; Terraform is aware of automatic upgrades and the Postgres versioning scheme.
However, Terraform cannot upgrade PostGIS, and AWS recommends that PostGIS be upgraded before a major upgrade. See the manual instructions below for more information on how to upgrade it.
Likewise, Terraform cannot upgrade more than one major version at a time, so if moving from, e.g. 10 to 12, you will need to perform two manual upgrades, one for each major version.
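To plan a multi-version move, it can help to list the intermediate steps up front. The sketch below assumes a hand-maintained, ordered list of major versions; the list contents and the function name are illustrative, and it does not check which upgrade targets RDS actually offers.

```python
# Ordered PostgreSQL major versions; extend as needed. Whether RDS offers
# each of these as an upgrade target is not checked here.
MAJOR_VERSIONS = ["9.5", "9.6", "10", "11", "12"]


def upgrade_steps(current, target, known=MAJOR_VERSIONS):
    """List each major version to pass through, one upgrade per entry."""
    start, end = known.index(current), known.index(target)
    if end < start:
        raise ValueError(f"{target} is older than {current}")
    return known[start + 1 : end + 1]


print(upgrade_steps("10", "12"))  # ['11', '12']
```

Each entry in the result corresponds to one pass through the upgrade process.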
To upgrade to the next version, e.g., Postgres 12, find the following lines in the Terraform configuration and change them from 11 to 12:
- main.tf
  - resource "aws_db_instance" "prd-master" {
    engine_version = "11"
  - resource "aws_db_parameter_group" "public-replica" {
    family = "postgres11"
- dev.tf
  - resource "aws_db_instance" "dev-master" {
    engine_version = "11"
Read through the manual instructions as well to understand what Terraform should do when executing this plan.
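If you would rather script the version bump than edit by hand, something like the following sketch could work. It is an assumption-laden helper, not part of the repository: the function name is hypothetical, the sample text is a trimmed stand-in for the real files, and the result should still be reviewed with terraform plan before applying.

```python
import re


def bump_postgres_major(terraform_text, old, new):
    """Rewrite engine_version and parameter-group family lines.

    Only exact matches of the old major version are changed, so a value
    such as "11.4" is left alone.
    """
    text = re.sub(
        rf'(engine_version\s*=\s*"){re.escape(old)}(")',
        rf"\g<1>{new}\g<2>",
        terraform_text,
    )
    return re.sub(
        rf'(family\s*=\s*"postgres){re.escape(old)}(")',
        rf"\g<1>{new}\g<2>",
        text,
    )


# Trimmed sample; the real resources contain many more attributes.
sample = '''resource "aws_db_instance" "prd-master" {
  engine_version = "11"
}
resource "aws_db_parameter_group" "public-replica" {
  family = "postgres11"
}
'''
print(bump_postgres_major(sample, "11", "12"))
```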
Try to use the Terraform upgrade process first. These instructions are captured here for reference in case using Terraform is not possible.
- Read about the PostgreSQL versioning model and note the change in which digit of the version number denotes a major upgrade between Postgres 9 and Postgres 10.
- Read the AWS guide to upgrading Postgres
- Read the guide to upgrading PostGIS in order to understand the PostGIS concept of "hard" and "soft" upgrades.
- Check which extensions are installed; usually at least PostGIS will be:
SELECT extname, extversion FROM pg_extension
- Determine the versions of PostGIS and other extensions to upgrade to. The tables showing which extension versions are available in which Postgres version are in this AWS document.
- Go to the RDS control panel, identify the read replica's configuration, and save all of it somewhere, e.g. copy/paste it to your desktop. It helps to click "Modify" and look at what is modifiable there.
- Once you are ready to start the process, and have saved the read replica's configuration, delete the read replica of the database. Read replicas cannot survive a major upgrade. Deleting it before starting the upgrade saves the performance and financial cost of replicating the VACUUM command's writes to the read replica.
- SSH to the safecastingest-prd instance (or whatever instance is correct for your application) as ec2-user; the current DNS value can be found in the AWS EC2 console.
- Execute a VACUUM; optionally, time it:
  time psql -c 'VACUUM;'
  Doing so ahead of time will make the upgrade go faster.
- Upgrade to the next major version of Postgres via the AWS RDS console, by clicking "Modify" on the appropriate database. Make sure to schedule the upgrade to happen immediately.
- Once the database upgrade is complete, perform either a hard or soft upgrade of PostGIS as necessary. This can also be done with psql on the application server instance.
- Repeat as necessary. Since PostGIS is installed, according to the AWS documentation, the upgrade must be done by iterating through each major version to the target version; you cannot skip ahead to the end. This is likely a safer course of action regardless, since each major upgrade may change the on-disk data storage format.
- Create a new read replica with the exact same name as the old one. This will ensure it has the same DNS name as the old one, and the public DNS alias ingest-replica1.prd.safecast.cc will continue to work.
- Compare the configuration of the old and new replicas until they match. Ensure that the replica and master are in the same availability zone to minimize any performance or financial costs.
- Update the Terraform configuration to reflect these changes. This repository is private; you may need to ask for access.
As long as the Ruby version or other dependencies don't need to be upgraded, release on the existing Elastic Beanstalk environments using safecast_deploy. Example:
./deploy.py same_env api dev api-master-1038-e666c1461233c139d25ec5f75b72608f92cd4afb
There are two common cases when the Elastic Beanstalk environment for ingest needs to be replaced:
- When upgrading the version of Ruby in use
- When upgrading the version of Postgres server and client in use
It's important that the major version of the Postgres client binaries on the application servers matches the version of the Postgres server; otherwise, deployment of the application will fail. Therefore, after upgrading the major version of the Postgres server (e.g. from 11 to 12), upgrade the application servers soon thereafter.
Upgrade these ingest project files:
- Gemfile
- .circleci/config.yml
- .ruby-version
Then make sure that everything still works, e.g. by opening a pull request, which will initiate a build on CircleCI.
Then, while in the project folder, download the Elastic Beanstalk environment configurations using the eb command-line tool, e.g. eb config get dev. Repeat for all of the environments:
- dev
- dev-wrk
- prd
- prd-wrk
Make sure that these files don't get committed to source control or widely shared; they contain secrets such as the database access password.
In each file, replace the PlatformArn line with the new platform ARN.
Then, upload the modified files to AWS using eb config put, and delete the local copies. S3 will keep old versions of the configuration.
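The PlatformArn edit can also be scripted. The sketch below assumes each saved configuration contains exactly one line of the form PlatformArn: <value>; the function name and the placeholder ARN strings are hypothetical. It keeps the line's indentation and refuses to proceed if the line is missing or duplicated.

```python
import re


def replace_platform_arn(config_text, new_arn):
    """Replace the value on the PlatformArn line, keeping its indentation."""
    updated, count = re.subn(
        r"^([ \t]*PlatformArn:[ \t]*).*$",
        # A function replacement avoids escape handling in new_arn.
        lambda match: match.group(1) + new_arn,
        config_text,
        flags=re.MULTILINE,
    )
    if count != 1:
        raise ValueError(f"expected one PlatformArn line, found {count}")
    return updated


# Placeholder values; real ARNs come from the Elastic Beanstalk platform list.
sample = "Platform:\n  PlatformArn: old-platform-arn\nOptionSettings: {}\n"
print(replace_platform_arn(sample, "new-platform-arn"))
```

Applied to each downloaded file in .elasticbeanstalk/saved_configs/, this leaves every other setting untouched.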
Open the project file .ebextensions/db.config. You will see some URLs pointing to Postgres RPMs. For example:
'https://yum.postgresql.org/11/redhat/rhel-6-x86_64/postgresql11-libs-11.5-1PGDG.rhel6.x86_64.rpm' \
'https://yum.postgresql.org/11/redhat/rhel-6-x86_64/postgresql11-11.5-1PGDG.rhel6.x86_64.rpm' \
'https://yum.postgresql.org/11/redhat/rhel-6-x86_64/postgresql11-devel-11.5-1PGDG.rhel6.x86_64.rpm'
These will need to be changed -- the new version's RPMs can be found by browsing the same yum.postgresql.org site.
You will also need to update the symbolic link that is created below this line:
/bin/ln -s /usr/pgsql-11/bin/pg_config /usr/bin/pg_config
pgsql-11 must be changed to pgsql-12 (or whatever new major version is in use).
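The RPM URLs and the symlink path follow a regular pattern, so the edits can be sketched as plain string replacements. This assumes the new RPMs use the same naming scheme and the same -1PGDG build suffix, which is not guaranteed; always confirm the resulting URLs exist on yum.postgresql.org. The function name and the 12.0 release number are illustrative only.

```python
def retarget_db_config(text, old_major, new_major, old_release, new_release):
    """Point RPM URLs and the pg_config symlink at a new Postgres version."""
    replacements = [
        (f"/{old_major}/redhat/", f"/{new_major}/redhat/"),  # repo directory
        (f"postgresql{old_major}", f"postgresql{new_major}"),  # package name
        (f"-{old_release}-", f"-{new_release}-"),  # package release number
        (f"pgsql-{old_major}", f"pgsql-{new_major}"),  # install prefix
    ]
    for old, new in replacements:
        text = text.replace(old, new)
    return text


lines = [
    "https://yum.postgresql.org/11/redhat/rhel-6-x86_64/postgresql11-libs-11.5-1PGDG.rhel6.x86_64.rpm",
    "/bin/ln -s /usr/pgsql-11/bin/pg_config /usr/bin/pg_config",
]
for line in lines:
    # 12.0 is a placeholder release; look up the real one before using it.
    print(retarget_db_config(line, "11", "12", "11.5", "12.0"))
```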
Use safecast_deploy. You will need to update the Grafana dashboards as well. Example:
# First, pick the new ARN you want to deploy on
./deploy.py list_arns
# for instance, api dev 123 new-arn
./deploy.py <app> <env> <version> <arn>
./deploy.py update_grafana <app>
Make sure to run as webapp and source the cron environment first so you can access the db.
sudo -u webapp bash -lc 'cd /var/app/current; source ./cron/cron_env.sh; ./script/manual_exports/dump_measurements_ios_periods.sh'