The scripts in this repo fetch old-system data from postgres and POST it into the new system's DynamoDB via the new system's serverless server.
- Make sure you're at the root directory of this repo.
- Create a virtual environment: `python -m venv venv`
- Activate the venv: `. ./venv/bin/activate`
- Install standard dependencies: `pip install -r requirements.txt`
- Install the dependencies of the serverless code base. This allows the scripts to reuse utilities from our gci/vci serverless code base.
  - Make sure you're at the root directory of this repo.
  - Configure `GCI_VCI_SERVERLESS_RELATIVE_PATH=...` in `.env`, pointing it to the `gci-vci-serverless` directory (its repo root) relative to this repo; see the hypothetical example below.
  - Run `. ./install-gci-vci-serverless-code.sh`.
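A hypothetical `.env` might look like the following; the path value here is an assumption, so point it at wherever `gci-vci-serverless` actually lives relative to this repo:

```
# .env (hypothetical example; adjust the path to your checkout layout)
GCI_VCI_SERVERLESS_RELATIVE_PATH=../gci-vci-aws/gci-vci-serverless
```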
- AWS credentials are required for accessing postgres on AWS RDS: specifically, `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`.
- Make sure the postgres instance (migration) on AWS RDS is on and ready (not stopped).
- Create the config yaml file `config_recent.yaml` at the root directory:

  ```yaml
  db:
    ec2:
      user: <acquire-this-from-the-team>
      password: <acquire-this-from-the-team>
      host: <acquire-this-from-the-team>
      port: <acquire-this-from-the-team>
      database: <acquire-this-from-the-team>
    local:
      user: your-db-user
      host: 127.0.0.1
      port: 5432
      database: your-db-name
  endpoint:
    url: http://0.0.0.0:3000/ # the local serverless endpoint
    # url: https://xxxx.execute-api.us-west-2.amazonaws.com/xxx # or paste the deployed serverless (API Gateway) endpoint here
  queries:
    - select item_type, rownum, item from (
        select row_number() over(order by rid, sid) as rownum,
          item_type, item
        from migrate_recent_items where item_type=%s
      )a
  ```
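For orientation, here is a minimal sketch of how a script could consume this config; it assumes `pyyaml` and `psycopg2` are installed, and the actual scripts in this repo may load the config differently:

```python
# Minimal sketch, assuming pyyaml and psycopg2; the real migration
# scripts may consume config_recent.yaml differently.
import yaml
import psycopg2

with open("config_recent.yaml") as f:
    config = yaml.safe_load(f)

db_conf = config["db"]["local"]  # or config["db"]["ec2"] for AWS RDS
conn = psycopg2.connect(
    user=db_conf["user"],
    password=db_conf.get("password"),  # the local config may omit a password
    host=db_conf["host"],
    port=db_conf["port"],
    dbname=db_conf["database"],
)

# The query from the yaml is parameterized by item_type via %s
with conn.cursor() as cur:
    cur.execute(config["queries"][0], ("curated-evidence",))
    rows = cur.fetchall()
```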
- If you are considering setting up a local postgres with production data, refer to the section "How to prepare a postgres with legacy production data" below.
- If you're using postgres on AWS RDS, make sure to set the following in `migrate_single_gdm.py`:

  ```python
  def getConnection():
      type = 'ec2'  # or change to 'local' if using a local postgres
  ```
- (Recommended) Create a directory `data` for db data at `gci-vci-aws/gci-vci-serverless/.dynamodb/data/`.
- `cd` to the `gci-vci-serverless` directory, and start the DynamoDB local server:

  ```
  npx nodemon --watch .dynamodb/data --ext db --exec 'npx serverless dynamodb start --migrate --dbPath ./.dynamodb/data'
  ```

  `nodemon` will auto-restart the DynamoDB local server whenever the db data changes.
- It's better to have a db with your user object already created, so that at least you can log in to the UI.
- Run the serverless offline server:

  ```
  npx nodemon --ext py --exec "npx serverless offline --noTimeout"
  ```

- Make sure you're at the root directory of this repo.
- Run the migration script with the variables from `.env` and your AWS credentials in its environment:

  ```
  eval $(egrep -v '^#' .env | xargs) AWS_ACCESS_KEY_ID=... AWS_SECRET_ACCESS_KEY=... python migrate_single_gdm.py
  ```
- `./.data/gdm__related_objects_{gdmPK}`: generated by `migrate_single_gdm.py`. SQL fetching from postgres is slow, so the script caches the objects retrieved from postgres into files in the `.data` directory. If you believe the data in a cache file is corrupted, delete the cache file and re-fetch from postgres.
- `./.data/all_variants.json`: generated by `migrate_all_variants.py`, saving all variants fetched from postgres.
- `./.data/not_in_db_variants.json`: generated when you run `migrate_all_variants.py`. It records the diff between `all_variants.json` and the variants already in serverless DynamoDB, so if POSTing to DynamoDB gets interrupted, the next run of `migrate_all_variants.py` doesn't need to POST from the start again; see the sketch below.
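A minimal sketch of that resume pattern, assuming both cache files hold a JSON list of variant objects (the real `migrate_all_variants.py` may structure them differently):

```python
# Minimal sketch of the resume-on-interrupt pattern; assumes both cache
# files hold a JSON list of variant objects.
import json
import os

ALL_VARIANTS = "./.data/all_variants.json"
NOT_IN_DB = "./.data/not_in_db_variants.json"

def load_pending_variants():
    # Prefer the diff file: it lists only variants not yet POSTed to
    # DynamoDB, so an interrupted run resumes instead of starting over.
    path = NOT_IN_DB if os.path.exists(NOT_IN_DB) else ALL_VARIANTS
    with open(path) as f:
        return json.load(f)

def save_pending_variants(variants):
    # Persist whatever is still left to POST for the next run.
    with open(NOT_IN_DB, "w") as f:
        json.dump(variants, f)
```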
- Q: I want to reset my database.
  - A: Delete the db data file at `gci-vci-serverless/.dynamodb/data/shared-local-instance.db` and restart the dynamodb dev server. A new db data file will be created with empty tables.
  - A: (advanced) If you want to do this programmatically, take a look at the `DynamoDB.reset()` method in `utils/sls.py`, and call it in `if __name__ == "__main__":` in `migrate_single_gdm.py`; a hypothetical sketch follows.
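A hypothetical sketch of that programmatic reset; check `utils/sls.py` for the actual import path and method signature before using it:

```python
# Hypothetical sketch; verify the import path and whether reset() is an
# instance method in utils/sls.py before running.
from utils.sls import DynamoDB

if __name__ == "__main__":
    DynamoDB().reset()  # wipe the tables before re-running the migration
    # ... then proceed with the usual migration steps
```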
- Q: I want to change a GDM rid to collect objects.
  - A: Change the hard-coded `GDM_RID` value in the file `migrate_single_gdm.py`.
- Q: I want to clear out all items in a DynamoDB table.
  - A: If you're on local, simply delete the file `shared-local-instance.db` under `gci-vci-serverless/.dynamodb`.
  - A: If you're on AWS cloud, the best way is to drop and re-create the table. Install & configure the AWS CLI tool first. Then refer to the file `scripts/reset_dynamodb_table.sh`, change `TABLE_NAME` and `AWS_PROFILE` to match your config & credentials, and run the script. A boto3 sketch of the same idea follows.
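If you'd rather do the drop-and-recreate from Python than from the shell script, here is a sketch using boto3; the table name and key schema below are placeholders, so match them to your serverless config before running:

```python
# Sketch of drop-and-recreate with boto3; TABLE_NAME and the key schema
# are placeholders that must match your serverless config.
import boto3

TABLE_NAME = "your-table-name"
client = boto3.client("dynamodb")

client.delete_table(TableName=TABLE_NAME)
client.get_waiter("table_not_exists").wait(TableName=TABLE_NAME)

client.create_table(
    TableName=TABLE_NAME,
    AttributeDefinitions=[{"AttributeName": "PK", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "PK", "KeyType": "HASH"}],
    BillingMode="PAY_PER_REQUEST",
)
client.get_waiter("table_exists").wait(TableName=TABLE_NAME)
```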
## How to prepare a postgres with legacy production data

- Either on local or on AWS RDS, create a new postgres database instance.
- Also create a database in the instance with a name, e.g. `prod_12032020`.
- If you are running an AWS RDS database, make sure it is publicly accessible (go to the db instance, press Modify, and change the public accessibility setting), and that the security group allows inbound connections on port `5432` from any IP (or restrict it to the IP your laptop's network is currently using; that works best if your ISP-provided IP address is static, which it often is not).
- Go into S3 and download the sql file from the s3 bucket `gcivciforhelio`, e.g. the file `pg-dump-2020-12-03.sql.gz`, then decompress it.
- Run `psql -U postgres -h <host> -d prod_12032020 -f <path to decompressed sql file>`. The host is `localhost` if you're running a local postgres, or of the form `db-instance-name.xxx.us-west-2.rds.amazonaws.com` if you're using AWS RDS.
- Connect to the postgres database: `psql -U postgres -h <host> -d prod_12032020`
- Thanks to Rao's work, run the following to transform the item_types. Make sure that on the right-hand side of the `WHEN r.item_type::text = ...` lines, each item_type matches the one in the new system. This creates a view (virtual table) `migrate_recent_items` with the transformed item_types, and we can run sql queries against it. The view definition persists across sessions.
```sql
-- public.migrate_recent_items source
CREATE OR REPLACE VIEW public.migrate_recent_items
AS WITH all_recent_items AS (
        SELECT p_1.rid,
            p_1.sid,
            CASE
                WHEN r.item_type::text = 'extra_evidence'::text THEN 'curated-evidence'::character varying
                WHEN r.item_type::text = 'provisional_variant'::text THEN 'provisional-variant'::character varying
                WHEN r.item_type::text = 'provisionalClassification'::text THEN 'provisional-classification'::character varying
                -- the following `caseControl` and `evidenceScore` were transformed to
                -- all-lowercase in Rao's sql,
                -- which will affect the final generated table's column `item_type`.
                -- here we are keeping their `item_type` as-is in order to align with the new system format,
                -- but you can use Rao's sql script as long as you make sure the sql query
                -- run by the python script aligns with the transformed item_type
                --
                -- WHEN r.item_type::text = 'caseControl'::text THEN 'caseControl'::character varying
                -- WHEN r.item_type::text = 'evidenceScore'::text THEN 'evidenceScore'::character varying
                ELSE r.item_type
            END AS item_type,
            p_1.properties
        FROM propsheets p_1,
            resources r,
            current_propsheets cp
        WHERE cp.rid = p_1.rid AND cp.sid = p_1.sid AND r.rid = p_1.rid
        )
SELECT p.rid,
    p.sid,
    p.item_type,
    (('{"body":'::text || (((((('{"rid":"'::text || p.rid) || '"}'::text)::jsonb) || ((('{"item_type":"'::text || p.item_type::text) || '"}'::text)::jsonb)) || p.properties)::text)) || '}'::text)::jsonb AS item
FROM all_recent_items p;
```
You can now run the migration scripts described above. Make sure your database connection config is set properly.