Config, databases, and SFTP
There are two ARAX config files: config_secrets.json and config_dbs.json.
config_secrets.json is auto-downloaded to machines running ARAX code (approximately every 24 hours, using the same auto-download system as the old configv2.json) from the 'master' copy on araxconfig.rtx.ai at /home/araxconfig/config_secrets.json.
This config file is meant to contain things that should never be checked into the repo or shared publicly (usernames/passwords, etc.).
To override the 'master' config_secrets.json, create a local copy of config_secrets.json, rename it config_secrets_local.json, and edit its contents as needed. If a config_secrets_local.json is present, it is always used in preference to config_secrets.json.
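The precedence rule above can be pictured with a small sketch. This is not the actual RTXConfiguration code; the function name `load_secrets` and the idea that both files sit in one directory are assumptions made for illustration only.

```python
import json
import os


def load_secrets(config_dir):
    """Load config_secrets_local.json if it exists, else config_secrets.json.

    Hypothetical helper: it only illustrates the "local copy wins" rule.
    """
    local_path = os.path.join(config_dir, "config_secrets_local.json")
    master_path = os.path.join(config_dir, "config_secrets.json")
    # The local override, when present, is preferred over the downloaded copy.
    chosen_path = local_path if os.path.exists(local_path) else master_path
    with open(chosen_path) as config_file:
        return json.load(config_file)
```

Calling `load_secrets` on a directory that holds both files returns the contents of the local copy; deleting the local copy makes it fall back to the downloaded one.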
config_dbs.json lives in the RTX repo (at RTX/code/config_dbs.json). It contains the paths for the 'master' copies of the various databases on arax-databases.rtx.ai that are auto-downloaded to machines running ARAX code (by ARAX_database_manager.py), as well as the current Plover and KG2pre/KG2c Neo4j endpoints.
NOTE: The root of the paths in config_dbs.json (i.e., /translator/data/orangeboard/databases/) is not the current root path for them on arax-databases.rtx.ai (/home/rtxconfig/). ITRB could not adjust their scripts to work with the actual current root paths, so we left the root paths in config_dbs.json as they were; instead, RTXConfiguration maps from the old root paths to the current root paths as appropriate. It's silly, but it works. When you upload a database to arax-databases.rtx.ai, just put it under the proper KG2 directory (e.g., /home/rtxconfig/KG2.8.0).
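As a rough illustration of that root-path mapping, the sketch below swaps the old root for the current one. It is not the real RTXConfiguration logic: the function name `map_to_current_root` is hypothetical, and it assumes the part of the path after the root (for example the KG2 version directory) is the same in both places.

```python
# Roots as described in the note above.
OLD_ROOT = "/translator/data/orangeboard/databases"
CURRENT_ROOT = "/home/rtxconfig"


def map_to_current_root(config_path):
    """Translate a config_dbs.json path to its location on arax-databases.rtx.ai.

    Hypothetical helper; assumes everything after the root is unchanged.
    """
    if config_path == OLD_ROOT or config_path.startswith(OLD_ROOT + "/"):
        return CURRENT_ROOT + config_path[len(OLD_ROOT):]
    # Paths that are not under the old root are returned untouched.
    return config_path
```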
Note: Before pushing a change to config_dbs.json in master, ensure that any new databases it points to have already been uploaded to arax-databases.rtx.ai in the proper KG2 directory as well as to the ITRB SFTP server (how-to below)! If you point config_dbs.json (in the master branch) to a database that does not exist in both of those places, things will break.
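One way to think about this pre-push rule is as a check that every referenced database appears in both locations. The sketch below is only that idea in code: the function name `find_missing_databases` is hypothetical, and it assumes you have already collected the file listings from each server yourself (for example by running `ls` there).

```python
def find_missing_databases(required, on_arax_databases, on_itrb_sftp):
    """Return {database name: [places it is missing from]} for incomplete uploads.

    Hypothetical helper: the three arguments are plain collections of file
    names gathered by hand; nothing here talks to either server.
    """
    missing = {}
    for name in required:
        absent_from = []
        if name not in on_arax_databases:
            absent_from.append("arax-databases.rtx.ai")
        if name not in on_itrb_sftp:
            absent_from.append("ITRB SFTP server")
        if absent_from:
            missing[name] = absent_from
    return missing
```

An empty result means every database named in the pending config_dbs.json change is present in both places.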
RTXConfiguration dynamically determines a machine's 'maturity' (based on the current branch and/or instance/domain name), which is used to select which Plover and KG2 URLs to use. It also provides a mechanism for overriding that maturity. If, for example, you want your own machine to run at 'production' maturity, create a local one-line file called maturity_override.txt that contains that maturity:
echo "production" > RTX/code/maturity_override.txt
Similar techniques can be used to override the dynamically determined Plover and KG2 API URLs as well.
Here is an example for overriding the Plover URL:
echo "http://kg2cplover.rtx.ai:9990" > RTX/code/plover_url_override.txt
And an example for overriding the KG2 URL:
echo "https://arax.ncats.io/kg2/api/rtxkg2/v1.2" > RTX/code/kg2_url_override.txt
Being able to override URLs in this way can be useful if, for instance, a particular Plover instance went down, so you'd like to point a KG2 instance to a Plover that is still up, or if you're doing dev work and want to be using a particular KG2/Plover URL.
Remember to delete your local override file after you're done!
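The one-line override files can be pictured as a simple "file wins if present" lookup. The sketch below is an assumption about how such a lookup might work, not the actual RTXConfiguration code; `read_override` and `resolve_setting` are hypothetical names.

```python
from pathlib import Path


def read_override(code_dir, override_filename):
    """Return the stripped contents of an override file, or None if absent or empty."""
    override_path = Path(code_dir) / override_filename
    if not override_path.is_file():
        return None
    value = override_path.read_text().strip()
    return value or None


def resolve_setting(code_dir, override_filename, dynamic_value):
    """Prefer the override file's value; otherwise keep the dynamically determined one."""
    override = read_override(code_dir, override_filename)
    return override if override is not None else dynamic_value
```

For example, `resolve_setting("RTX/code", "maturity_override.txt", detected_maturity)` would return the file's contents while the file exists and `detected_maturity` once it has been deleted.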
Uploading databases to ITRB's SFTP server
In addition to arax-databases.rtx.ai, all databases must be uploaded to ITRB's SFTP server, which is the instance ITRB's system downloads databases from.
ITRB manages users for the SFTP server (contact them if you need to gain access).
When uploading databases to the SFTP server, you need to upload not only the database file itself, but also its md5 checksum.
Below is a complete example showing how to upload a single database (in this case, curie_to_pmids_v1.0_KG2.7.6.sqlite) and its md5 checksum to ITRB's SFTP server:
ssh [email protected]
cd /data/orangeboard/databases/KG2.7.6
sudo bash
md5sum curie_to_pmids_v1.0_KG2.7.6.sqlite > curie_to_pmids_v1.0_KG2.7.6.sqlite.md5
exit
sftp [email protected]
cd databases/KG2.7.6
put curie_to_pmids_v1.0_KG2.7.6.sqlite
cd ../../md5_sums/KG2.7.6
put curie_to_pmids_v1.0_KG2.7.6.sqlite.md5
exit
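If md5sum is not convenient, the checksum file from the example above could also be produced in Python. This is a minimal sketch: the function name `write_md5_file` is hypothetical, and it assumes the `.md5` file should use the same `<hash>  <filename>` layout that md5sum prints.

```python
import hashlib
from pathlib import Path


def write_md5_file(database_path, chunk_size=1024 * 1024):
    """Write <database_path>.md5 next to the database and return its path."""
    database_path = Path(database_path)
    digest = hashlib.md5()
    # Read in chunks so large database files are never loaded into memory at once.
    with open(database_path, "rb") as database_file:
        for chunk in iter(lambda: database_file.read(chunk_size), b""):
            digest.update(chunk)
    md5_path = database_path.with_name(database_path.name + ".md5")
    # md5sum writes the hash, two spaces, then the file name.
    md5_path.write_text(f"{digest.hexdigest()}  {database_path.name}\n")
    return md5_path
```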
Generally it's easier to upload all the new databases for a new KG2 version to the SFTP server in one batch. Below is an example of doing so for the KG2.8.0 databases:
# First upload all database files to the SFTP server
ssh arax.ncats.io
cd /data/orangeboard/databases/KG2.8.0
sftp team-expander-[myuser]@sftp.transltr.io
cd databases
mkdir KG2.8.0
cd KG2.8.0
put *2.8.0*
exit
# Then create their md5 checksums and upload those as well
sudo bash
mkdir md5_sums
chmod 777 md5_sums
exit
for file in *2.8.0*; do md5sum "${file}" > "md5_sums/${file}.md5"; done
cd md5_sums
sftp team-expander-[myuser]@sftp.transltr.io
cd md5_sums
mkdir KG2.8.0
cd KG2.8.0
put *2.8.0*
exit
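Before uploading a batch, it can be worth confirming that every database file has a checksum file that still matches it. The sketch below assumes the layout used in the batch example (a `md5_sums` subdirectory holding `<file>.md5` for each database); the function names `md5_of` and `find_unverified` are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1024 * 1024):
    """Return the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_unverified(database_dir, pattern):
    """Return names of matching files whose md5_sums/<name>.md5 is missing or wrong."""
    database_dir = Path(database_dir)
    problems = []
    for database_path in sorted(database_dir.glob(pattern)):
        if not database_path.is_file():
            continue  # skip directories such as md5_sums itself
        md5_path = database_dir / "md5_sums" / (database_path.name + ".md5")
        if not md5_path.is_file():
            problems.append(database_path.name)
            continue
        fields = md5_path.read_text().split()
        recorded = fields[0] if fields else ""
        if recorded != md5_of(database_path):
            problems.append(database_path.name)
    return problems
```

Running `find_unverified(".", "*2.8.0*")` from the KG2.8.0 database directory should return an empty list when every checksum is present and current.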
You do not need to warn ITRB when deploying a new database; simply ensure that you have uploaded it and its md5 checksum to the SFTP server in the way shown above, and then push your code change to config_dbs.json that points to that new database. If your commit was to master, it will trigger a rebuild of the ITRB ARAX CI instance (arax.ci.transltr.io); it would be wise to test this instance approximately 10 minutes after your commit to ensure it seems to be working properly. Note that if your commit involved pointing to a new database in config_dbs.json, you may need to wait up to around an hour before testing the instance, since the system takes a while to download the new database(s) while it's rebuilding. If the system isn't working after that timeframe, post a message in the devops-teamexpanderagent channel in the Translator Slack workspace and tag @Pouyan Ahmadi and/or @Ke Wang.
When you update a database, whether for a new KG2 version or for any other reason, follow these steps (order is important!):
1. Give the new/updated database a new (unique) name (e.g., bump v1.0 --> v1.1, or KG2.X.1 --> KG2.X.2 if updating for a new KG2 version).
2. Locally or in the branch you're working in (if applicable), update config_dbs.json to refer to the new database name.
3. Test the new database locally.
   - This includes running the ARAX pytest suite! Make sure you didn't break any tests.
4. If all tests pass, upload the database to arax-databases.rtx.ai under the proper KG2 directory (e.g., /home/rtxconfig/KG2.8.0).
5. Copy the database from arax-databases.rtx.ai to arax.ncats.io:
   ssh [email protected]
   cd ../../data/orangeboard/databases/KG2.X.Y
   scp [email protected]:/home/rtxconfig/KG2.X.Y/my_database_v1.1_KG2.X.Y.sqlite .
6. Follow the steps in the above section called Uploading databases to ITRB's SFTP server to upload the new database and its md5sum to the ITRB SFTP server.
7. Update config_dbs.json in master to point to the new database.
   - If you're working in a branch, merge your branch into master at this point; this should carry your previous change to config_dbs.json into master.
   - This should trigger an auto-deployment to ITRB's Staging instances, which should already have access to the new databases thanks to Step 6.
8. Download the new database to cicd.rtx.ai (automatic downloads to that instance don't currently work), via these steps:
   ssh [email protected]
   cd RTX/
   git pull origin master
   python3 code/ARAX/ARAXQuery/ARAX_database_manager.py --mnt --skip-if-exists --remove_unused
   - Note: Prior to completing this step, any commits will show as failing in GitHub.
9. At this point it's safe for master to be rolled out to arax.ncats.io.
   - It's generally a good idea to run the DatabaseManager when doing so, but it shouldn't be required.
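The renaming in the first step (for example v1.0 --> v1.1) could be scripted if you update databases often. The sketch below is only an illustration: the function name `bump_minor_version` is hypothetical, and it assumes database names carry a `_v<major>.<minor>` tag as in curie_to_pmids_v1.0_KG2.7.6.sqlite.

```python
import re


def bump_minor_version(database_name):
    """Return the database name with the minor part of its _vX.Y tag increased by one."""
    match = re.search(r"_v(\d+)\.(\d+)", database_name)
    if match is None:
        raise ValueError(f"No _vX.Y version tag found in {database_name!r}")
    major, minor = match.group(1), int(match.group(2))
    # Rebuild the name around the bumped tag, leaving the KG2 version untouched.
    return (
        database_name[:match.start()]
        + f"_v{major}.{minor + 1}"
        + database_name[match.end():]
    )
```

The result still needs to be checked by hand against the names already on arax-databases.rtx.ai to confirm it is unique.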