
Config, databases, and SFTP

Amy Glen edited this page May 6, 2023 · 46 revisions

Config files

There are two ARAX config files: config_secrets.json and config_dbs.json.

config_secrets.json

This config file is auto-downloaded to machines running ARAX code (approximately every 24 hours, using the same auto-download system as the old configv2.json) from the 'master' copy on araxconfig.rtx.ai at /home/araxconfig/config_secrets.json.

This config file is meant to contain things that should never be checked into the repo or shared publicly (usernames/passwords, etc.).

To override the 'master' config_secrets.json, create a local copy of config_secrets.json, rename it config_secrets_local.json, and edit its contents as needed. If a config_secrets_local.json is present, it always takes precedence over config_secrets.json.
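The precedence rule above can be sketched as a small shell helper. This is only an illustration of the rule, not the logic RTXConfiguration actually runs; the function name `pick_secrets_config` and its directory argument are hypothetical.

```shell
# Hypothetical helper (not part of the RTX repo): print the secrets file
# that would take precedence in the given directory.
pick_secrets_config() {
    config_dir=$1
    if [ -f "$config_dir/config_secrets_local.json" ]; then
        # A local override, when present, always wins.
        echo "$config_dir/config_secrets_local.json"
    else
        # Otherwise fall back to the auto-downloaded 'master' copy.
        echo "$config_dir/config_secrets.json"
    fi
}
```

Calling `pick_secrets_config` on the directory that holds your config files shows which file would be read, which is a quick way to confirm whether a leftover local override is still in effect.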

config_dbs.json

This file lives in the RTX repo (at RTX/code/config_dbs.json). It essentially contains the paths for the 'master' copies of the various databases on arax-databases.rtx.ai that are auto-downloaded to machines running ARAX code (by ARAX_database_manager.py) as well as the current Plover and KG2pre/KG2c Neo4j endpoints.

NOTE: The root of the paths in config_dbs.json (i.e., /translator/data/orangeboard/databases/) is not the current root path for them on arax-databases.rtx.ai (/home/rtxconfig/); ITRB could not adjust their scripts to work with the actual current root paths, so we left those root paths as they were in config_dbs.json, and instead RTXConfiguration maps from the old root paths to the current root paths as appropriate. It's silly, but it works. When you upload a database to arax-databases.rtx.ai, just put it under the proper KG2 directory (e.g., /home/rtxconfig/KG2.8.0).
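To make the root-path mapping concrete, here is a minimal sketch of the idea in shell. It is not RTXConfiguration's code (that lives in the RTX repo and may differ); the function name `map_db_path` and the two variable names are hypothetical, and only the two root paths come from the note above.

```shell
# Illustrative only: RTXConfiguration performs the real mapping.
OLD_ROOT="/translator/data/orangeboard/databases/"
NEW_ROOT="/home/rtxconfig/"

# Hypothetical helper: translate a config_dbs.json path to its current
# location on arax-databases.rtx.ai.
map_db_path() {
    path=$1
    case "$path" in
        "$OLD_ROOT"*)
            # Strip the old root prefix and prepend the current one.
            rest=${path#"$OLD_ROOT"}
            echo "${NEW_ROOT}${rest}"
            ;;
        *)
            # Paths outside the old root are returned unchanged.
            echo "$path"
            ;;
    esac
}
```

For example, `map_db_path /translator/data/orangeboard/databases/KG2.8.0/some_db.sqlite` (with `some_db.sqlite` standing in for a real database name) prints the path under /home/rtxconfig/KG2.8.0, which is where the file should be uploaded.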

Note: Before pushing a change to config_dbs.json in master, ensure that any new databases pointed to have already been uploaded to arax-databases.rtx.ai in the proper KG2 directory as well as to the ITRB SFTP server! (how-to below) If you point config_dbs.json (in the master branch) to a database that does not exist in both places, things will break.

Overriding maturity, Plover URL, and KG2 URL

RTXConfiguration dynamically determines a machine's 'maturity' (based on current branch and/or instance/domain name), which is used to select which Plover and KG2 URLs to use. But it also provides a mechanism for overriding that maturity. If, for example, you wanted your own machine to run as 'production' maturity, simply create a local one-line file called maturity_override.txt that contains that maturity:

echo "production" > RTX/code/maturity_override.txt

Similar techniques can be used for overriding the dynamically determined Plover and KG2 API URLs as well.

Here is an example for overriding the Plover URL:

echo "http://kg2cplover.rtx.ai:9990" > RTX/code/plover_url_override.txt

And an example for overriding the KG2 URL:

echo "https://arax.ncats.io/kg2/api/rtxkg2/v1.2" > RTX/code/kg2_url_override.txt

Being able to override URLs in this way can be useful if, for instance, a particular Plover instance went down, so you'd like to point a KG2 instance to a Plover that is still up, or if you're doing dev work and want to be using a particular KG2/Plover URL.

Remember to delete your local override file after you're done!
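Since forgotten override files are easy to miss, a small cleanup helper along these lines may be handy. It is a sketch, not something that ships with the RTX repo: the function name `clear_overrides` is hypothetical, and it assumes the three override files live in the directory you pass in (RTX/code in the examples above).

```shell
# Hypothetical cleanup helper: remove the three override files described
# above from the given directory and report which ones were deleted.
clear_overrides() {
    code_dir=$1   # e.g. RTX/code
    for name in maturity_override.txt plover_url_override.txt kg2_url_override.txt; do
        if [ -f "$code_dir/$name" ]; then
            rm -f "$code_dir/$name"
            echo "removed $name"
        fi
    done
}
```

Running `clear_overrides RTX/code` prints one line per file it deleted and prints nothing if no overrides were present.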

Uploading databases to ITRB's SFTP server

In addition to arax-databases.rtx.ai, all databases must be uploaded to ITRB's SFTP server, which is the instance ITRB's system downloads databases from.

ITRB manages users for the SFTP server (contact them if you need to gain access).

When uploading databases to the SFTP server, you need to upload not only the database file itself, but also its md5 checksum.

Steps for a single database

Below is a complete example showing how to upload a single database (in this case, curie_to_pmids_v1.0_KG2.7.6.sqlite) and its md5 checksum to ITRB's SFTP server:

ssh [email protected]
cd /data/orangeboard/databases/KG2.7.6
sudo bash
md5sum curie_to_pmids_v1.0_KG2.7.6.sqlite > curie_to_pmids_v1.0_KG2.7.6.sqlite.md5
exit
sftp [email protected]
cd databases/KG2.7.6
put curie_to_pmids_v1.0_KG2.7.6.sqlite
cd ../../md5_sums/KG2.7.6
put curie_to_pmids_v1.0_KG2.7.6.sqlite.md5
exit
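Before uploading, it can be worth confirming that the checksum file actually verifies against the database. The sketch below wraps the md5sum step from the example and adds an `md5sum -c` check; the function name `make_and_check_md5` is hypothetical, and it assumes GNU coreutils `md5sum` is available and that you have write permission in the database directory (the example above uses sudo for that).

```shell
# Sketch: create an md5 file next to a database and confirm it verifies.
# Assumes GNU coreutils md5sum, as used in the example above.
make_and_check_md5() {
    db_file=$1
    db_dir=$(dirname "$db_file")
    db_name=$(basename "$db_file")
    # Run inside the database directory so the .md5 file records a bare
    # file name rather than a machine-specific path.
    (
        cd "$db_dir" || exit 1
        md5sum "$db_name" > "$db_name.md5" || exit 1
        # Re-read the checksum file and compare it against the database.
        md5sum -c "$db_name.md5"
    )
}
```

The function returns a non-zero status if the checksum could not be created or does not match, so it can gate the sftp step in a script.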

Steps for all databases at once

Generally it's easier to upload all the new databases for a new KG2 version to the SFTP server in one batch. Below is an example of doing so for the KG2.8.0 databases:

# First upload all database files to the SFTP server
ssh arax.ncats.io
cd /data/orangeboard/databases/KG2.8.0
sftp team-expander-[myuser]@sftp.transltr.io
cd databases
mkdir KG2.8.0
cd KG2.8.0
put *2.8.0*
exit

# Then create their md5 checksums and upload those as well
sudo bash
mkdir md5_sums
chmod 777 md5_sums
exit
for file in *2.8.0*; do md5sum "${file}" > "md5_sums/${file}.md5"; done
cd md5_sums
sftp team-expander-[myuser]@sftp.transltr.io
cd md5_sums
mkdir KG2.8.0
cd KG2.8.0
put *2.8.0*
exit
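The two interactive sftp sessions above could in principle be combined into one non-interactive run using OpenSSH's `sftp -b` batch mode. The sketch below only generates the batch script; the function name `write_sftp_batch` is hypothetical, and whether ITRB's server permits the non-interactive (e.g. key-based) login that batch mode needs is an assumption you would have to confirm.

```shell
# Hypothetical helper: print an sftp batch script for one KG2 version.
# $1 is the KG2 version number, e.g. 2.8.0.
# A leading '-' tells sftp not to abort if that command fails
# (for instance, when the remote directory already exists).
write_sftp_batch() {
    version=$1
    cat <<EOF
cd databases
-mkdir KG$version
cd KG$version
put *$version*
cd ../../md5_sums
-mkdir KG$version
cd KG$version
lcd md5_sums
put *$version*
EOF
}
```

A possible usage, run from the local KG2.8.0 database directory after the md5_sums directory has been filled: `write_sftp_batch 2.8.0 > upload.batch` followed by `sftp -b upload.batch team-expander-[myuser]@sftp.transltr.io`.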

You do not need to warn ITRB when deploying a new database; simply ensure that you have uploaded it and its md5 checksum to the SFTP server in the way shown above, and then push your code change to config_dbs.json that points to that new database. If your commit was to master, it will trigger a rebuild of the ITRB ARAX CI instance (arax.ci.transltr.io); it would be wise to test this instance approximately 10 minutes after your commit to ensure it seems to be working properly. Note that if your commit involved pointing to a new database in config_dbs.json, you may need to wait up to around an hour to test the instance, since it will take the system a while to download the new database(s) while it's rebuilding. If the system isn't working after that timeframe, post a message in the devops-teamexpanderagent channel in the Translator Slack workspace and tag @Pouyan Ahmadi and/or @Ke Wang.
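As a local sanity check before pushing, you might confirm that every database file for the version has a matching checksum file. This sketch checks only the local md5_sums directory created in the batch example, not what is actually on the SFTP server; the function name `list_missing_md5` is hypothetical.

```shell
# Hypothetical check: run from the local KG2 database directory.
# Prints each database file for the given version that has no
# corresponding md5_sums/<file>.md5; prints nothing if all are present.
list_missing_md5() {
    version=$1
    for file in *"$version"*; do
        # Skip directories and the unexpanded pattern when nothing matches.
        [ -f "$file" ] || continue
        if [ ! -f "md5_sums/$file.md5" ]; then
            echo "$file"
        fi
    done
}
```

For example, `list_missing_md5 2.8.0` run inside /data/orangeboard/databases/KG2.8.0 should print nothing once the checksum loop above has completed.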

Steps when updating a database

When you update a database, whether for a new KG2 version or for any other reason, follow these steps (order is important!):

  1. Make sure to give the new/updated database a new (unique) name (e.g., bump v1.0 --> v1.1, or KG2.X.1 --> KG2.X.2 if updating for a new KG2 version)
  2. Locally or in the branch you're working in (if applicable), update config_dbs.json to refer to the new database name
  3. Test the new database locally
    1. This includes running the ARAX pytest suite! Make sure you didn't break any tests.
  4. If all tests pass, upload the database to arax-databases.rtx.ai under the proper KG2 directory (e.g., /home/rtxconfig/KG2.8.0)
  5. Copy the database from arax-databases.rtx.ai to arax.ncats.io:
    1. ssh [email protected]
    2. cd ../../data/orangeboard/databases/KG2.X.Y
    3. scp [email protected]:/home/rtxconfig/KG2.X.Y/my_database_v1.1_KG2.X.Y.sqlite .
  6. Follow the steps in the above section called Uploading databases to ITRB's SFTP server to upload the new database and its md5sum to the ITRB SFTP server
  7. Update config_dbs.json in master to point to the new database.
    1. If you're working in a branch, merge your branch into master at this point; this should carry your previous change to config_dbs.json into master.
    2. This should trigger an auto-deployment to ITRB's Staging instances, which should already have access to the new databases thanks to Step 6.
  8. At this point it's safe for master to be rolled out to arax.ncats.io.
    1. It's generally a good idea to run the DatabaseManager when doing so, but it shouldn't be required.
  9. Download the new database to cicd.rtx.ai (automatic downloads to that instance don't currently work), via these steps:
    1. ssh [email protected]
    2. cd RTX/
    3. git pull origin master
    4. python3 code/ARAX/ARAXQuery/ARAX_database_manager.py --mnt --skip-if-exists --remove_unused
    5. Note: Prior to completing this step, any commits will show as 'Failing' in GitHub. You can do this step either before or after Step 8.