Add new molecular database
LachlanStuart edited this page Sep 3, 2021 · 3 revisions
Adding a new molecular database is a manual process. This document describes the inputs you need and the sequence of commands to run.
- TSV file with the following columns:
  - sequential row number, with an empty column name
  - unique ID of the molecule within the database, column name "id"
  - molecule name, column name "name"
  - chemical formula of the molecule, column name "formula"
  For example (columns are tab-separated in the file):
        id          name             formula
    13  NPA024518   Nocapyrone R     C11H16O4
    14  NPA024517   Penixanthone A   C16H18O3
- Database name, version, description and citation. For example:
- Name: NPA
- Version: 2019-08
- Description: Taken from the NPA homepage: "The Natural Products Atlas provides open access coverage of bacterial and fungal natural products, giving researchers the power to visualize the chemical diversity of the natural world."
- Citation: van Santen, J. A.; Jacob, G.; Leen Singh, A.; Aniebok, V.; Balunas, M. J.; Bunsko, D.; Carnevale Neto, F.; Castaño-Espriu, L.; Chang, C.; Clark, T. N.; Cleary Little, J. L.; Delgadillo, D. A.; Dorrestein, P. C.; Duncan, K. R.; Egan, J. M.; Galey, M. M.; Haeckl, F. P. J.; Hua, A.; Hughes, A. H.; Iskakova, D.; Khadilkar, A.; Lee, J.-H.; Lee, S.; LeGrow, N.; Liu, D. Y.; Macho, J. M.; McCaughey, C. S.; Medema, M. H.; Neupane, R. P.; O’Donnell, T. J.; Paula, J. S.; Sanchez, L. M.; Shaikh, A. F.; Soldatou, S.; Terlouw, B. R.; Tran, T. A.; Valentine, M.; van der Hooft, J. J. J.; Vo, D. A.; Wang, M.; Wilson, D.; Zink, K. E.; Linington, R. G. "The Natural Products Atlas: An Open Access Knowledge Base for Microbial Natural Products Discovery", ACS Central Science, 2019, 5, 11, 1824-1833. DOI: 10.1021/acscentsci.9b00806
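The TSV layout described above can be checked before the file is uploaded. The following is a minimal sketch, not part of the METASPACE tooling: `check_moldb_tsv` is a hypothetical helper, the header is assumed to leave the first cell empty for the row-number column, and the formula pattern is only a rough sanity check (element symbols with optional counts). The real import script may accept or reject different input.

```python
import csv
import io
import re

# Assumed header: unnamed row-number column, then "id", "name", "formula".
EXPECTED_HEADER = ["", "id", "name", "formula"]
# Hypothetical sanity check: element symbols with optional counts, e.g. C11H16O4.
FORMULA_RE = re.compile(r"^(?:[A-Z][a-z]?\d*)+$")


def check_moldb_tsv(lines):
    """Return a list of problems found in an iterable of TSV lines."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader, None)
    if header != EXPECTED_HEADER:
        return [f"unexpected header: {header!r}"]
    problems = []
    seen_ids = set()
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(EXPECTED_HEADER):
            problems.append(f"line {line_no}: expected 4 columns, got {len(row)}")
            continue
        _, mol_id, name, formula = row
        if mol_id in seen_ids:
            problems.append(f"line {line_no}: duplicate id {mol_id!r}")
        seen_ids.add(mol_id)
        if not name.strip():
            problems.append(f"line {line_no}: empty name")
        if not FORMULA_RE.match(formula):
            problems.append(f"line {line_no}: suspicious formula {formula!r}")
    return problems


# The two example rows from this page, written with real tab separators.
sample = io.StringIO(
    "\tid\tname\tformula\n"
    "13\tNPA024518\tNocapyrone R\tC11H16O4\n"
    "14\tNPA024517\tPenixanthone A\tC16H18O3\n"
)
print(check_moldb_tsv(sample))
```

For a real file, pass an open file object instead of the in-memory sample, for example `check_moldb_tsv(open("/tmp/npa_2019-08.tsv", encoding="utf-8", newline=""))`.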
- Build the Docker image used to generate molecule images, based on the Dockerfile:
cd metaspace/engine/docker/mol-struct-gen/
docker build -t metaspace2020/mol-struct-gen -f Dockerfile .
- Upload the TSV file to the S3 bucket sm-mol-db and set "Read Object" for "Everyone" in the Permissions tab.
- Add information (name, version, link) about the database to the vars.yml.template file for all environments in the metaspace-ansibe-config repository.
- Copy the TSV file to the EC2 instance.
- Run the import script in each environment:
source activate sm38
cd /opt/dev/metaspace/metaspace/engine/
python scripts/import_molecular_db.py NPA 2019-08 "/tmp/npa_2019-08.tsv"
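- Set the remaining fields in the database, replacing ID with the ID of the newly imported database and URL with its molecule link template: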
UPDATE public.molecular_db SET targeted=false WHERE id=ID;
UPDATE public.molecular_db SET molecule_link_template='URL' WHERE id=ID;
- Generate the molecule images using the Docker container:
docker run -v $PWD:/home/obabel/mol-struct-gen --rm metaspace2020/mol-struct-gen <MOLDB_FILE> <MOL_IMG_DIR>
- Archive the directory with the SVG files into a tar.xz archive:
tar cf - mol_img_dir/ | xz -z - > mol-images-name.tar.xz
- Upload the *.tar.xz archive to the S3 bucket s3-mol-db and set "Read Object" for "Everyone" in the Permissions tab.
- Run the Ansible playbook to copy the images to the EC2 instance:
ansible-playbook -i env/ENV provision/web.yml -t sm-web --start-at-task="Create directory for molecular structure images"
- Add a database description for the /help page in this file.
- Run the Ansible playbook to apply the changes:
ansible-playbook -i env/dev deploy/web.yml
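The archiving step above pipes `tar` through `xz`. Where those tools are not available, roughly the same archive can be produced with Python's standard `tarfile` module. This is a sketch under assumptions: `archive_images` is a hypothetical helper, and the SVG file name in the demo is a placeholder rather than the naming scheme the image generator actually uses.

```python
import tarfile
import tempfile
from pathlib import Path


def archive_images(img_dir, archive_path):
    """Pack img_dir and its contents into an xz-compressed tar archive."""
    img_dir = Path(img_dir)
    with tarfile.open(str(archive_path), "w:xz") as tar:
        # arcname keeps only the directory name inside the archive,
        # similar to running `tar cf - mol_img_dir/` from its parent directory.
        tar.add(str(img_dir), arcname=img_dir.name)
    return Path(archive_path)


# Self-contained demo: a throwaway directory with one placeholder SVG file.
with tempfile.TemporaryDirectory() as tmp:
    img_dir = Path(tmp) / "mol_img_dir"
    img_dir.mkdir()
    (img_dir / "example.svg").write_text("<svg xmlns='http://www.w3.org/2000/svg'/>")
    archive = archive_images(img_dir, Path(tmp) / "mol-images-name.tar.xz")
    # List the archive members to confirm the directory layout was preserved.
    with tarfile.open(str(archive), "r:xz") as tar:
        print(sorted(tar.getnames()))
```

Listing the members before uploading is a cheap way to confirm that the archive contains the image directory itself and not a longer absolute path.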