
Scoping through TPS Corpus

Sagar Jadhav edited this page Nov 19, 2021 · 33 revisions

Scoping through TPS Corpus:

  1. Date 2/8/2021

    I ran the following searches and got these results:

    | Query | Number of hits |
    | --- | --- |
    | terpene synthase | 4308 |
    | terpene synthase plant | 3447 |
    | terpene synthase plant volatile | 1200 |
    | terpene synthase plant TPS | 650 |
    | terpene synthase TPS plant volatile | 376 |
    | terpene synthase TPS plant volatile compounds | 355 (312 research articles; only 188 mention both TPS and compounds) |
  2. Date 3/8/2021, 4/8/2021 and 5/8/2021

    I continued working on the TPS corpus. For the 312 papers, I checked whether the PMCID, plant, compound and TPS nomenclature were available.

  3. Date 5/8/2021

    Please find the TPS corpus (312 papers).

  4. Date 6/8/2021

    I continued improving the scoping of the TPS corpus.

  5. Date 9/8/2021

    Please find the improved TPS corpus (91 papers).

  6. Date 10/8/2021 and 11/8/2021

    Continued improving the scoping of the TPS corpus and the INYAS presentation slides.

    Out of 312 papers, only 188 mention both TPS and volatile compounds.

  7. Date 16/8/2021


    Camellia sinensis

    Vitis vinifera

  1. INYAS Interns:

    TPS genes for different species.

    Develop corpus "terpene synthase oryza".

    Extract terms from papers.

    Create dictionary and test.

    Prenyltransferases from medicinal plants.

    Classify those TPS for each subspecies.

    Check if AtTPS1 is related to OsTPS1 or something similar in the oryza corpus.

  2. Date 18/8/2021

    Please find the TPS volatile corpus (121 papers).

  3. Date 19/8/2021 Created a template for the 5 KARYA projects.

  4. Date 23/8/2021 I created the testtps dictionary.


    full data table testtps

  5. Date 24/8/2021, 25/8/2021 and 26/8/2021 Extracting volatile compounds from the 121 papers (point 9).

    volatiles from 121 corpus

  7. Date 27/8/2021, 30/8/2021

  8. Date 31/8/2021, 1/9/2021 Helping KARYA interns with installation of pygetpapers and ami3. 2/9/2021 meeting. 3/9/2021 Helping NIPGR intern with same.

  9. Date 6/9/2021 Installing software to set up a virtual environment.

    `python -m venv project_env` (creates the environment)

    `project_env\Scripts\activate.bat` (activates the environment)

    (Warning: This Python interpreter is in a conda environment, but the environment has not been activated. Libraries may fail to load. To activate this environment, run the above command.)

    Installed Anaconda and then ran `C:\Users\user\anaconda3\Scripts\activate base`

    `pip install scispacy`

    Copy requirements.txt into sagar jadhav

    Use conda to install and manage different versions of Python:

    `conda create --name project_env python=3.6.0`

    `conda activate project_env`

  10. Installed Python 3.6 and PyCharm. The metadata analysis script runs, but ami3 is not installed on my Mac.

  11. Installed ami3 on my Mac and set the path.

  12. Finding species that are highly represented in the literature: `pygetpapers -q "terpene synthase TPS plant" -o TPS -p -k 650`
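
One way to see which species are highly represented in a corpus like this is to count binomial names in the downloaded text. The sketch below is only an illustration under stated assumptions: the function `count_species`, the regular expression and the two sample sentences are hypothetical and are not part of pygetpapers or of the metadata analysis script. A real run would read the text of the downloaded papers instead of the toy strings.

```python
import re
from collections import Counter

# Capitalised genus followed by a lower-case species epithet, e.g. "Vitis vinifera".
BINOMIAL = re.compile(r"\b([A-Z][a-z]+) ([a-z]{3,})\b")


def count_species(texts, known_genera):
    """Count binomial names whose genus is listed in known_genera (hypothetical helper)."""
    counts = Counter()
    for text in texts:
        for genus, epithet in BINOMIAL.findall(text):
            # Restricting to known genera filters out pairs like "Terpene synthases".
            if genus in known_genera:
                counts[f"{genus} {epithet}"] += 1
    return counts


# Toy sentences made up for illustration only; they are not taken from the corpus.
abstracts = [
    "Terpene synthases in Vitis vinifera shape grape aroma.",
    "Camellia sinensis TPS genes respond to herbivory. Vitis vinifera was used as outgroup.",
]
genera = {"Vitis", "Camellia"}

species_counts = count_species(abstracts, genera)
for name, n in species_counts.most_common():
    print(name, n)
```

The genus whitelist is a design choice: without it, any capitalised word followed by a lower-case word would be counted as a species.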

To run the metadata analysis script by Shweata, I followed this protocol:

Create a folder, open it in PyCharm and run the commands, or click **add interpreter**, then choose conda environment, select Python 3.6 and select the conda path. Run the commands:

`conda create --name project_env python=3.6.0`

`conda activate project_env` 
  1. Ran the metadata analysis script. The first run downloaded the papers (`pygetpapers -q "terpene synthase TPS plant" -o TPS -p -k 620`) and then showed an "lxml not installed" error, so I ran `pip install lxml`. I then commented out the download step to avoid downloading the papers again. Instead of Citrus, I added TPS. I also uncommented lines 164, 165 and 166.

  2. Please find the TPS metadata analysis output: TPS metadata analysis

  3. Extract "TPS containing sentences": I used . (dot) as the separator in line 144 of Shweata's script [`words = text.split(".")`] and also removed line 175.

  4. Please find the TPS sentences extraction: TPS Sentences Extraction

  5. Created TPS pathway dictionaries: TPS pathway, TPSpathway

  6. Created a dictionary for abbreviations of binomial nomenclature: abbreviation binomial

  7. CROP TPS diction
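
The sentence extraction step above splits the text on "." and keeps the pieces that mention TPS. A minimal sketch of that idea follows. It is not Shweata's script: the function name `sentences_with_term` and the sample text are assumptions made for illustration.

```python
def sentences_with_term(text, term="TPS"):
    """Split text on '.' and return the pieces that contain term (hypothetical helper)."""
    pieces = text.split(".")
    # Re-attach the full stop that split() removed and trim surrounding spaces.
    return [piece.strip() + "." for piece in pieces if term in piece]


# Made-up sample text, for illustration only.
sample = (
    "OsTPS1 is expressed in leaves. "
    "Samples were collected in July. "
    "TPS genes form a large family."
)

tps_sentences = sentences_with_term(sample)
for sentence in tps_sentences:
    print(sentence)
```

Splitting on a bare dot is simple but also cuts at abbreviations such as "C. sinensis" and at decimal numbers, so some extracted sentences may be fragments.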

  1. Git cloned pyami. Set the path by running `open -e .bash_profile` and adding the following lines:

    `export P2_HOME=/Users/sagar/pyami`

    `export PATH=$PATH:$P2_HOME/py4ami`

  2. Installed PyCharm. Created the folder valdict in pyami. Added an interpreter (conda env, Python 3.8, select conda path). Ran the commands

    `conda create --name project_env python=3.8.0`

    `conda activate project_env`

    Saved and closed.

    Reopened the folder and ran `pip install pytest`. The run then gave an lxml error, so I ran `pip install lxml`. The next run gave a "py4ami module not found" error, so I ran `pip install py4ami`. The run then gave the error `ImportError: cannot import name 'AMIDict' from 'py4ami.dict_lib'`.
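
When an import fails like this, it can help to check which file Python actually loads for the module and whether the expected name exists in it. The sketch below is a generic diagnostic, not a fix and not part of py4ami. The helper name `describe_import` is hypothetical, and the demonstration uses the standard library module `collections` so that it runs anywhere. To inspect the case above, one would call it with `"py4ami.dict_lib"` and `"AMIDict"` inside the project environment.

```python
import importlib
import importlib.util


def describe_import(module_name, attr_name):
    """Report where a module is loaded from and whether it defines attr_name.

    Hypothetical helper for illustration only.
    """
    try:
        spec = importlib.util.find_spec(module_name)
    except ModuleNotFoundError:
        # Raised when the parent package of a dotted name is missing.
        spec = None
    if spec is None:
        return {"found": False, "origin": None, "has_attr": False}
    module = importlib.import_module(module_name)
    return {
        "found": True,
        "origin": spec.origin,  # path of the file that was imported
        "has_attr": hasattr(module, attr_name),
    }


info = describe_import("collections", "OrderedDict")
print(info["origin"], info["has_attr"])
```

If `origin` points at an unexpected location, for example a local clone instead of the pip-installed package, the two copies may define different names.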

TPS enzyme dictionary

wiki binomial abbreviation
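
A binomial abbreviation dictionary maps short forms such as "C. sinensis" back to the full species name. The sketch below shows one possible way to generate such pairs. The function `abbreviate_binomial` is an assumption made for illustration and does not describe the format of the actual dictionary. It expects names of the form "Genus species".

```python
def abbreviate_binomial(name):
    """Turn 'Genus species' into 'G. species' (hypothetical helper)."""
    genus, epithet = name.split()[:2]
    return f"{genus[0]}. {epithet}"


# Species names that appear earlier on this page.
species = ["Camellia sinensis", "Vitis vinifera"]

# Map each abbreviation to its full name.
abbreviations = {abbreviate_binomial(s): s for s in species}
for short, full in abbreviations.items():
    print(short, "->", full)
```

Abbreviations are not unique across genera that share an initial and an epithet, so a real dictionary may need to keep several full names per abbreviation.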

18/11/2021 Documentation of crops repository.

19/11/2021 Uploading corpora to crops repository

  1. Move the file (or folder) you'd like to upload to GitHub into the local directory that was created when you cloned the repository.

  2. Open Terminal.

  3. Change the current working directory to your local repository: `cd crops`

  4. Stage the file for commit to your local repository: `git add .`

  5. Commit the file that you've staged in your local repository: `git commit -m "Add existing file"`

  6. Push the changes in your local repository to GitHub: `git push`

  7. Give your username when prompted. Generate a token by going to Settings, then Developer settings. Copy the token and paste it in place of the password.
