
Activities Summary: Anuv

Anubhab Chakraborty edited this page Sep 16, 2021 · 8 revisions


Software Installation

System Information

  • Operating System: Ubuntu 20.04
  • python3 --version: 3.9
  • pip --version: 20.0.2

pygetpapers

pygetpapers is a fetch tool written in Python, developed by Ayush Garg. It is used to fetch freely available scientific papers from select repositories.

To install pygetpapers, run:

pip install pygetpapers

To check that pygetpapers is installed correctly, run:

pygetpapers --help

Adding pygetpapers to path

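On Ubuntu, pip installs the executables of user-level packages into ~/.local/bin by default. We can add this directory to the system PATH so that pygetpapers can be run from any console. To add the directory to PATH for the current shell session, execute:

export PATH="$HOME/.local/bin:$PATH"

To make the change permanent, add the same line to ~/.bashrc (or to the startup file of your shell).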
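The following sketch checks from Python that ~/.local/bin is on PATH and that the pygetpapers executable can be found. It is only a sketch. The helper name dir_on_path is hypothetical, and the check only sees the PATH of the process it runs in, so run it from the same shell where you executed the export.

```python
import os
import shutil
from typing import Optional


def dir_on_path(directory: str, path_value: Optional[str] = None) -> bool:
    """Return True if `directory` is one of the entries of a PATH-style string.

    `dir_on_path` is a hypothetical helper written for this example.
    If `path_value` is None, the current process's PATH is used.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    # Normalise both sides so "~/.local/bin" and "/home/user/.local/bin/" compare equal
    target = os.path.normpath(os.path.expanduser(directory))
    entries = [os.path.normpath(os.path.expanduser(p))
               for p in path_value.split(os.pathsep) if p]
    return target in entries


if __name__ == "__main__":
    print("~/.local/bin on PATH:", dir_on_path("~/.local/bin"))
    # shutil.which returns the full path of the executable, or None if it is not found
    print("pygetpapers found at:", shutil.which("pygetpapers"))
```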

ami3

ami is a sectioning tool written in Java, created by Dr. Peter Murray-Rust. It is used to split a scientific paper into sections according to their position in the document and their role.

Dependencies

  • Java: sudo apt install default-jdk (a JDK, not only a JRE, is needed to compile ami3 with Maven)

To check that Java is installed, run java --version.

  • Maven: sudo apt install maven

After Java and Maven are installed, we clone the repository and build it:

git clone https://github.com/petermr/ami3.git
cd ami3
mvn install -Dmaven.test.skip=true

To add ami to the system path, execute the following command (this assumes the repository was cloned into your home directory):

export PATH="$HOME/ami3/target/appassembler/bin:$PATH"

20210916

scilitanalysis

Clone the repository:

git clone https://github.com/ShweataNHegde/scilitanalysis.git

Create a virtual environment by following the instructions in Working with a virtual environment, and activate it.
Move into the cloned directory with cd scilitanalysis/scilitanalysis.
Create a file named requirements with the following contents:

yake
scispacy
spacy
pygetpapers
bs4
https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.4.0/en_ner_bionlp13cg_md-0.4.0.tar.gz

Install the requirements with pip install -r requirements.
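After installation, you can confirm that everything in the requirements file is importable in the active environment by looking each module up with importlib. This is a minimal sketch, and it makes two assumptions. First, missing_modules is a hypothetical helper. Second, the names in REQUIRED_MODULES are assumed import names: an import name does not always match the pip package name, and the scispaCy model is assumed to install as the package en_ner_bionlp13cg_md.

```python
import importlib.util

# Assumed import names for the entries in the requirements file
REQUIRED_MODULES = ["yake", "scispacy", "spacy", "pygetpapers", "bs4", "en_ner_bionlp13cg_md"]


def missing_modules(names):
    """Return the names that cannot be found in the active environment (hypothetical helper).

    importlib.util.find_spec locates a top-level module without importing it
    and returns None when the module is not installed.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All requirements are importable.")
```

Run it with the virtual environment activated. Otherwise it checks the system interpreter instead.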


20210914

Project Idea

Chemical reactions in the scientific literature are commonly described in paragraphs of prose. The reaction information locked in these unstructured paragraphs would be more useful in a structured, machine-readable format. Chemical Markup Language (CML), an application of XML, provides a tagset for encoding chemical information and could be used to represent reactions found in the literature. Machines cannot simply read and understand a paragraph of plain text the way humans do, but with natural language processing (NLP) we may be able to identify the important, chemically relevant information in a paragraph and encode it as CML.

Why it would be useful:

A vast amount of chemical information is locked away in paragraphs describing reactions in the scientific literature. A chemist can easily decipher this information, but manual extraction does not scale in time or cost when analysing large volumes of literature. Having the information in CML would make the analysis and reuse of the chemistry and biochemistry literature scalable.

Goal:

To identify the components of a paragraph rich in chemical reaction information and correctly encode the information in CML.

Initial Plan (Plan 0):

  • We can get a sense of the structure of a reaction by looking for certain words or word groups.
    • Look for phrases such as ‘reacts with’, ‘undergoes reaction’, ‘undergoes elimination’, ‘combusts’, etc. These phrases might indicate the presence of a chemical reaction and can also tell us about the species involved and the type of reaction.
    • A number followed by M (e.g. 0.5M) indicates a molar concentration.
    • ‘Catalysed by’ and ‘in presence of’ indicate catalysts and reaction conditions.
    • Units and words such as ‘K’ (as in ‘at 400K’), ‘atm’, ‘temperature’, ‘pressure’ and ‘NTP’ indicate reaction conditions.
    • ‘Gives’ and ‘to form’ are usually followed by the reaction product.
  • We can match words against a dictionary of chemical names to check whether they name a valid compound or element.
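The cues in Plan 0 can be prototyped with plain string matching and regular expressions before any NLP tools are brought in. This is a minimal sketch. The function scan, the cue groupings, the regular expressions and the tiny CHEMICAL_NAMES set are all hypothetical. They only cover the examples listed above, not real-world text.

```python
import re

# Cue phrases from Plan 0, grouped by what they are assumed to signal
CUES = {
    "reaction": ["reacts with", "undergoes reaction", "undergoes elimination", "combusts"],
    "catalyst_or_condition": ["catalysed by", "in presence of", "in the presence of"],
    "product": ["gives", "to give", "to form"],
}

# Hypothetical patterns for quantities; real text needs many more variants
CONCENTRATION = re.compile(r"\b(\d+(?:\.\d+)?)\s?M\b")                  # e.g. 0.5M
TEMPERATURE = re.compile(r"\b(\d+(?:\.\d+)?)\s?K\b")                    # e.g. 400K
PRESSURE = re.compile(r"\b(\d+(?:\.\d+)?(?:-\d+(?:\.\d+)?)?)\s?atm\b")  # e.g. 4-7atm

# A toy dictionary of chemical names (lower case); a real system would use a large curated one
CHEMICAL_NAMES = {"phenol", "naoh", "co2", "sodium salicylate"}


def scan(sentence):
    """Report which Plan 0 cues, quantities and known chemical names occur in a sentence."""
    lowered = sentence.lower()
    cues = {label: [p for p in phrases if p in lowered] for label, phrases in CUES.items()}
    # Whole-word matching, so that e.g. "phenol" does not match inside "phenolate"
    chemicals = sorted(
        name for name in CHEMICAL_NAMES
        if re.search(r"\b" + re.escape(name) + r"\b", lowered)
    )
    return {
        "cues": {label: found for label, found in cues.items() if found},
        "concentrations": CONCENTRATION.findall(sentence),
        "temperatures": TEMPERATURE.findall(sentence),
        "pressures": PRESSURE.findall(sentence),
        "chemicals": chemicals,
    }
```

Consider scan("Phenol reacts with NaOH and CO2 at 400K and 4-7atm to give Sodium Salicylate."). It should report the ‘reacts with’ and ‘to give’ cues, a temperature of 400 (K), a pressure of 4-7 (atm) and all four chemical names. Plain matching like this is brittle: it ignores negation, abbreviations and unit variants.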

Proposed usage:

Text:

Phenol reacts with NaOH and CO2 at 400K and 4-7atm to give sodium salicylate.

XML (just a representation, not actual CML):

<reaction>
   <reactant>
      <formula>C6 H6 O</formula>
      <name>Phenol</name>
   </reactant>
   <reactant>
      <formula>Na O H</formula>
      <name>Sodium Hydroxide</name>
   </reactant>
   <reactant>
      <formula>C O2</formula>
      <name>Carbon Dioxide</name>
   </reactant>
   <product>
      <formula>C7 H5 Na O3</formula>
      <name>Sodium Salicylate</name>
   </product>
   <reaction-conditions>
      <temperature>400K</temperature>
      <pressure>4-7atm</pressure>
   </reaction-conditions>
</reaction>
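Once the components of a sentence have been identified, serialising them is the easy part. This is a minimal sketch using Python's standard-library xml.etree.ElementTree. The helper build_reaction is hypothetical, and its inputs are assumed to come from an earlier extraction step. The element names copy the illustrative XML above rather than real CML.

```python
import xml.etree.ElementTree as ET


def build_reaction(reactants, products, conditions):
    """Build a <reaction> element (hypothetical helper).

    reactants, products: lists of (formula, name) pairs
    conditions: dict mapping a condition tag (e.g. "temperature") to its text
    Element names follow the illustrative XML, not actual CML.
    """
    reaction = ET.Element("reaction")
    for tag, species in (("reactant", reactants), ("product", products)):
        for formula, name in species:
            el = ET.SubElement(reaction, tag)
            ET.SubElement(el, "formula").text = formula
            ET.SubElement(el, "name").text = name
    conds = ET.SubElement(reaction, "reaction-conditions")
    for key, value in conditions.items():
        ET.SubElement(conds, key).text = value
    return reaction


if __name__ == "__main__":
    root = build_reaction(
        reactants=[("C6 H6 O", "Phenol"), ("Na O H", "Sodium Hydroxide"), ("C O2", "Carbon Dioxide")],
        products=[("C7 H5 Na O3", "Sodium Salicylate")],
        conditions={"temperature": "400K", "pressure": "4-7atm"},
    )
    ET.indent(root)  # pretty-printing helper, available from Python 3.9
    print(ET.tostring(root, encoding="unicode"))
```

Generating the XML from Python objects, rather than concatenating strings, keeps the output well-formed and escapes characters such as & and < automatically.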

Related projects:

  • Identify passages that describe a chemical reaction
  • Convert descriptions of molecules into CML
  • Identify images depicting chemical molecules and reactions
  • Convert chemical molecules or reactions presented as images into CML
  • Encode metabolic pathways as XML

20210915

Working with a virtual environment

Sometimes we use software that requires a specific version of a package, or we need to run several programs that require conflicting package versions. For such cases, and for software development in general, it is useful to work in a virtual environment. When a Python virtual environment is activated, the packages available in it are independent of the packages installed on the system. As a result, commonly used packages usually have to be installed again in the virtual environment after it is created. You can create as many virtual environments as you want; typically you would create a separate virtual environment for each project.

Creating a virtual environment

python3 -m venv /path/to/virtual/environment

The last component of the path is the name of the virtual environment. For example, if I want to create a virtual environment named 'scilit_venv' in /home/anuv/scilitanalysis/, I would run the command:

python3 -m venv /home/anuv/scilitanalysis/scilit_venv
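The same environment can also be created from Python with the standard-library venv module, which is what python3 -m venv runs. This is a minimal sketch. The wrapper create_env is hypothetical. Setting symlinks on non-Windows systems is an assumption meant to mirror the command-line default.

```python
import os
import venv
from pathlib import Path


def create_env(path, with_pip=True):
    """Create a virtual environment at `path` (hypothetical helper).

    Roughly equivalent to `python3 -m venv path`: the command-line tool
    installs pip by default and uses symlinks on non-Windows systems.
    """
    env_dir = Path(path).expanduser()
    venv.create(env_dir, symlinks=(os.name != "nt"), with_pip=with_pip)
    return env_dir
```

For example, create_env("/home/anuv/scilitanalysis/scilit_venv") should create the same environment as the command above. Passing with_pip=False skips installing pip, which is faster but leaves you without pip inside the environment.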

Activating virtual environment

source path/to/venv/bin/activate

You need to run this command in every new shell session in which you want to use the virtual environment. The activate script works with bash and zsh; other shells have their own activate files. For example, if you are using the fish shell, run source venv/bin/activate.fish.

Continuing the above example, we can activate the scilit_venv by running:

source /home/anuv/scilitanalysis/scilit_venv/bin/activate
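To confirm that activation worked, you can check from Python whether the interpreter is running inside a virtual environment. This is a minimal sketch. The helper in_virtualenv is hypothetical. The check assumes an environment created with the venv module, where sys.prefix points at the environment directory and sys.base_prefix points at the base Python installation.

```python
import sys


def in_virtualenv():
    """Return True if this interpreter runs inside a venv-created environment (hypothetical helper).

    Inside a venv, sys.prefix is the environment directory, while
    sys.base_prefix still points at the Python installation it was created from.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("In a virtual environment:", in_virtualenv())
    print("Environment prefix:", sys.prefix)
```

Run it with plain python after sourcing the activate script. A hard-coded interpreter path such as /usr/bin/python3 bypasses the environment.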

Deactivating virtual environment

To leave the virtual environment simply run:

deactivate

This wiki has been continued here.
