ShweataNHegde edited this page May 26, 2021 · 35 revisions

Welcome to the CEVOpen wiki!

1. Main components of intern activity:

1.1. Technology

  • (getpapers, ami, wikidata/SPARQL) - search
  • dictionaries

1.2. Mini-projects

  • chemotype
  • genotype
  • activities (medicinal)
  • phenotype - invasive species integration - how these fit together - an atlas

2. Prerequisites

Python is essential to run all of our software. Ensure you've installed it before proceeding further.

2.1. Install

2.1.1. pygetpapers (https://github.com/petermr/pygetpapers)

Run the following command to install pygetpapers:

pip install git+https://github.com/petermr/pygetpapers

If you have trouble installing using this method, you can find alternatives here.
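
To confirm that the installation worked, a quick check from Python can help. The sketch below assumes the package is importable under the module name `pygetpapers` (this matches the repository name but is not stated on this page); `is_installed` is a hypothetical helper, not part of pygetpapers.

```python
import importlib.util
import sys


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Show which Python interpreter is being used.
    print("Python", sys.version.split()[0])
    # Assumption: the package installs a module called "pygetpapers".
    print("pygetpapers importable:", is_installed("pygetpapers"))
```

If the second line prints `False`, the package is probably installed for a different Python interpreter than the one running the check.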

2.1.2. ami_gui.py

  • git clone https://github.com/petermr/openDiagram.git
  • Although ami_gui.py runs from the command line, you will need to edit its source code so that it points to the local copies of the projects outlined below. PyCharm is recommended for editing the source code.

2.2. git clone

The project has gradually expanded and branched out into different research areas, so our work is spread across several repositories. To run ami_gui.py, you will have to clone the following repositories:
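
Cloning several repositories into one workspace can be scripted. The sketch below is illustrative only: `REPOS` holds just the one repository URL that appears on this page (openDiagram), and `clone_command`, `clone_all`, and the workspace directory are hypothetical names chosen for the example.

```python
import subprocess
from pathlib import Path
from typing import List

# Only this URL appears on this page; add the other repositories you need.
REPOS = ["https://github.com/petermr/openDiagram.git"]


def clone_command(url: str, workspace: Path) -> List[str]:
    """Build the `git clone` command that places the repository under workspace."""
    name = url.rstrip("/").rsplit("/", 1)[-1]
    if name.endswith(".git"):
        name = name[:-4]
    return ["git", "clone", url, str(workspace / name)]


def clone_all(urls: List[str], workspace: Path) -> None:
    """Clone every repository that is not already present in workspace."""
    workspace.mkdir(parents=True, exist_ok=True)
    for url in urls:
        cmd = clone_command(url, workspace)
        target = Path(cmd[-1])
        if target.exists():
            print("skipping existing", target)
            continue
        # Requires git on the PATH; raises CalledProcessError if the clone fails.
        subprocess.run(cmd, check=True)
```

Calling `clone_all(REPOS, Path.home() / "workspace")` would place each clone in one directory, which keeps the paths that ami_gui.py needs in a single, predictable location.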

3. Overall Goal

To build a multilingual semantic Atlas of Volatile Phytochemistry.[1]

3.1. Subgoals

To build Open Source, multiplatform tools that can discover, aggregate, clean, and semantify scholarly documents containing significant amounts of data on phytochemical VOCs[2]. These documents will describe the extraction and assay of oils, optionally with properties and activities.

3.2. Tools include:

  • APIs for repositories such as EPMC, bioRxiv preprints, and thesis collections.
  • Scrapers for semi-structured sites such as journals
  • standardised metadata (e.g. JATS)
  • PDF and HTML readers => XML or JSON
  • article sectioning (e.g. into JATS categories)
  • extraction of floats (tables, maps, images, diagrams, chemistry, maths[*])
  • display and navigation of sections in a paper
  • aggregated statistics and machine learning
  • multilingual annotation (using Wikidata)
  • linking to the Wikidata knowledge graph
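
As an illustration of article sectioning, the sketch below splits a JATS-like XML document into its top-level sections using only the Python standard library. It is not the ami implementation: `SAMPLE_JATS` is a tiny hand-written stand-in rather than a real paper, `top_level_sections` is a hypothetical helper, and real JATS files may carry DOCTYPE declarations, namespaces, and deeper nesting that this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written stand-in for a JATS article; not taken from a real paper.
SAMPLE_JATS = """<article>
  <front><article-meta><title-group>
    <article-title>Example article</article-title>
  </title-group></article-meta></front>
  <body>
    <sec><title>Introduction</title><p>Essential oils contain volatile compounds.</p></sec>
    <sec><title>Methods</title><p>Oil was extracted by hydrodistillation.</p>
      <sec><title>Assay</title><p>GC-MS was used.</p></sec>
    </sec>
  </body>
</article>"""


def top_level_sections(jats_xml: str) -> dict:
    """Map each top-level <sec> title in <body> to the text of its own paragraphs."""
    root = ET.fromstring(jats_xml)
    sections = {}
    for sec in root.findall("./body/sec"):
        title = sec.findtext("title", default="(untitled)")
        # Only direct <p> children are collected; nested subsections are skipped.
        paragraphs = ["".join(p.itertext()).strip() for p in sec.findall("p")]
        sections[title] = " ".join(paragraphs)
    return sections


if __name__ == "__main__":
    for title, text in top_level_sections(SAMPLE_JATS).items():
        print(title, "->", text)
```

Keeping the result as a plain title-to-text mapping makes it easy to feed individual sections into later steps such as dictionary annotation or aggregated statistics.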

3.3. Required actions:

  • Coordination of EO-related and general dictionaries - conformance to a common standard.
  • Validation of gold-standard minicorpora (e.g. for training and validating machine learning)

[*] not included in CEVOpen but extensible in future
[1] We need an engaging title. "Atlas" is often extended beyond maps (e.g. Atlas of the Human Body). For example, plantPart is an atlas of the plant. It works for me but may confuse others. Here are some ideas:

  • "Compendium of ..."
  • "Semantic Essence of Phytochemistry" ("essence" means central meaning, and also evokes volatiles)

But please think creatively.

[2] Volatile Organic Compound


4. Outreach

We've presented our work (mostly on openVirus) at various venues, including WikiCite, COAR and BarCamp. You can take a look at our Outreach page. If you're new, going through our presentations is probably the best way to start understanding the pipeline.


5. Code of Conduct

All interns, volunteers and contributors should adhere to the code of conduct outlined here.
