Friday, October 12 Notes #3

klgray25 opened this issue Oct 12, 2018 · 1 comment

klgray25 commented Oct 12, 2018

Friday, October 12

Project Background

Prior Sample/Data Processing

  • Samples collected from marine environments around the world
  • Water samples collected
  • Filtered to separate a viral fraction from a bacterial fraction (which also contains viruses)
  • Selected for dsDNA viruses only
  • Fractions underwent shotgun metagenomic sequencing
  • Assembled into contigs
  • Generated RPKM values
  • Compared against functional and metabolic pathway databases (MetaCyc, COG, KEGG, and 3 others) using FAST - the RPKM values of all enzymes in a pathway were combined using a set of rules specific to each pathway
  • Output: an RPKM value for each enriched pathway in every sample
  • EBI Link: https://www.ebi.ac.uk/metagenomics/studies/ERP001736
  • Metadata Link: https://www.ebi.ac.uk/ena/submit/tara-oceans-checklist
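
The pathway roll-up in the FAST step above can be pictured with a small Python sketch. The real pathway-specific rules are not recorded in these notes, so the rule used here (call a pathway present when at least half of its enzymes are detected, then sum their RPKMs), the function name `pathway_rpkm`, and the EC numbers in the example are all hypothetical.

```python
from typing import Dict, Optional, Set


def pathway_rpkm(
    enzyme_rpkm: Dict[str, float],
    pathway_enzymes: Set[str],
    min_fraction_present: float = 0.5,  # hypothetical threshold, not from the pipeline
) -> Optional[float]:
    """Combine enzyme-level RPKMs into a single pathway-level RPKM.

    Illustrative rule only: the pathway is called present if at least
    `min_fraction_present` of its enzymes have RPKM > 0, and its value is
    the sum of the detected enzymes' RPKMs. Returns None if not present.
    """
    if not pathway_enzymes:
        return None
    detected = [e for e in pathway_enzymes if enzyme_rpkm.get(e, 0.0) > 0]
    if len(detected) / len(pathway_enzymes) < min_fraction_present:
        return None
    return sum(enzyme_rpkm[e] for e in detected)


# Hypothetical example: 2 of 3 enzymes detected in one sample.
sample = {"1.1.1.1": 2.0, "2.7.1.1": 3.5}
pathway = {"1.1.1.1", "2.7.1.1", "4.1.2.13"}
print(pathway_rpkm(sample, pathway))  # 5.5
print(pathway_rpkm(sample, pathway, min_fraction_present=0.9))  # None
```

A real rule set would likely differ per pathway (e.g. required vs. optional steps); the point is only that a pathway value is derived from its member enzymes.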

Data Contents

  • RPKM - after the contigs were assembled, the reads were aligned back to them to compute RPKM (reads per kilobase of transcript per million mapped reads), essentially an abundance metric normalized for contig length and sequencing depth
  • Sample name - c = non-viral fraction (includes bacteria and viruses), ERR = viral fraction only
  • Type - "single" includes only one fraction; "multi" includes the viral and bacterial fraction data in a single analysis
  • Date information exists - Simon Rao
  • See comment below for more details
  • It's okay to make the data public
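
For reference, the RPKM normalization described above works out to RPKM = 10^9 × C / (L × N), where C is the number of reads mapped to a contig, L is the contig length in bp, and N is the total number of mapped reads in the sample. A minimal Python sketch (the function name `rpkm` and the example numbers are illustrative):

```python
def rpkm(reads_on_contig: int, contig_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase per Million mapped reads.

    RPKM = reads_on_contig / (contig_length_bp / 1e3) / (total_mapped_reads / 1e6)
         = reads_on_contig * 1e9 / (contig_length_bp * total_mapped_reads)
    """
    if contig_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("contig length and total mapped reads must be positive")
    return reads_on_contig * 1e9 / (contig_length_bp * total_mapped_reads)


# 500 reads on a 2,000 bp contig in a sample with 10 million mapped reads.
print(rpkm(500, 2_000, 10_000_000))  # 25.0
```

Doubling the contig length or the sequencing depth halves the value, which is what makes RPKM comparable across contigs and samples.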

Notes From Steve Hallam Visit

  • Tara Oceans Project - the project that carried out the sampling expedition: http://ocean-microbiome.embl.de/companion.html
  • International consortium of oceanographers/marine biologists - established a standardized sample collection protocol (so the data are comparable across samples)
  • First expedition - photic samples - didn’t sample very deep
  • Pathway Tools - prediction engine (requires a licence) - MetaCyc identifiers were generated with it
  • PathoLogic - has harmonized names
  • New idea - metabolically functional genes encoded in viruses - more widespread than previously thought
  • Talked about this paper: http://www.pnas.org/content/108/39/E757.short
  • Cyanobacteria normally have fast turnover - they slow down and halt photosynthesis in response to viral infection (sequestering the virus and protecting neighbouring cells) - the virus carries photosystem genes that overcome this defence mechanism and promote photosynthesis and cell division
  • Pathway Tools / KEGG Atlas - have diagrams of metabolism - recommended using these
  • Envisions this turning into a manuscript - Nature Scientific Data publication
  • A heatmap showing the distribution of pathways is a good starting point (though something similar to KEGG Atlas would be ideal)
  • Want to be able to do things like compare samples from the Indian Ocean with samples from another ocean
  • Pathways by location heatmap
  • Metavirome - attracted to certain pathways - want to visualize the pathways that are affected
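
The pathways-by-location heatmap boils down to a pathway × region matrix of aggregated RPKM. A minimal Python sketch of building that matrix, assuming a hypothetical long-format input of (region, pathway ID, RPKM) records, averaging samples within a region, and treating missing cells as 0.0 ("not detected"); the names `heatmap_matrix`, `PWY-A`, and `PWY-B` are placeholders:

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def heatmap_matrix(
    records: Iterable[Tuple[str, str, float]],
) -> Tuple[List[str], List[str], List[List[float]]]:
    """Build a pathway x region matrix of mean RPKM from long-format records.

    Each record is (region, pathway_id, rpkm). Cells with no samples are
    filled with 0.0, i.e. treated as "not detected" (an assumption).
    """
    cells: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for region, pathway, value in records:
        cells[(pathway, region)].append(value)
    pathways = sorted({p for p, _ in cells})
    regions = sorted({r for _, r in cells})
    matrix = [
        [mean(cells[(p, r)]) if (p, r) in cells else 0.0 for r in regions]
        for p in pathways
    ]
    return pathways, regions, matrix


records = [
    ("Indian Ocean", "PWY-A", 2.0),
    ("Indian Ocean", "PWY-A", 4.0),
    ("South Atlantic Ocean", "PWY-A", 1.0),
    ("South Atlantic Ocean", "PWY-B", 5.0),
]
print(heatmap_matrix(records))
# (['PWY-A', 'PWY-B'], ['Indian Ocean', 'South Atlantic Ocean'], [[3.0, 1.0], [0.0, 5.0]])
```

Rows and columns are sorted so the matrix can be handed directly to any heatmap plotting function; the mean could be swapped for a median if a few samples dominate.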

MetaCyc Notes

  • Reference database of enzymes and metabolic pathways
  • Mostly small-molecule pathways (newer versions also add macromolecular metabolic pathways)
  • Reference used by PathoLogic, which predicts an organism's metabolic network from its annotated genome files and generates Pathway/Genome Databases - BioCyc stores the databases generated by SRI
  • Used to generate organism-specific pathway/genome databases
  • Curated from experimentally validated results/academic papers

Project Update

Approximate Task Divvy Up

  • Analysis - Javier, Heather, Jasmine
  • Map - Ogan, Dan, Olga, Arjun, Kristen
  • Reading/Writing - Kristen, Arjun
  • Integration - Arjun

http://www.gutcyc.org - similar initiative - gut microbiome

Website Goals

  • Map
    • Data points are plotted on the map using their latitude and longitude values
    • Want to be able to query by location, depth, and other metadata (temperature, salinity, etc.)
    • Clicking on a sample should bring up sample information, pathway information, etc. (with some figures to visualize the data, likely grouped by metabolic category) and link out to MetaCyc information
  • Analysis Functionality
    • Want differential comparison of metabolic pathway activation for samples with a given set of characteristics
    • Want to have a way to filter out pathways that are generally present everywhere
  • Interactive KEGG Atlas-like Visualization (if time permits)
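
The analysis goals above (select samples by metadata, drop pathways present almost everywhere, compare two groups) could be prototyped along these lines. This is a Python sketch, not the Shiny code: the sample record layout, the 90% prevalence cut-off, the 200 m depth split, and all function names are hypothetical, and the comparison is a plain log2 fold change of mean RPKM with a pseudocount, not a statistical test.

```python
import math
from typing import Any, Callable, Dict, Iterable, List, Set

# Hypothetical sample record: metadata fields plus {"pathways": {pathway_id: rpkm}}.
Sample = Dict[str, Any]


def select(samples: Iterable[Sample], predicate: Callable[[Sample], bool]) -> List[Sample]:
    """Return the samples whose metadata satisfy `predicate`."""
    return [s for s in samples if predicate(s)]


def drop_ubiquitous(samples: List[Sample], max_prevalence: float = 0.9) -> Set[str]:
    """Pathway IDs detected (RPKM > 0) in at most `max_prevalence` of samples."""
    counts: Dict[str, int] = {}
    for s in samples:
        for pid, value in s["pathways"].items():
            if value > 0:
                counts[pid] = counts.get(pid, 0) + 1
    return {pid for pid, c in counts.items() if c / len(samples) <= max_prevalence}


def log2_fold_change(
    group_a: List[Sample], group_b: List[Sample], pathway_ids: Iterable[str], pseudocount: float = 1.0
) -> Dict[str, float]:
    """log2((mean RPKM in A + pseudocount) / (mean RPKM in B + pseudocount))."""

    def mean_rpkm(group: List[Sample], pid: str) -> float:
        return sum(s["pathways"].get(pid, 0.0) for s in group) / len(group)

    return {
        pid: math.log2((mean_rpkm(group_a, pid) + pseudocount) / (mean_rpkm(group_b, pid) + pseudocount))
        for pid in pathway_ids
    }


samples = [
    {"depth_m": 5, "pathways": {"PWY-A": 7.0, "PWY-B": 2.0}},
    {"depth_m": 5, "pathways": {"PWY-A": 3.0, "PWY-B": 0.0}},
    {"depth_m": 400, "pathways": {"PWY-A": 1.0, "PWY-B": 0.0}},
]
informative = drop_ubiquitous(samples)  # {'PWY-B'}: PWY-A is in every sample
surface = select(samples, lambda s: s["depth_m"] < 200)
deep = select(samples, lambda s: s["depth_m"] >= 200)
print(log2_fold_change(surface, deep, informative))  # {'PWY-B': 1.0}
```

The pseudocount keeps the ratio defined when a pathway is absent from one group; a proper differential test would need replicate-aware statistics on top of this.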

Website Progress

  • Elected to use Shiny to build the website
  • Laid the groundwork and set up the map using leaflet (an R package for interactive maps)
  • Working on the query module - when the user filters, zooms, or changes parameters, the query returns the matching subset of data
  • Began basic formatting of site
  • Current site functionality: The site has three pages - a Home/Welcome, Map, and Analysis page - accessible from the sidebar menu. The map appears on the Map page and the sample points are plotted. When a given sample is selected, a field appears with some basic sample data. A control panel has been added for the map, but it is not yet populated or functional.
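
For the zoom part of the query module, the core step is keeping only samples inside the visible map bounds. In leaflet for R under Shiny, the visible bounds are exposed as `input$<mapId>_bounds` (north, east, south, west); the Python sketch below only illustrates the filtering logic, and `SamplePoint`, `in_viewport`, and the sample IDs are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SamplePoint:
    sample_id: str
    lat: float
    lon: float


def in_viewport(points: List[SamplePoint], south: float, west: float,
                north: float, east: float) -> List[SamplePoint]:
    """Return points inside a lat/lon bounding box (e.g. the visible map area)."""

    def lon_ok(lon: float) -> bool:
        if west <= east:
            return west <= lon <= east
        # Box crosses the antimeridian, e.g. west=170, east=-170.
        return lon >= west or lon <= east

    return [p for p in points if south <= p.lat <= north and lon_ok(p.lon)]


points = [
    SamplePoint("S1", 10.0, -30.0),
    SamplePoint("S2", -20.0, 60.0),
    SamplePoint("S3", 5.0, 179.0),
]
print([p.sample_id for p in in_viewport(points, south=0, west=-60, north=30, east=0)])  # ['S1']
print([p.sample_id for p in in_viewport(points, south=-10, west=170, north=10, east=-170)])  # ['S3']
```

The antimeridian branch matters for Pacific samples, where a viewport can span longitude 180.
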

tsa87 commented Oct 12, 2018

RPKM: Reads Per Kilobase of transcript per Million mapped reads
SRF: Surface Water Layer
DCM: Deep Chlorophyll Maximum
MES: Mesopelagic
OMZ: Oxygen Minimum Zone
