New Analysis Example: Microarray Pathway Analysis - GSVA #343

Closed
cansavvy opened this issue Oct 29, 2020 · 2 comments · Fixed by #362
Labels
before going live Needs to be done before we can "go live" or do testing

Comments

@cansavvy
Contributor

cansavvy commented Oct 29, 2020

What are the goals of this new example analysis?

ORA and GSEA are certainly popular pathway analysis methods, but GSVA requires fewer cutoffs and decisions, so having this method as an example would probably be helpful for our users.

Per-sample pathway analysis results answer a different question, one that GSVA can address but the other methods largely can't.

What kind of dataset will this need?

We may want to use the same original dataset we used in either GSEA or ORA so we have a comparison of pathway analyses: GSE71270 (zebrafish CREB study) or GSE37418 (human medulloblastoma subtype).

What steps should be included in this analysis?

We can borrow some inspiration from https://github.com/AlexsLemonade/training-modules/blob/master/pathway-analysis/03-gene_set_variation_analysis.Rmd, keeping in mind that the narrative will need to change somewhat, as with other examples we've adapted from training to refinebio-examples (see #306).

  1. Import library(GSVA) (add this to the Dockerfile)
  2. Set up gene expression data as a matrix.
  3. Import gene lists and decide about Hallmark or not. This decision should be made considering the discussion happening on WIP: Add Microarray Pathway Analysis - GSEA example #339 (comment); we'll want to make sure users understand the implications of multiple testing corrections and how smaller gene sets can help with this.
  4. Use GSVA::gsva() to perform GSVA, probably start out with largely the same parameters used in training but adjust if/when things look wonky.
  5. Display a preview of significant results in one way or another. Somewhat related to this discussion WIP: Add Microarray Pathway Analysis - GSEA example #339 (comment)
  6. Make some sort of visualization of the GSVA scores. Not sure what makes the most sense here? Plotting the top results and maybe a jitter plot by group?
  7. Write results to a TSV.
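
For step 3, here is a rough base R sketch of turning gene set definitions into a named list, assuming they arrive in GMT format (tab-separated: set name, description, then member genes). `parse_gmt_lines` is a hypothetical helper and the set names, URLs, and genes are made up; the real notebook would more likely use a dedicated reader from an existing package.

```r
# Hypothetical helper (not from any package): turn GMT-formatted lines
# into a named list with one character vector of genes per gene set.
parse_gmt_lines <- function(gmt_lines) {
  fields <- strsplit(gmt_lines, "\t", fixed = TRUE)
  # Fields 1 and 2 are the set name and description; the rest are genes
  gene_sets <- lapply(fields, function(x) unique(x[-c(1, 2)]))
  names(gene_sets) <- vapply(fields, function(x) x[1], character(1))
  gene_sets
}

# Made-up example lines; a real file could be read with readLines()
gmt_lines <- c(
  "SET_A\thttp://example.org/a\tGENE1\tGENE2\tGENE3",
  "SET_B\thttp://example.org/b\tGENE2\tGENE4"
)

gene_sets <- parse_gmt_lines(gmt_lines)
str(gene_sets)
```

A named list like this also makes it easy to count how many gene sets will be tested, which is relevant to the multiple testing discussion.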

What packages/methods do you recommend using or looking into for this analysis?

Probably GSVA unless there are other package suggestions we should consider.

@cbethell
Contributor

cbethell commented Nov 9, 2020

Based on a discussion with @cansavvy, the plan in the original comment above, and the training modules example for inspiration, the tentative plan for tackling this ticket is as follows:

  1. Import library(GSVA) (add this to the Dockerfile)
  2. Read in gene expression data (Homo sapiens, likely a dataset already on S3)
  3. Import gene list from a Broad Institute URL, using the recommendation from the GSVA vignette to read in the file (and isolate Hallmark gene sets). Include context making sure users understand the implications of multiple testing corrections and how smaller gene sets can help with this (if we were to read in a smaller subset file).
  4. Gene identifier conversion: map to human gene symbols or Entrez IDs, likely symbols
  5. Remove duplicate identifiers — using the highest variance to select which row to keep perhaps?
  6. Use GSVA::gsva() to perform GSVA, probably start out with largely the same parameters used in training but adjust if/when things look wonky.
  7. Make some sort of visualization of the GSVA scores. Plotting the results using a heatmap and maybe a violin or jitter plot to plot by group? To plot by highest variance? To plot by highest GSVA score?
  8. Write results to a TSV.
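
As a minimal sketch of step 5, assuming the expression data is a numeric matrix with one row per probe and a parallel vector of mapped gene symbols, the highest-variance row per symbol could be kept like this in base R. The matrix values, probe names, and symbol assignments are made up for illustration.

```r
# Toy expression matrix: rows are probes, columns are samples (made-up values)
expr <- matrix(
  c(1, 2, 3,
    5, 5, 5,
    2, 8, 2,
    4, 4, 5),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    c("probe1", "probe2", "probe3", "probe4"),
    c("s1", "s2", "s3")
  )
)
# Gene symbol each probe maps to (made up); two probes per symbol
symbols <- c("TP53", "TP53", "MYC", "MYC")

# Variance of each row across samples
row_var <- apply(expr, 1, var)

# Sort rows by decreasing variance, then keep the first row seen per symbol
ord <- order(row_var, decreasing = TRUE)
keep <- ord[!duplicated(symbols[ord])]

dedup <- expr[keep, , drop = FALSE]
rownames(dedup) <- symbols[keep]
dedup
```

One thing to note about this approach is that ties in variance are broken by row order, so it may be worth deciding whether that matters for the chosen dataset.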

Feel free to leave any suggestions/modifications you believe should be made before implementing this plan!
cc: @jaclyn-taroni and @jashapiro

@jaclyn-taroni
Member

Remove duplicate identifiers — using the highest variance to select which row to keep perhaps?

You could also aggregate to the mean value for a gene symbol for each sample.
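
A quick base R sketch of the mean aggregation idea, assuming a numeric matrix with one row per probe and a parallel vector of gene symbols (all values and names below are made up):

```r
# Toy expression matrix: rows are probes, columns are samples (made-up values)
expr <- matrix(
  c(1, 2, 3,
    5, 5, 5,
    2, 8, 2,
    4, 4, 5),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    c("probe1", "probe2", "probe3", "probe4"),
    c("s1", "s2", "s3")
  )
)
# Gene symbol each probe maps to (made up)
symbols <- c("TP53", "TP53", "MYC", "MYC")

# Sum rows that share a symbol, then divide by the number of rows per symbol
sums <- rowsum(expr, group = symbols)
counts <- table(symbols)[rownames(sums)]
gene_means <- sums / as.vector(counts)
gene_means
```

Compared with keeping a single row, this uses every probe for a gene, at the cost of averaging away differences between probes.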
