Project 2: A reproducible template workflow for single-cell DNA methylation data #2

abaghela · 2017-08-02T19:15:28Z

A reproducible template workflow for single-cell DNA methylation data

DNA methylation is a heritable epigenetic mark that shows a strong correlation with transcriptional activity, and may be detected by whole genome bisulfite sequencing (WGBS). Recently, WGBS has been performed successfully on single cells (SC-WGBS). The resulting data represent a fundamental shift in the capacity to measure and interpret DNA methylation, especially in rare cell types and in contexts where subtle cell-to-cell heterogeneity is crucial, such as in stem cells or cancer. However, although some software tools have been published, and several existing studies have tended to use similar methods, no standardized pipeline for the analysis of SC-WGBS data yet exists.

Simultaneously, there has been a drive within bioinformatics towards improved reproducibility. Recreating the exact results of a study requires not only the exact code, but also the exact software. Common Workflow Language (CWL) provides a framework for specifying complete workflows, while Docker allows the exact software and auxiliary data used in an analysis to be bundled within a container that can be executed anywhere. Together, these have the potential to enable completely reproducible bioinformatics research.

At a previous Hackathon, the first steps were taken towards developing Screw, a collection of standard tools and workflows for analysing SC-WGBS data, wrapped in CWL and Docker: https://github.com/Epigenomics-Screw/Screw

Screw will include quality control visualization, clustering and visualization of cells by pairwise dissimilarity measures, construction of recapitulated-bulk methylomes from single cells of the same lineage, generation of bigWig methylation tracks for downstream visualization, and wrappers around published tools such as DeepCpG and LOLA. This project will focus on completing Screw, while also building standardised workflows to analyse a series of public SC-WGBS data sets. This will provide both a complete resource for reproducible SC-WGBS analysis and a first meta-analysis of SC-WGBS data.

Team Lead: Kieran O'Neill | [email protected] | @oneillkza | Postdoctoral Fellow | BC Genome Sciences Centre

oneillkza · 2017-09-12T19:28:18Z

So ... software: we need Docker. As far as I can see, ORCA already works by loading a Docker container. It sounds like running Docker inside Docker is possible, but not recommended. Could we get some comment from the ORCA admins on the best way to navigate this? E.g. whether we could deploy our own containers directly, or whether ORCA supports Common Workflow Language.

The hacky, roundabout solution, which would defeat the whole purpose of the project, would be to run without Docker, and ensure that the ORCA container has everything from our existing container. The catch is that we'll likely be updating what software we need as we go during the hackathon.

Besides that, we'd need:

cwltool
Arvados -- less crucial, but would be good to have for testing cross-compatibility

lchong · 2017-09-19T23:04:41Z

@sjackman Can you comment on this? Would it be possible to load a different Docker image for Kieran's team when they log onto the ORCA machines?

sjackman · 2017-09-19T23:13:41Z

Hi, Kieran. cc @tmozgach

Yes, ORCA supports Common Workflow Language (CWL). It has cwltool installed. It'd be good to test it out to ensure that it works for your purpose. It does not have Arvados installed.

> and ensure that the ORCA container has everything from our existing container

Here's the list of software installed on ORCA: https://github.com/bcgsc/orca/blob/master/versions.tsv
Can you check whether any software is missing?

> It sounds like running Docker inside Docker is possible

We'll have to discuss this and get back to you.

sjackman · 2017-09-19T23:18:10Z

@oneillkza Do you run the CWL pipeline inside a Docker container, or does your CWL pipeline launch Docker containers?

oneillkza · 2017-09-19T23:22:27Z

@sjackman it launches containers. (This is basically the default cwltool behaviour.)

In our case, it's actually one container for all of the CWL tools, hence my saying we could bundle things up in the standard ORCA container. One tricky issue is that we also bundle up the Screw codebase inside the container, so as we hack on it, we'd need to constantly update the container.

sjackman · 2017-09-19T23:31:02Z

As a first pass, would you try running your pipeline using cwltool inside the bcgsc/orca container, and configure cwltool not to launch any containers?

sjackman · 2017-09-19T23:31:37Z

We haven't created the ORCA accounts yet for Hackseq, but we can create yours first if you'd like to give that a go.

oneillkza · 2017-09-20T04:47:35Z

Yeah, that'd be a reasonable solution -- it's easy enough to use the --no-container flag in cwltool. We can test the Docker functionality on our local machines on toy examples, and run the pipeline in anger on ORCA but using --no-container.
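
To make the two modes concrete, here is a rough sketch of how the same run could be launched either way from a small Python helper. The helper function and the file names `workflow.cwl` and `job.yml` are placeholders for illustration, not files from the Screw repo; the only cwltool option relied on is `--no-container`.

```python
import shlex


def cwltool_command(workflow, job, use_containers=True):
    """Build the argv list for a cwltool run (hypothetical helper)."""
    cmd = ["cwltool"]
    if not use_containers:
        # Run the tools directly on the host (e.g. inside the ORCA container)
        # instead of letting cwltool launch Docker containers.
        cmd.append("--no-container")
    cmd += [workflow, job]
    return cmd


# Placeholder file names, assumed for illustration only.
local_cmd = cwltool_command("workflow.cwl", "job.yml")
orca_cmd = cwltool_command("workflow.cwl", "job.yml", use_containers=False)

print(shlex.join(local_cmd))
print(shlex.join(orca_cmd))
```

Passing either list to `subprocess.run(cmd, check=True)` would start the run, assuming cwltool is on the PATH.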

Re: list of software, most of this is described in the following Dockerfiles. If you could add these to the ORCA Dockerfile, that should do it!

https://github.com/Epigenomics-Screw/Screw/blob/master/docker/base/Dockerfile
https://github.com/Epigenomics-Screw/Screw/blob/master/docker/screw/Dockerfile

Thanks!

(And yes please to getting an ORCA account for pre-testing.)

sjackman · 2017-09-20T21:19:51Z

Great. I've asked Brendan to create an ORCA account for you. In the meantime, you can test out the ORCA Docker image on your own hardware if you like: https://hub.docker.com/r/bcgsc/orca/
docker run -it bcgsc/orca
Note that it's a very large image, many gigabytes.

sjackman · 2017-09-20T21:23:09Z

R is installed, but the R packages are not pre-installed. You'll have to do that yourself.
@tmozgach Please add methpipe to the ORCA image.

tmozgach · 2017-09-25T17:22:19Z

@sjackman
Should the following software be in the ORCA image for hackseq?

Install nano, vim, emacs, man-db, and methpipe

sjackman · 2017-09-25T17:43:53Z

Yes, please. Thanks, Tanya.
Please also brew install less if the command less is not already in the PATH.
And bzip2 and xz if they're not already in the PATH.

tmozgach · 2017-09-25T17:52:54Z

@sjackman I will add these and start building a new image on the 16th of September. Before then, would it be possible to ask the team leaders exactly what software they need, or to think about what else we should add?

sjackman · 2017-09-25T18:16:00Z

The above are all installed.

$ which less bzip2 gzip xz
/usr/bin/less
/home/linuxbrew/.linuxbrew/bin/bzip2
/bin/gzip
/home/linuxbrew/.linuxbrew/bin/xz
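
A team could script the same PATH check for its own list of tools. The following is only a sketch: the `missing_tools` helper is hypothetical, and the tool names are taken from this thread rather than from any official requirements list.

```python
import shutil


def missing_tools(tools):
    """Return the tools from `tools` that are not found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


# Tool names mentioned in this thread; adjust to the team's actual needs.
required = ["cwltool", "less", "bzip2", "gzip", "xz"]

missing = missing_tools(required)
if missing:
    print("Missing from PATH:", ", ".join(missing))
else:
    print("All required tools are installed.")
```

Like `which`, this only reports whether an executable is found on the PATH; it says nothing about versions or about R packages, which would need a separate check.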

sjackman · 2017-09-25T18:16:41Z

This issue is for Project 2. Could you please post in each of the other project issues pointing each team leader to the list of installed software, and asking if they need any software missing from that list?

lchong · 2017-09-25T18:22:54Z

Hi @tmozgach @sjackman

I've already asked all the team leaders to post a list of required software in their respective project issues. But I'll also start a new issue summarizing people's requests so that it's all centralized, and I'll also remind them to give feedback (not everyone has done so yet).

sjackman · 2017-09-25T18:43:02Z

Thanks, Lauren!

jakelever · 2017-10-10T23:46:33Z

Hey team lead (@oneillkza), we've been gathering GitHub IDs for your team members. From your description, it sounds like you plan to use the existing Screw repo for this project. If that's the case, could you please add the people below as collaborators to that project? Or if you'd prefer, we can make a repo in the hackseq organisation and sort out membership for you.

cmorganl
klimstef
sibylgisela
jesszha
jjonphl
adammendoza

Once the people are added, it'd be a great idea to start a discussion on that repo with information to get your team members started (e.g. some small suggested reading, things to look up, etc). We will also be adding everyone to Slack and creating a specific channel for each project. This may be an easier way to communicate.

We'll forward on any remaining GitHub IDs through this issue.

Thanks, Jake
obo the Hackseq organising committee

lchong mentioned this issue Sep 25, 2017

ORCA setup hackseq/hackseq_2017#33
