Project 2: A reproducible template workflow for single-cell DNA methylation data #2

Open
abaghela opened this issue Aug 2, 2017 · 18 comments

@abaghela

abaghela commented Aug 2, 2017

A reproducible template workflow for single-cell DNA methylation data

DNA methylation is a heritable epigenetic mark that correlates strongly with transcriptional activity and can be detected by whole-genome bisulfite sequencing (WGBS). Recently, WGBS has been performed successfully on single cells (SC-WGBS). The resulting data represent a fundamental shift in the capacity to measure and interpret DNA methylation, especially in rare cell types and in contexts where subtle cell-to-cell heterogeneity is crucial, such as stem cells or cancer. However, although some software tools have been published and existing studies have tended to use similar methods, no standardized pipeline for the analysis of SC-WGBS data yet exists.

Simultaneously, there has been a drive within bioinformatics towards improved reproducibility. Recreating the exact results of a study requires not only the exact code but also the exact software. Common Workflow Language (CWL) provides a framework for specifying complete workflows, while Docker allows the exact software and auxiliary data used in an analysis to be bundled in a container that can be executed anywhere. Together, these have the potential to enable completely reproducible bioinformatics research.

At a previous Hackathon, the first steps were taken towards developing Screw, a collection of standard tools and workflows for analysing SC-WGBS data, wrapped in CWL and Docker: https://github.com/Epigenomics-Screw/Screw

Screw will include quality control visualization, clustering and visualisation of cells by pairwise dissimilarity measures, construction of recapitulated-bulk methylomes from single cells of the same lineage, generation of bigWig methylation tracks for downstream visualization, and wrappers around published tools such as DeepCpG and LOLA. This project will focus on completing Screw, while also building standardised workflows to analyse a series of public SC-WGBS data sets. This will provide both a complete resource for reproducible SC-WGBS analysis and a first meta-analysis of SC-WGBS data.

Team Lead: Kieran O'Neill | [email protected] | @oneillkza | Postdoctoral Fellow | BC Genome Sciences Centre

@oneillkza

So ... software: we need Docker. As far as I can see, ORCA already works by loading a Docker container. It sounds like running Docker inside Docker is possible, but not recommended. Could we get some comment from the ORCA admins on the best way to navigate this? E.g., could we deploy our own containers directly, or does ORCA support Common Workflow Language?

The hacky, roundabout, defeating-the-whole-purpose-of-the-project solution would be to run without Docker and ensure that the ORCA container has everything from our existing container. But it's also likely that we'll be updating what software we need as we go during the hackathon.

Besides that, we'd need:

  • cwltool
  • Arvados -- less crucial, but would be good to have for testing cross-compatibility

@lchong

lchong commented Sep 19, 2017

@sjackman Can you comment on this? Would it be possible to load a different Docker image for Kieran's team when they log onto the ORCA machines?

@sjackman

sjackman commented Sep 19, 2017

Hi, Kieran. cc @tmozgach

Yes, ORCA supports Common Workflow Language (CWL). It has cwltool installed. It'd be good to test it out to ensure that it works for your purpose. It does not have Arvados installed.

> and ensure that the ORCA container has everything from our existing container

Here's the list of software installed on ORCA: https://github.com/bcgsc/orca/blob/master/versions.tsv
Can you check whether any software is missing?
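
One way to run that check is to compare a list of required tool names against the first column of a local copy of versions.tsv. The sketch below assumes the file is tab-separated with the tool name in the first column, which is a guess about its layout and not something confirmed in this thread. The function name `missing_from_versions` is hypothetical.

```shell
#!/usr/bin/env bash
# Print every required tool that is absent from a versions file.
# ASSUMPTION: the file is tab-separated and the tool name is in column 1.
# Usage: missing_from_versions versions.tsv tool1 tool2 ...
missing_from_versions() {
    local versions_file="$1"
    shift
    local tool
    for tool in "$@"; do
        # -x: match the whole line, -F: fixed string, -q: no output
        if ! cut -f1 "$versions_file" | grep -qxF -- "$tool"; then
            printf '%s\n' "$tool"
        fi
    done
}

# Example, given a downloaded copy of the list (tool names taken from this thread):
# missing_from_versions versions.tsv cwltool methpipe
```

An empty result would mean every listed tool appears in the file; it says nothing about R packages or other dependencies that are not listed there.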

> It sounds like running Docker inside Docker is possible

We'll have to discuss this and get back to you.

@sjackman

@oneillkza Do you run the CWL pipeline inside a Docker container, or does your CWL pipeline launch Docker containers?

@oneillkza

@sjackman it launches containers. (This is basically the default cwltool behaviour.)

In our case, it's actually one container for all of the CWL tools, hence my saying we could bundle things up in the standard ORCA container. One tricky issue is that we also bundle up the Screw codebase inside the container, so as we hack on it, we'd need to constantly update the container.

@sjackman

As a first pass, would you try running your pipeline using cwltool inside the bcgsc/orca container, and configure cwltool not to launch any containers?

@sjackman

We haven't created the ORCA accounts yet for Hackseq, but we can create yours first if you'd like to give that a go.

@oneillkza

Yeah, that'd be a reasonable solution -- it's easy enough to use the --no-container flag in cwltool. We can test the Docker functionality on our local machines on toy examples, and run the pipeline in anger on ORCA but using --no-container.
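
For reference, the two invocations would differ only in that one flag. This is a rough sketch: `workflow.cwl` and `job.yml` are placeholder names and not files from the Screw repo, and the helper `cwltool_cmd` is hypothetical. It only prints the command line, so it can be tried without cwltool installed.

```shell
#!/usr/bin/env bash
# Print the cwltool command line for a given environment.
# ASSUMPTION: the workflow and job file names are placeholders.
# Usage: cwltool_cmd orca|local workflow.cwl job.yml
cwltool_cmd() {
    local target="$1" workflow="$2" job="$3"
    case "$target" in
        # On ORCA, run each tool from the PATH instead of launching containers.
        orca)  printf 'cwltool --no-container %s %s\n' "$workflow" "$job" ;;
        # Locally, keep the default: cwltool launches Docker for tools that request it.
        local) printf 'cwltool %s %s\n' "$workflow" "$job" ;;
        *)     printf 'unknown target: %s\n' "$target" >&2; return 1 ;;
    esac
}

cwltool_cmd orca workflow.cwl job.yml
cwltool_cmd local workflow.cwl job.yml
```

With `--no-container`, every tool the workflow calls has to be installed in the environment already, which is why the software list matters.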

Re: list of software, most of this is described in the following Dockerfiles. If you could add these to the ORCA Dockerfile, that should do it!

https://github.com/Epigenomics-Screw/Screw/blob/master/docker/base/Dockerfile
https://github.com/Epigenomics-Screw/Screw/blob/master/docker/screw/Dockerfile

Thanks!

(And yes please to getting an ORCA account for pre-testing.)

@sjackman

Great. I've asked Brendan to create an ORCA account for you. In the meantime, you can test out the ORCA Docker image on your own hardware if you like: https://hub.docker.com/r/bcgsc/orca/

docker run -it bcgsc/orca

Note that it's a very large image, many gigabytes.

@sjackman

R is installed, but the R packages are not pre-installed. You'll have to do that yourself.
@tmozgach Please add methpipe to the ORCA image.

@tmozgach

tmozgach commented Sep 25, 2017

@sjackman
Should the following software be in the ORCA image for Hackseq?

nano, vim, emacs, man-db, methpipe

@sjackman

sjackman commented Sep 25, 2017

Yes, please. Thanks, Tanya.
Please also brew install less if the command less is not already in the PATH.
And bzip2 and xz if they're not already in the PATH.

@tmozgach

@sjackman I will add these and start building a new image on the 16th of September. Before then, is it possible to ask the team leaders exactly what software they need, or to think about what else we should add?

@sjackman

The above are all installed.

$ which less bzip2 gzip xz
/usr/bin/less
/home/linuxbrew/.linuxbrew/bin/bzip2
/bin/gzip
/home/linuxbrew/.linuxbrew/bin/xz
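
A variant of this check that lists only the commands that are not found could look like the sketch below. The function name `missing_tools` is made up for illustration; it uses the shell built-in `command -v` in place of `which`.

```shell
#!/usr/bin/env bash
# Print each named command that cannot be found in the PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing_tools less bzip2 gzip xz
```

No output means all of the named commands were found.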

@sjackman

sjackman commented Sep 25, 2017

This issue is for Project 2. Could you please post in each of the other project issues pointing each team leader to the list of installed software, and asking if they need any software missing from that list?

@lchong

lchong commented Sep 25, 2017

Hi @tmozgach @sjackman

I've already asked all the team leaders to post a list of required software in their respective project issues. But I'll also start a new issue summarizing people's requests so that it's all centralized, and I'll also remind them to give feedback (not everyone has done so yet).

@sjackman

Thanks, Lauren!

@jakelever

jakelever commented Oct 10, 2017

Hey team lead (@oneillkza), we've been gathering GitHub IDs for your team members. From your description, it sounds like you plan to use the existing Screw repo for this project. If that's the case, could you please add the people below as collaborators to that project? Or if you'd prefer, we can make a repo in the hackseq organisation and sort out membership for you.

cmorganl
klimstef
sibylgisela
jesszha
jjonphl
adammendoza

Once the people are added, it'd be a great idea to start a discussion on that repo with information to get your team members started (e.g. some small suggested reading, things to look up, etc). We will also be adding everyone to Slack and creating a specific channel for each project. This may be an easier way to communicate.

We'll forward on any remaining GitHub IDs through this issue.

Thanks, Jake
on behalf of the Hackseq organising committee
