Project 2: A reproducible template workflow for single-cell DNA methylation data #2
Comments
So ... software: we need Docker. As far as I can see, ORCA already works by loading a Docker container. It sounds like running Docker inside Docker is possible, but not recommended. Could we get some comment from the ORCA admins on the best way to navigate this? E.g., could we deploy our own containers directly, or does ORCA support Common Workflow Language? The hacky, roundabout solution, which would defeat the whole purpose of the project, would be to run without Docker and ensure that the ORCA container has everything from our existing container. However, it's likely that we'll be updating the software we need as we go during the hackathon. Besides that, we'd need:
@sjackman Can you comment on this? Would it be possible to load a different Docker image for Kieran's team when they log onto the ORCA machines?
Hi, Kieran. cc @tmozgach Yes, ORCA supports Common Workflow Language (CWL). It has
Here's the list of software installed on ORCA: https://github.com/bcgsc/orca/blob/master/versions.tsv
We'll have to discuss this and get back to you.
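The check implied by the installed-software list (does ORCA already have everything a pipeline needs?) can be sketched in a few lines of Python. This is only an illustration: it assumes the list is tab-separated with the tool name in the first column, which has not been verified against the actual versions.tsv, and the tool names and versions in the example are hypothetical.

```python
import csv
import io


def installed_names(tsv_text):
    """Collect lower-cased tool names from the first column of a TSV listing.

    Assumption: one tool per row, name in the first tab-separated column.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    return {row[0].strip().lower() for row in reader if row and row[0].strip()}


def missing_tools(required, tsv_text):
    """Return the required tools that do not appear in the listing, sorted."""
    installed = installed_names(tsv_text)
    return sorted(tool for tool in required if tool.lower() not in installed)


if __name__ == "__main__":
    # Hypothetical listing and requirements, for illustration only.
    listing = "samtools\t1.5\nbwa\t0.7.15\n"
    print(missing_tools(["bismark", "samtools"], listing))
```

Anything the function reports as missing would be a candidate for a request to the ORCA admins.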
@oneillkza Do you run the CWL pipeline inside a Docker container, or does your CWL pipeline launch Docker containers?
@sjackman it launches containers. (This is basically the default cwltool behaviour.) In our case, it's actually one container for all of the CWL tools, hence my saying we could bundle things up in the standard ORCA container. One tricky issue is that we also bundle up the Screw codebase inside the container, so as we hack on it, we'd need to constantly update the container.
As a first pass, I would try running your pipeline using
We haven't created the ORCA accounts yet for Hackseq, but we can create yours first if you'd like to give that a go.
Yeah, that'd be a reasonable solution -- it's easy enough to use the --no-container flag in cwltool. We can test the Docker functionality on our local machines on toy examples, and run the pipeline in anger on ORCA but using --no-container. Re: list of software, most of this is described in the following Dockerfiles. If you could add these to the ORCA Dockerfile, that should do it! https://github.com/Epigenomics-Screw/Screw/blob/master/docker/base/Dockerfile Thanks! (And yes please to getting an ORCA account for pre-testing.)
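The two modes described above (containers on local machines, --no-container on ORCA) could be toggled from a small launcher script. The following is a minimal sketch rather than anything from the Screw repository: the function name and the file names workflow.cwl and job.yml are hypothetical placeholders, and the only cwltool option it relies on is the --no-container flag mentioned in the comment.

```python
import shlex


def build_cwltool_command(workflow, job, use_containers=True):
    """Build the argument list for a cwltool run.

    With use_containers=False, --no-container is added so that cwltool
    runs each tool directly on the host instead of launching Docker.
    """
    argv = ["cwltool"]
    if not use_containers:
        argv.append("--no-container")
    argv += [workflow, job]
    return argv


if __name__ == "__main__":
    # Hypothetical file names; print both variants instead of executing them.
    for use_containers in (True, False):
        argv = build_cwltool_command("workflow.cwl", "job.yml", use_containers)
        print(shlex.join(argv))
```

The returned list can be passed to subprocess.run to actually start the workflow. With --no-container, every tool the workflow calls has to be installed on the host already, which is why the software list matters.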
Great. I've asked Brendan to create an ORCA account for you. In the meantime, you can test out the ORCA Docker image on your own hardware if you like: https://hub.docker.com/r/bcgsc/orca/
R is installed, but the R packages are not pre-installed. You'll have to install those yourself.
@sjackman
Yes, please. Thanks, Tanya.
@sjackman I will make the additions and start building a new image on the 16th of September. Before then, would it be possible to ask the team leaders exactly what software they need, or to think about what else we should add?
The above are all installed.
This issue is for Project 2. Could you please post in each of the other project issues pointing each team leader to the list of installed software, and asking if they need any software missing from that list?
I've already asked all the team leaders to post a list of required software in their respective project issues. But I'll also start a new issue summarizing people's requests so that it's all centralized, and I'll also remind them to give feedback (not everyone has done so yet).
Thanks, Lauren!
Hey team lead (@oneillkza), we've been gathering GitHub IDs for your team members. From your description, it sounds like you plan to use the existing Screw repo for this project. If that's the case, could you please add the people below as collaborators to that project? Or, if you'd prefer, we can make a repo in the hackseq organisation and sort out membership for you.

cmorganl

Once the people are added, it'd be a great idea to start a discussion on that repo with information to get your team members started (e.g. some suggested reading, things to look up, etc.). We will also be adding everyone to Slack and creating a specific channel for each project, which may be an easier way to communicate. We'll forward any remaining GitHub IDs through this issue.

Thanks, Jake
A reproducible template workflow for single-cell DNA methylation data
DNA methylation is a heritable epigenetic mark that shows a strong correlation with transcriptional activity, and may be detected by whole genome bisulfite sequencing (WGBS). Recently, WGBS has been performed successfully on single cells (SC-WGBS). The resulting data represents a fundamental shift in the capacity to measure and interpret DNA methylation, especially in rare cell types and contexts where subtle cell-to-cell heterogeneity is crucial, such as in stem cells or cancer. However, although some software tools have been published, and several existing studies have tended to use similar methods, no standardized pipeline for the analysis of SC-WGBS yet exists.

Simultaneously, there has been a drive within bioinformatics towards improved reproducibility. Recreating the exact results of a study requires not only the exact code, but also the exact software. Common Workflow Language (CWL) provides a framework for specifying complete workflows, while Docker allows for bundling of the exact software and auxiliary data used in an analysis within a container that can be executed anywhere. Together, these have the potential to enable completely reproducible bioinformatics research.

At a previous hackathon, the first steps were taken towards developing Screw, a collection of standard tools and workflows for analysing SC-WGBS data, wrapped in CWL and Docker: https://github.com/Epigenomics-Screw/Screw

Screw will include quality control visualization, clustering and visualisation of cells by pairwise dissimilarity measures, construction of recapitulated-bulk methylomes from single cells of the same lineage, generation of bigWig methylation tracks for downstream visualization, and wrappers around published tools such as DeepCpG and LOLA.

This project will focus on completing Screw, while also building standardised workflows to analyse a series of public SC-WGBS data sets. This will provide both a complete resource for reproducible SC-WGBS analysis and a first meta-analysis of SC-WGBS data.
Team Lead: Kieran O'Neill | [email protected] | @oneillkza | Postdoctoral Fellow | BC Genome Sciences Centre