diff --git a/docs/source/Demultiplexing/Running_Pipeline.rst b/docs/source/Demultiplexing/Running_Pipeline.rst index d968902..9accedf 100644 --- a/docs/source/Demultiplexing/Running_Pipeline.rst +++ b/docs/source/Demultiplexing/Running_Pipeline.rst @@ -478,19 +478,19 @@ Let's make sure that is what will be run by doing a dry run: --report demultiplexing_report.html - This will generate an html report that includes figures and pipeline metrics called ``demultiplexing_report.html``. - The report generated for this testa dataset is available :download:`here <../_static/Demultiplexing_report.html>`. + This will generate an HTML report called :download:`demultiplexing_report.html <../_static/demultiplexing_report.html>` that includes figures and pipeline metrics. + The report generated for this test dataset is available :download:`here <../_static/demultiplexing_report.html>`. Checking the Output -------------------- +=================== -Each of the figures generated by this pipeline are included in the ``demultiplexing_report.html`` and it can therefore be used to check the results of the pipeline. +All of the figures generated by this pipeline are included in the :download:`demultiplexing_report.html <../_static/demultiplexing_report.html>`, so the report can be used to check the results of the pipeline. These figures will be used for discussion with members of the sceQTL-Gen Consortium to identify appropriate filtering thresholds for your dataset. In addition, we include the locations of each of these files in your directories for each of the figures below. -#. First, we can check the assignments of individuals to the clusters identified by souporcell. Those results are available in the `Souporcell Genotype Correlations` folder in the ``demultiplexing_report.html`` and are located in each pool directory in the ``souporcell/genotype_correlations/`` directory. 
Take a look at the ``pearson_correlation.png`` which should have the pearson correlation between each genotypes from each cluster identified by souporcell and the genotypes of the individuals that were in that pool. Your figure should look similar to: +#. First, we can check the assignments of individuals to the clusters identified by souporcell. Those results are available in the ``Souporcell Genotype Correlations`` folder of the :download:`demultiplexing_report.html <../_static/demultiplexing_report.html>` and locally in the ``souporcell/genotype_correlations/`` directory within each pool directory. Take a look at ``pearson_correlation.png``, which should show the Pearson correlation between the genotypes of each cluster identified by souporcell and the genotypes of the individuals in that pool. Your figure should look similar to: .. figure:: https://user-images.githubusercontent.com/44268007/104514035-87bae400-5640-11eb-8edf-2fbb75be2c8b.png @@ -533,12 +533,12 @@ In addition, we include the locations of each of these files in your directories - This file will be used to substitute the souporcell cluster IDs with the individual IDs -#. Next, let's see how many cells were classified as "singlet" and the number of individuals that we were able to detect. You will find a figure (``expected_observed_individuals_classifications.png``) with two barplots demonstrating these metrics across all the pools in the ``Number Individuals Summary`` folder in the ``demultiplexing_report.html`` and in the ``QC_figures`` directory locally: +#. Next, let's see how many cells were classified as "singlet" and how many individuals we were able to detect.
You will find a figure (``expected_observed_individuals_classifications.png``) with two barplots showing these metrics across all pools in the ``Number Individuals Summary`` folder of the :download:`demultiplexing_report.html <../_static/demultiplexing_report.html>` and locally in the ``QC_figures`` directory: .. figure:: ../_static/expected_observed_individuals_classifications.png :width: 200 -#. In addition, there is another barplot figure that demonstrates the nubmer of droplets assigned to each individual and how many were classified as "doublets" or "unassigned". You will find a barplot of this data(``DropletType_Assignment_BarPlot.png``) in ``Number Individuals Summary`` folder in the ``demultiplexing_report.html`` and in the ``CombinedResults`` folder locally in each Pool. These are the final assignments for each droplet after intersecting the results from all of the softwares. +#. In addition, another barplot shows the number of droplets assigned to each individual and how many were classified as "doublets" or "unassigned". You will find this barplot (``DropletType_Assignment_BarPlot.png``) in the ``Number Individuals Summary`` folder of the :download:`demultiplexing_report.html <../_static/demultiplexing_report.html>` and locally in the ``CombinedResults`` folder of each pool. These are the final assignments for each droplet after intersecting the results from all of the software tools. ..
figure:: https://user-images.githubusercontent.com/44268007/104514687-92c24400-5641-11eb-9c52-8771006d9f6f.png :width: 700 @@ -556,7 +556,7 @@ In addition, we include the locations of each of these files in your directories - If you find that the results in this figure are unanticipated (ie you have far more or far fewer singlets or doublets than expected), that would be a really good indication that either there is something strange about this pool (ie most droplets didn't contain cells) or that one or more of the softwares need to be rerun with different parameters. You can reach out to us by opening an issue if you find that this is the case and we can troubleshoot with you. -#. Now let's check the contents of the QC figures. A number of QC metrics have been plotted and are in the ``QC`` folder in the ``demultiplexing_report.html`` and saved in the ``QC_figures`` directory locally. As you can see, there are a number of files and figures have been generated. The single cell counts have been stored in a Seurat object and saved at various stages of processing: +#. Now let's check the contents of the QC figures. A number of QC metrics have been plotted; they are in the ``QC`` folder of the :download:`demultiplexing_report.html <../_static/demultiplexing_report.html>` and saved locally in the ``QC_figures`` directory. As you can see, a number of files and figures have been generated. The single-cell counts have been stored in a Seurat object and saved at various stages of processing: .. admonition:: seurat_object_all_pools_all_barcodes_all_metadata.rds @@ -645,6 +645,22 @@ In addition, we include the locations of each of these files in your directories +Uploading Data +=============== + +Upon completing the Demultiplexing and Doublet Removal pipeline, please upload your ``seurat_object_all_pools_all_barcodes_all_metadata.rds`` and ``demultiplexing_report.html`` to the shared ownCloud.
+This will be the same link you used to upload your data at the end of the SNP Imputation pipeline. +However, if you have not already obtained an upload link, contact Marc Jan Bonder at bondermj @ gmail.com to get a link to upload the ``seurat_object_all_pools_all_barcodes_all_metadata.rds`` and ``demultiplexing_report.html``. +Be sure to include your dataset name as well as the name of the PI associated with the dataset. +This link will also be used to upload WG2 results. + +.. admonition:: Important + :class: caution + + Please note that you cannot change filenames after uploading! + + + Next Steps ------------ diff --git a/docs/source/Imputation/Imputation_Required_Input.rst b/docs/source/Imputation/Imputation_Required_Input.rst index d8843ab..90f2226 100644 --- a/docs/source/Imputation/Imputation_Required_Input.rst +++ b/docs/source/Imputation/Imputation_Required_Input.rst @@ -177,6 +177,12 @@ Key for column contents: - Any additional metadata can be added as additional columns +.. admonition:: Important + :class: caution + + The ``data.psam`` file will be used to generate a per-individual metadata file for use in WG3 (eQTL detection) and will be uploaded to a shared ownCloud. + Therefore, please carefully consider whether any individual IDs need to be anonymized. + Next Steps diff --git a/docs/source/Imputation/SNP_Genotype_Imputation.rst b/docs/source/Imputation/SNP_Genotype_Imputation.rst index 500434b..94a2653 100644 --- a/docs/source/Imputation/SNP_Genotype_Imputation.rst +++ b/docs/source/Imputation/SNP_Genotype_Imputation.rst @@ -405,7 +405,7 @@ You should have the following results directories: You will also have an html report that includes figures and pipeline metrics called ``imputation_report.html``. -The report generated for this testa dataset is available :download:`here <../_static/imputation_report.html>`. +The report generated for this test dataset is available :download:`here <../_static/imputation_report.html>`.
This report will have three main figure subsets: @@ -433,6 +433,18 @@ This report will have three main figure subsets: +Uploading Data +--------------- + +Upon completing the SNP Imputation pipeline, please contact Marc Jan Bonder at bondermj @ gmail.com to get a link to upload the ``imputation_report.html`` and the ``genotype_donor_annotation.tsv``. +Be sure to include your dataset name as well as the name of the PI associated with the dataset. +This link will also be used to upload the results of the Demultiplexing and Doublet Removal pipeline, as well as QC images and WG2 results. + +.. admonition:: Important + :class: caution + + Please note that you cannot change filenames after uploading! + Next Steps diff --git a/docs/source/_static/demultiplexing_report.html b/docs/source/_static/demultiplexing_report.html new file mode 100644 index 0000000..ccabc84 --- /dev/null +++ b/docs/source/_static/demultiplexing_report.html @@ -0,0 +1,3127 @@ + + + + + + + + + Snakemake Report + + + + + + + + + + + + + +
+


+


+


+
+ + + +
+
+ + +
+
+

Workflow

+ + + +
+
+
+
+
+ +


+ +
+ +
+

DoubletDetection

+

test_dataset

+
+
+
+

Number Individuals Summary

+
+
+
+

QC

+
+
+
+

Scrublet

+

test_dataset

+
+
+
+

Souporcell Genotype Correlations

+

test_dataset

+
+
+ +
+

Statistics

+ If the workflow has been executed in cluster/cloud, runtimes include the waiting time in the queue. +
+
+
+
+
+
+
+
+
+ + +
+


+
+
+ + + + + + + + + + + + + + + + + + + + + + + +
+
+ + + + + + + + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/docs/source/_static/imputation_report.html b/docs/source/_static/imputation_report.html new file mode 100644 index 0000000..93ec162 --- /dev/null +++ b/docs/source/_static/imputation_report.html @@ -0,0 +1,3514 @@ + + + + + + + + + Snakemake Report + + + + + + + + + + + + + +
+


+


+


+
+ + + +
+
+ + +
+
+

Workflow

+ + + +
+
+
+
+
+ +


+ +
+ +
+

Ancestry

+
+
+
+

Ancestry and Sex Summary

+
+
+
+

SNP Numbers

+
+
+ +
+

Statistics

+ If the workflow has been executed in cluster/cloud, runtimes include the waiting time in the queue. +
+
+
+
+
+
+
+
+
+ + +
+


+
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+ + + + + + + + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/docs/source/index.rst b/docs/source/index.rst index 8b90339..da1ba9a 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -13,7 +13,15 @@ Welcome to the documentation for WG1 of the sceQTL-Gen Consortium! .. figure:: https://user-images.githubusercontent.com/44268007/89252548-35b96f80-d659-11ea-97e9-4b4176df5f08.png :width: 300 -The purpose of this repository is to provide references and instructions for preparation of data for the sceQTL-Gen Consortium. Please note that you can run this pipeline in parallel to the `Working Group 2 (Cell Classification) `__ pipeline both of which will be used for `Working Group 3 (eQTL Detection) `__. +The purpose of this repository is to provide references and instructions for preparing data for the sceQTL-Gen Consortium. +Please note that you can run this pipeline in parallel with the `Working Group 2 (Cell Classification) `__ pipeline; the results of both will be used for `Working Group 3 (eQTL Detection) `__. +Upon completion of the WG1 pipelines, please contact Drew Neavin (d.neavin @ garvan.org.au) to set up a meeting to discuss QC thresholds for your dataset that are consistent with the thresholds used for other datasets in the consortium. + +We ask that you upload the results from each of these pipelines (except the SNP genotype data) to a shared ownCloud once they are complete. +To get an upload link, please email Marc Jan Bonder at bondermj @ gmail.com and provide the dataset name as well as the name of the PI associated with the dataset. +The same link will be used for the WG2 data upload. +Please note that you cannot change filenames after uploading! + There are four major steps that this group is addressing with data preprocessing: