From 6c9ca65231af830814ec21f9aca1519448041c2c Mon Sep 17 00:00:00 2001
From: Anton Goloborodko
Date: Thu, 28 Mar 2024 12:05:19 +0100
Subject: [PATCH] edit phase walkthrough

---
 .../pairtools_phase_walkthrough.ipynb | 43 ++++++-------------
 1 file changed, 12 insertions(+), 31 deletions(-)

diff --git a/doc/examples/pairtools_phase_walkthrough.ipynb b/doc/examples/pairtools_phase_walkthrough.ipynb
index 9905d5a..c1116fa 100644
--- a/doc/examples/pairtools_phase_walkthrough.ipynb
+++ b/doc/examples/pairtools_phase_walkthrough.ipynb
@@ -34,50 +34,31 @@
  "source": [
  "Several approaches have been developed to process Hi-C data from haplotype-resolved experiments. In `pairtools`, we implement the approach that was used in Erceg et al. Here is its brief outline:\n",
  "\n",
- "1. [Create the reference genome](#Create-the-reference-genome): create a \"concatenated\" reference genome that contains sequences of both homologs of each chromosome. \n",
+ "1. Create the haplotype-resolved genome. First, we will create a \"concatenated\" reference genome that contains sequences of both homologs of each chromosome. \n",
  "\n",
  " - Incorporate known SNVs (usually in .vcf format) into the reference genome using [bcftools](https://samtools.github.io/bcftools/bcftools.html) to create FASTA files with the sequences of both homologs.\n",
  " - Add suffixes to the name of each homolog that identify the type (`_hap1` or `_hap2`).\n",
  "\n",
  "2. Map the Hi-C data to the concatenated reference and parse resulting alignment into Hi-C pairs. Compared to the standard Hi-C pipeline, this step would contain a couple of modifications:\n",
- " - parse allowing multimappers (mapq 0). \n",
- " - make the aligner report two suboptimal alignments (aka the second and the third hit).\n",
+ " - Make the aligner report two suboptimal alignments (aka the second and the third hit).\n",
+ " - Parse allowing multimappers (mapq 0). \n",
  " \n",
  " Note that, upon mapping to the homolog-resolved genome, Hi-C reads will report the identity of their homologue as the suffix of the chromosome name.\n",
  " \n",
- " See sections:\n",
- " \n",
- " (i) [Download data](#Download-data)\n",
- " \n",
- " (ii) [Map data with bwa mem to diploid genome](#Map-data-with-bwa-mem-to-diploid-genome)\n",
- " \n",
- " (iii) [pairtools parse](#pairtools-parse)\n",
- " \n",
+ "3. Phase the resulting pairs based on the reported suboptimal alignments. \n",
  "\n",
- "3. [pairtools phase](#pairtools-phase): phase the pairs based on the reported suboptimal alignments. \n",
+ " By checking the scores of two suboptimal alignments, we will distinguish the true multi-mappers from unresolved pairs (i.e. cases when the read aligns to the location with no distinguishing SNV). Phasing will remove the haplotype suffixes from chromosome names and add extra fields to the .pairs file with:\n",
  "\n",
- " By checking the scores of two suboptimal alignments, we will distinguish the true multi-mappers from unresolved pairs (i.e. cases when the read aligns to the location with no distinguishing SNV).\n",
- " Phasing procedure will remove the haplotype suffixes from chromosome names and add extra fields to the .pairs file with:\n",
- " \n",
- " '.' (non-resolved)\n",
- " \n",
- " '0' (first haplotype) or \n",
- " \n",
- " '1' (second haplotype). \n",
- " \n",
+ " - '.' (non-resolved)\n",
+ " - '0' (first haplotype) \n",
+ " - '1' (second haplotype)\n",
  " \n",
  " Phasing schema: \n",
  " \n",
- "![image.png](attachment:62e74fba-c1c1-44b5-a3e2-3699c3cac7ce.png)\n",
- "\n",
+ " ![image.png](attachment:62e74fba-c1c1-44b5-a3e2-3699c3cac7ce.png)\n",
  "\n",
- "4. Post-procesing. Sort and dedup Hi-C pairs and calculate stats, similarly to the standard Hi-C pipeline. \n",
  "\n",
- " See sections:\n",
- " \n",
- " (i) [pairtools dedup](#pairtools-dedup)\n",
- " \n",
- " (ii) [Stats](#Stats)"
+ "4. Post-procesing. Sort and [dedup](#pairtools-dedup) Hi-C pairs and calculate [stats](#Stats), similarly to the standard Hi-C pipeline. "
  ]
  },
  {
@@ -361,7 +342,7 @@
  "id": "dfd7c4cb-31dd-43df-8510-95fd0ff9f78f",
  "metadata": {},
  "source": [
- "#### Create the index of concatenated haplotypes"
+ "#### Create the bwa index of homolog-resolved genome"
  ]
  },
  {
@@ -369,7 +350,7 @@
  "id": "99d28f6f-b754-4a95-95d5-9e5e51d14571",
  "metadata": {},
  "source": [
- "Concatenate the genomes and index them together. "
+ "Concatenate the genomes of two homologs and index them together. "
  ]
  },
  {