edit phase walkthrough
golobor committed Mar 28, 2024
1 parent 802f69c commit 6c9ca65
Showing 1 changed file with 12 additions and 31 deletions.
43 changes: 12 additions & 31 deletions doc/examples/pairtools_phase_walkthrough.ipynb
@@ -34,50 +34,31 @@
"source": [
"Several approaches have been developed to process Hi-C data from haplotype-resolved experiments. In `pairtools`, we implement the approach that was used in Erceg et al. Here is its brief outline:\n",
"\n",
"1. [Create the reference genome](#Create-the-reference-genome): create a \"concatenated\" reference genome that contains sequences of both homologs of each chromosome. \n",
"1. Create the haplotype-resolved genome. First, we will create a \"concatenated\" reference genome that contains sequences of both homologs of each chromosome. \n",
"\n",
" - Incorporate known SNVs (usually in .vcf format) into the reference genome using [bcftools](https://samtools.github.io/bcftools/bcftools.html) to create FASTA files with the sequences of both homologs.\n",
" - Add suffixes to the name of each homolog that identify the type (`_hap1` or `_hap2`).\n",
"\n",
"2. Map the Hi-C data to the concatenated reference and parse resulting alignment into Hi-C pairs. Compared to the standard Hi-C pipeline, this step would contain a couple of modifications:\n",
" - parse allowing multimappers (mapq 0). \n",
" - make the aligner report two suboptimal alignments (aka the second and the third hit).\n",
" - Make the aligner report two suboptimal alignments (aka the second and the third hit).\n",
" - Parse allowing multimappers (mapq 0). \n",
" \n",
" Note that, upon mapping to the homolog-resolved genome, Hi-C reads will report the identity of their homologue as the suffix of the chromosome name.\n",
" \n",
" See sections:\n",
" \n",
" (i) [Download data](#Download-data)\n",
" \n",
" (ii) [Map data with bwa mem to diploid genome](#Map-data-with-bwa-mem-to-diploid-genome)\n",
" \n",
" (iii) [pairtools parse](#pairtools-parse)\n",
" \n",
"3. Phase the resulting pairs based on the reported suboptimal alignments. \n",
"\n",
"3. [pairtools phase](#pairtools-phase): phase the pairs based on the reported suboptimal alignments. \n",
" By checking the scores of two suboptimal alignments, we will distinguish the true multi-mappers from unresolved pairs (i.e. cases when the read aligns to the location with no distinguishing SNV). Phasing will remove the haplotype suffixes from chromosome names and add extra fields to the .pairs file with:\n",
"\n",
" By checking the scores of two suboptimal alignments, we will distinguish the true multi-mappers from unresolved pairs (i.e. cases when the read aligns to the location with no distinguishing SNV).\n",
" Phasing procedure will remove the haplotype suffixes from chromosome names and add extra fields to the .pairs file with:\n",
" \n",
" '.' (non-resolved)\n",
" \n",
" '0' (first haplotype) or \n",
" \n",
" '1' (second haplotype). \n",
" \n",
" - '.' (non-resolved)\n",
" - '0' (first haplotype) \n",
" - '1' (second haplotype)\n",
" \n",
" Phasing schema: \n",
" \n",
"![image.png](attachment:62e74fba-c1c1-44b5-a3e2-3699c3cac7ce.png)\n",
"\n",
" ![image.png](attachment:62e74fba-c1c1-44b5-a3e2-3699c3cac7ce.png)\n",
"\n",
"4. Post-procesing. Sort and dedup Hi-C pairs and calculate stats, similarly to the standard Hi-C pipeline. \n",
"\n",
" See sections:\n",
" \n",
" (i) [pairtools dedup](#pairtools-dedup)\n",
" \n",
" (ii) [Stats](#Stats)"
"4. Post-procesing. Sort and [dedup](#pairtools-dedup) Hi-C pairs and calculate [stats](#Stats), similarly to the standard Hi-C pipeline. "
]
},
{
@@ -361,15 +342,15 @@
"id": "dfd7c4cb-31dd-43df-8510-95fd0ff9f78f",
"metadata": {},
"source": [
"#### Create the index of concatenated haplotypes"
"#### Create the bwa index of homolog-resolved genome"
]
},
{
"cell_type": "markdown",
"id": "99d28f6f-b754-4a95-95d5-9e5e51d14571",
"metadata": {},
"source": [
"Concatenate the genomes and index them together. "
"Concatenate the genomes of two homologs and index them together. "
]
},
{
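Note on step 3 of the edited outline: the phasing rule it describes (compare the best alignment against the two suboptimal ones to separate resolved reads, unresolved reads, and true multi-mappers) can be illustrated with a minimal Python sketch. This is not the `pairtools phase` implementation. The function names, the argument layout, and the use of `None` for multi-mappers are assumptions made for illustration, and the sketch ignores alignment positions entirely. Only the `_hap1`/`_hap2` suffixes and the '.', '0', '1' labels come from the walkthrough text.

```python
from typing import Optional, Tuple

# Haplotype suffixes named in the walkthrough.
SUFFIXES = ("_hap1", "_hap2")


def split_haplotype(chrom: str) -> Tuple[str, str]:
    """Return (base chromosome name, phase); phase is '0', '1', or '.'."""
    for phase, suffix in enumerate(SUFFIXES):
        if chrom.endswith(suffix):
            return chrom[: -len(suffix)], str(phase)
    return chrom, "."


def phase_side(
    chrom: str,
    best_score: int,
    second_chrom: Optional[str],
    second_score: Optional[int],
    third_score: Optional[int],
) -> Optional[Tuple[str, str]]:
    """Hypothetical helper: phase one side of a pair.

    Returns (base chromosome, phase), or None for a true multi-mapper.
    Pass None for second/third hits that the aligner did not report.
    """
    base, phase = split_haplotype(chrom)

    # The best hit strictly beats the runner-up: the read overlaps a
    # distinguishing SNV, so the haplotype suffix can be trusted.
    if second_chrom is None or best_score > second_score:
        return base, phase

    # The best and second hits tie. If the second hit is the other homolog
    # of the same chromosome and the third hit is worse (or absent), the
    # read sits in a region with no distinguishing SNV: unresolved.
    second_base, second_phase = split_haplotype(second_chrom)
    tie_with_homolog = second_base == base and {phase, second_phase} == {"0", "1"}
    third_is_worse = third_score is None or third_score < best_score
    if tie_with_homolog and third_is_worse:
        return base, "."

    # Any other tie means the read maps equally well elsewhere in the genome.
    return None


print(phase_side("chr1_hap1", 60, "chr1_hap2", 50, 40))
print(phase_side("chr1_hap2", 60, "chr1_hap1", 60, 40))
print(phase_side("chr1_hap1", 60, "chr5_hap1", 60, None))
```

The third hit is what makes the distinction possible: a two-way tie between homologs is expected for any read without an SNV, whereas a three-way tie indicates a repeat. This is presumably why step 2 asks the aligner to report two suboptimal alignments rather than one.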