Extra hybrid assembly
By combining long reads from Oxford Nanopore (ONT) with the shorter but high-quality reads from Illumina, we can get very good assemblies. This is a well-established procedure for bacterial genomes, and with the latest ONT flowcells even "Nanopore only" assemblies are not too expensive.
With phages, this approach is relatively less explored, for a couple of reasons: their very small genome size means that only a small amount of sequencing is needed to reach high coverage, and it can simply be easier to stick to short reads, which (as we saw on Day 1) can assemble the whole genome perfectly.
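To make the point about genome size concrete: mean sequencing depth is roughly the total number of sequenced bases divided by the genome size. The sketch below (the helper names `coverage` and `bases_needed` are hypothetical, and the read yield is an illustrative value, not workshop data) uses the 165,823 bp length of the T4 assembly obtained in this section.

```shell
#!/usr/bin/env bash
# Sketch: back-of-the-envelope sequencing depth for a small genome.
# depth ~= total sequenced bases / genome size

# coverage TOTAL_BASES GENOME_SIZE -> prints mean depth with one decimal
coverage() {
  awk -v bases="$1" -v genome="$2" 'BEGIN { printf "%.1f\n", bases / genome }'
}

# bases_needed TARGET_DEPTH GENOME_SIZE -> prints bases required for that depth
bases_needed() {
  awk -v depth="$1" -v genome="$2" 'BEGIN { printf "%d\n", depth * genome }'
}

genome_size=165823                     # length of the T4 hybrid assembly (bp)
coverage 100000000 "$genome_size"      # 100 Mb of reads (hypothetical yield): prints 603.1
bases_needed 100 "$genome_size"        # bases needed for ~100x depth
```

Even a modest 100 Mb of reads already gives several hundred-fold depth on a phage genome, which is why a tiny fraction of a flowcell is usually enough.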
Unicycler is a pipeline that can combine long and short reads.
Let's try it:
# Remember to activate the appropriate conda environment!
unicycler -1 /data/reads/illumina/T4_R1.fastq.gz -2 /data/reads/illumina/T4_R2.fastq.gz \
-l /data/reads/ont/T4-ONT.fastq.gz -o T4-hybrid -t 8
We supply both paired-end Illumina files (with -1 and -2, respectively) and the long reads (with -l).
The output will be saved in a new directory that we specify via -o, while -t 8 lets Unicycler use 8 threads.
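A hybrid run can take a while, so it may be worth checking that every input file is actually there before launching it. This is a minimal sketch: the helper `check_inputs` is hypothetical, while the paths and the Unicycler options are exactly the ones used above.

```shell
#!/usr/bin/env bash
# Sketch: verify the read files before starting a (long) Unicycler run.

# check_inputs FILE... -> returns 0 only if every file exists and is non-empty
check_inputs() {
  local missing=0 f
  for f in "$@"; do
    if [ ! -s "$f" ]; then
      echo "Missing or empty input: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}

R1=/data/reads/illumina/T4_R1.fastq.gz    # Illumina forward reads
R2=/data/reads/illumina/T4_R2.fastq.gz    # Illumina reverse reads
LONG=/data/reads/ont/T4-ONT.fastq.gz      # ONT long reads

# Only launch Unicycler when all three inputs are present
if check_inputs "$R1" "$R2" "$LONG"; then
  unicycler -1 "$R1" -2 "$R2" -l "$LONG" -o T4-hybrid -t 8
fi
```

Keeping the paths in variables also makes it easy to reuse the same command for another phage.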
ls -l T4-hybrid/
total 1164
-rw-rw-r-- 1 ubuntu ubuntu 166770 Nov 18 11:27 001_best_spades_graph.gfa
-rw-rw-r-- 1 ubuntu ubuntu 166069 Nov 18 11:27 002_overlaps_removed.gfa
-rw-rw-r-- 1 ubuntu ubuntu 166051 Nov 18 11:27 003_bridges_applied.gfa
-rw-rw-r-- 1 ubuntu ubuntu 165862 Nov 18 11:27 004_final_clean.gfa
-rw-rw-r-- 1 ubuntu ubuntu 165862 Nov 18 11:30 005_polished.gfa
-rw-rw-r-- 1 ubuntu ubuntu 168235 Nov 18 11:30 assembly.fasta
-rw-rw-r-- 1 ubuntu ubuntu 165862 Nov 18 11:30 assembly.gfa
-rw-rw-r-- 1 ubuntu ubuntu 11567 Nov 18 11:30 unicycler.lo
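The `.gfa` files are assembly graphs in GFA (version 1) format: `S` lines are segments (sequences) and `L` lines are links between segment ends. A quick awk sketch to summarise the final graph follows; the helper name `gfa_summary` is hypothetical, and treating a segment linked to itself as a circular contig is a common heuristic rather than a Unicycler-specific check.

```shell
#!/usr/bin/env bash
# Sketch: count segments, links and self-loops in a GFA v1 graph.
# Fields are tab-separated; L lines are: L <from> <orient> <to> <orient> <overlap>

gfa_summary() {
  awk -F'\t' '
    $1 == "S" { segments++ }                       # one S line per sequence
    $1 == "L" { links++; if ($2 == $4) loops++ }   # link from a segment to itself
    END { printf "segments=%d links=%d self_loops=%d\n", segments, links, loops }
  ' "$1"
}

if [ -f T4-hybrid/assembly.gfa ]; then
  gfa_summary T4-hybrid/assembly.gfa
fi
```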
Let's check the assembly statistics:
seqfu stats -n -b T4-hybrid/assembly.fasta
┌──────────┬──────┬──────────┬──────────┬────────┬────────┬────────┬────────────┬────────┬────────┐
│ File │ #Seq │ Total bp │ Avg │ N50 │ N75 │ N90 │ auN │ Min │ Max │
├──────────┼──────┼──────────┼──────────┼────────┼────────┼────────┼────────────┼────────┼────────┤
│ assembly │ 1 │ 165823 │ 165823.0 │ 165823 │ 165823 │ 165823 │ 165823.000 │ 165823 │ 165823 │
└──────────┴──────┴──────────┴──────────┴────────┴────────┴────────┴────────────┴────────┴────────┘
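If SeqFu is not installed, a few of these columns (#Seq, Total bp, Min, Max) can be approximated with plain awk. This is a minimal sketch, not a replacement for `seqfu stats` (it does not compute N50, N75, N90 or auN), and the helper name `fasta_stats` is hypothetical. For a single-contig assembly like this one, total, min and max should all coincide, as in the table above.

```shell
#!/usr/bin/env bash
# Sketch: basic FASTA statistics with awk (uncompressed FASTA, wrapped or not).

fasta_stats() {
  awk '
    function record(l) {                  # update totals with one sequence length
      total += l
      if (min == "" || l < min) min = l
      if (l > max) max = l
    }
    /^>/ { if (n > 0) record(len); n++; len = 0; next }   # new record starts
    { gsub(/[ \t\r]/, ""); len += length($0) }             # sequence line
    END {
      if (n > 0) record(len)              # last record
      printf "seqs=%d total=%d min=%d max=%d\n", n, total, min, max
    }
  ' "$1"
}

if [ -f T4-hybrid/assembly.fasta ]; then
  fasta_stats T4-hybrid/assembly.fasta
fi
```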
We can compare the two assemblies by pasting them into BLAST, using the "Blast two sequences" option.
And yes, the result is a perfect match (except for the starting point, of course: a circular genome can be "opened" at a different position in each assembly).
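"A perfect match except for the starting point" means that one sequence is a rotation of the other. As a rough local alternative to BLAST, this can be tested by doubling one sequence and searching for the other inside it, also trying the reverse complement, since an assembler may report either strand. This is a sketch: the helper `is_rotation` is hypothetical, it only reads the first record of each FASTA file, and it only complements A/C/G/T/N (other IUPAC codes are left unchanged).

```shell
#!/usr/bin/env bash
# Sketch: are two circular sequences identical up to a change of starting point?
# The whole comparison runs inside awk, so long genomes are never passed
# as command-line arguments.

# is_rotation FASTA_A FASTA_B -> exit 0 if B is a rotation of A (either strand)
is_rotation() {
  awk '
    function revcomp_line(s,   i, c, out) {        # reverse complement of one line
      out = ""
      for (i = length(s); i > 0; i--) {
        c = substr(s, i, 1)
        out = out (c in comp ? comp[c] : c)
      }
      return out
    }
    BEGIN { comp["A"] = "T"; comp["C"] = "G"; comp["G"] = "C"; comp["T"] = "A"; comp["N"] = "N" }
    FNR == 1 { f++ }                               # f = 1 for file A, 2 for file B
    /^>/ { hdr[f]++; next }
    hdr[f] == 1 {                                  # first record of each file only
      gsub(/[ \t\r]/, "")
      line = toupper($0)
      seq[f] = seq[f] line
      if (f == 2) rc = revcomp_line(line) rc       # build reverse complement of B
    }
    END {
      if (length(seq[1]) == 0 || length(seq[1]) != length(seq[2])) {
        print "no match (different lengths)"; exit 1
      }
      doubled = seq[1] seq[1]                      # every rotation of A is a substring
      if (index(doubled, seq[2]) > 0) { print "match (same strand)"; exit 0 }
      if (index(doubled, rc) > 0)     { print "match (reverse complement)"; exit 0 }
      print "no match"; exit 1
    }
  ' "$1" "$2"
}

# Example (the Day 1 assembly path is hypothetical):
# is_rotation day1/contigs.fasta T4-hybrid/assembly.fasta
```

Unlike BLAST, this only detects exact matches, so a single differing base is enough to report "no match".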
Phage Annotation Workshop 2021