California Conservation Genomics Project (CCGP) repository for the genome assembly working group.
This repository contains scripts used for the reference genome assembly efforts of the CCGP.
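CCGP reference genomes are assembled following a protocol adapted from Rhie et al. (2021). Assemblies are built from PacBio HiFi long-read data and scaffolded using proximity ligation/chromatin conformation capture data (HiC or OmniC; Dovetail Genomics). Our minimum target reference genome quality is 6.7.Q40, and in most cases we expect to reach 7.C.Q50 or better (see Table 1 in Rhie et al. (2021)). In this notation, the first number is the log10 of the contig NG50, the second is the log10 of the scaffold NG50 (or C for chromosome-scale scaffolds), and Q is the consensus quality value (QV); 6.7.Q40 therefore means a contig NG50 of at least 1 Mb, a scaffold NG50 of at least 10 Mb and a QV of at least 40.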
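Checking an assembly against these targets can be automated. The snippet below is a minimal, hypothetical helper (`parse_target` and `meets_target` are illustrative names, not scripts from this repository), assuming the Rhie et al. (2021) x.y.Qz convention: x is the log10 of the contig NG50, y is the log10 of the scaffold NG50 or `C` for chromosome-scale scaffolds, and z is the QV. Whether scaffolds are chromosome-scale is passed in as a flag rather than tested, because that criterion needs more information than the NG50 values alone.

```python
import re

# Hypothetical sketch, not part of this repository's scripts.
# Assumed x.y.Qz notation: x = log10(contig NG50), y = log10(scaffold NG50) or "C"
# for chromosome-scale scaffolds, z = consensus quality value (QV).
NOTATION = re.compile(r"^(\d+)\.(\d+|C)\.Q(\d+)$")


def parse_target(notation):
    """Split a target such as '6.7.Q40' into (contig exponent, scaffold field, min QV)."""
    match = NOTATION.match(notation)
    if match is None:
        raise ValueError(f"not an x.y.Qz notation: {notation!r}")
    contig_exp, scaffold, qv = match.groups()
    return int(contig_exp), scaffold, int(qv)


def meets_target(notation, contig_ng50, scaffold_ng50, qv, chromosome_scale=False):
    """Return True if the assembly metrics reach every threshold of the target."""
    contig_exp, scaffold, min_qv = parse_target(notation)
    if contig_ng50 < 10 ** contig_exp:
        return False
    if scaffold == "C":
        # Chromosome-scale status is supplied by the caller, not inferred here.
        if not chromosome_scale:
            return False
    elif scaffold_ng50 < 10 ** int(scaffold):
        return False
    return qv >= min_qv
```

For example, an assembly with a 2.5 Mb contig NG50, a 45 Mb scaffold NG50 and QV 48.2 passes `6.7.Q40` but not `7.C.Q50`, because its contig NG50 is below 10 Mb.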
Here is an overview of our current pipeline:
The pipeline has gone through multiple versions since the beginning of the project; this is an overview of how it has evolved.
Color blocks:
- Yellow: Sequencing data types
- Dark gray: Fixed processes
- Light gray: Optional processes
- Blue: Iterative steps
- PacBio HiFi
  - PacBio adapter filtering
  - K-mer counting with meryl
  - Genome size, heterozygosity and repeat content estimation
  - Coverage validation (calculation of the expected coverage given the sequencing data)
- HiC/OmniC
  - Library QC with Dovetail Genomics tools
- Contig assembly with HiFiasm
  - Depending on the available datasets and the ploidy, we run HiFiasm in either single mode (HiFi reads only) or HiC mode.
- Alignment of HiFi reads with minimap2 and purging of haplotypic duplications with purge_dups
- Alignment of HiC/OmniC reads with the Arima Genomics Mapping Pipeline
- Scaffolding with SALSA
- Generation and visualization of contact maps
  - HiGlass
    - Generation of tracks
      - HiFi coverage
      - HiC/OmniC coverage
      - Genome assembly mappability
      - Gap description
  - PretextSuite
- Gap closing with YAGCloser, based on long reads spanning the gaps
- Mitogenome assembly with our mitogenome assembly pipeline or MitoHiFi
- Organelle filtering from nuclear assemblies
- Contamination screening with BlobTools
- Assembly quality assessment
  - Contiguity metrics (contig and scaffold N50)
  - BUSCO scores
  - Per-base quality (QV) and k-mer completeness
  - Frameshift errors
  - Gap description
  - Genome mappability
  - Mapping quality
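The k-mer counting, genome size estimation and coverage validation steps for the HiFi data can be illustrated with a deliberately naive sketch. In the pipeline, meryl does the counting, and model-based tools such as GenomeScope fit the whole k-mer spectrum to also estimate heterozygosity and repeat content; the peak-based estimate below does neither. It assumes that low-multiplicity k-mers are sequencing errors and that the tallest remaining peak is the homozygous coverage peak (for a highly heterozygous genome the tallest peak may be the heterozygous one, which would roughly double the estimate). All function names are hypothetical.

```python
from collections import Counter

# Hypothetical sketch, not part of this repository's scripts.
# Only uppercase A/C/G/T are handled; any other symbol causes the k-mer to be skipped.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
BASES = set("ACGT")


def canonical(kmer):
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def kmer_histogram(reads, k):
    """Map multiplicity -> number of distinct canonical k-mers seen that many times."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i : i + k]
            if set(kmer) <= BASES:  # skip k-mers containing N or other symbols
                counts[canonical(kmer)] += 1
    return Counter(counts.values())


def estimate_genome_size(histogram, min_depth=2):
    """Naive estimate: k-mer instances above an error cutoff divided by the peak depth."""
    usable = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    peak = max(usable, key=usable.get)  # raises ValueError if nothing passes the cutoff
    total = sum(depth * n for depth, n in usable.items())
    return total / peak


def expected_coverage(read_lengths, genome_size):
    """Expected sequencing depth: total sequenced bases / estimated genome size."""
    return sum(read_lengths) / genome_size
```

As a rough check, 100 reads of 15 kb against an estimated 50 kb genome give `expected_coverage([15000] * 100, 50_000)` = 30.0, i.e. about 30x.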
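A contact map is, at its core, a binned matrix of HiC/OmniC read-pair contacts; HiGlass and PretextSuite display much larger, multi-resolution versions of it. The toy function below (a hypothetical `contact_matrix`) bins read pairs aligned to a single sequence, using 0-based positions, into a symmetric matrix. It does not reproduce either tool's file format or normalization.

```python
# Hypothetical sketch, not part of this repository's scripts.
def contact_matrix(pairs, sequence_length, bin_size):
    """Bin read-pair positions on one sequence into a symmetric contact matrix.

    pairs: iterable of (pos1, pos2), the 0-based positions of the two mates.
    """
    n_bins = (sequence_length + bin_size - 1) // bin_size  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos1, pos2 in pairs:
        i, j = pos1 // bin_size, pos2 // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # mirror off-diagonal contacts so the map is symmetric
    return matrix
```

With a 30 bp sequence and 10 bp bins, the pairs `(0, 5)`, `(12, 25)` and `(3, 4)` give `[[2, 0, 0], [0, 0, 1], [0, 1, 0]]`: two contacts within the first bin and one between bins 2 and 3.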
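Gap closing with YAGCloser relies on long reads that span a gap. The sketch below shows only that idea: it locates gaps as runs of `N` in a scaffold and reports the reads whose alignments cover a gap plus a flanking margin on both sides. `find_gaps`, `spanning_reads`, the `Alignment` record and the 1,000 bp default flank are hypothetical; YAGCloser's actual inputs, filters and thresholds are not reproduced here.

```python
import re
from dataclasses import dataclass

# Hypothetical sketch, not part of this repository's scripts.


@dataclass
class Alignment:
    """Hypothetical minimal record for one long-read alignment on a scaffold."""

    read_id: str
    start: int  # 0-based, inclusive
    end: int  # 0-based, exclusive


def find_gaps(sequence):
    """Return (start, end) intervals of runs of N/n, 0-based and end-exclusive."""
    return [(m.start(), m.end()) for m in re.finditer(r"[Nn]+", sequence)]


def spanning_reads(alignments, gap, min_flank=1000):
    """IDs of reads aligned across the whole gap with >= min_flank bp on each side."""
    gap_start, gap_end = gap
    return sorted(
        {
            aln.read_id
            for aln in alignments
            if aln.start <= gap_start - min_flank and aln.end >= gap_end + min_flank
        }
    )


if __name__ == "__main__":
    scaffold = "ACGT" * 500 + "N" * 100 + "ACGT" * 500
    gaps = find_gaps(scaffold)  # one 100 bp gap at (2000, 2100)
    reads = [Alignment("r1", 0, 5000), Alignment("r2", 500, 2200), Alignment("r3", 1500, 4000)]
    print(spanning_reads(reads, gaps[0], min_flank=500))  # ['r1', 'r3']
```

Read `r2` is excluded because its alignment stops 100 bp past the gap, short of the 500 bp flank required on the right side.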
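Two of the quality metrics listed above reduce to short formulas. The sketch below computes N50 from a list of contig or scaffold lengths, and a consensus QV plus k-mer completeness following the formulas described for Merqury (probability of a correct base P = (1 - asm_only / total)^(1/k), error rate E = 1 - P, QV = -10 log10 E). The k-mer counts are assumed to be already available; which tool produces them in the pipeline is not specified here, and the function names are hypothetical.

```python
import math

# Hypothetical sketch, not part of this repository's scripts.


def n50(lengths):
    """Length L such that sequences of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("n50() needs at least one sequence length")


def kmer_completeness(solid_read_kmers_in_asm, solid_read_kmers):
    """Percentage of reliable ("solid") read k-mers that are also found in the assembly."""
    return 100.0 * solid_read_kmers_in_asm / solid_read_kmers


def merqury_style_qv(asm_only_kmers, total_asm_kmers, k):
    """Consensus QV from k-mers present in the assembly but absent from the reads."""
    error_rate = 1 - (1 - asm_only_kmers / total_asm_kmers) ** (1 / k)
    if error_rate == 0:
        return math.inf  # no assembly-only k-mers: the QV is unbounded
    return -10 * math.log10(error_rate)
```

For instance, `n50([100, 80, 50, 30, 20, 10])` is 80, since 100 + 80 already exceeds half of the 290 total bases, and 1,000 assembly-only 21-mers out of 10^9 assembly k-mers correspond to a QV of roughly 73.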
- For further information about our project and efforts, please visit the CCGP website.
- You can also check the following for more information about the project: