Yield improvement versus ccs on various sequencing runs

We evaluate on three different datasets.

For each PacBio dataset (Movie ID), we compared yield at Q30 for ccs (baseline), and v0.2, v0.3, v1.0, v1.1, v1.2 of DeepConsensus.

| Movie ID | Sample | Chemistry | Mean insert size |
| --- | --- | --- | --- |
| m64011_181218_235052 | HG002 | 1 | 11 kb |
| m64008_201124_002822 | HG002 | 2.2 | 15 kb |
| m64014_200920_132517 | HG002 | 2.2 | 24 kb |

Yield versus runtime

[Figure: v1.2 runtime versus yield over ccs]

| version | movie | dataset | num_reads_ccs | num_reads | yield@emQ20 | yield@emQ20/ccs | yield@emQ30 | yield@emQ30/ccs | yield@emQ40 | yield@emQ40/ccs | hours |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| v1.2 | m64011_181218_235052 | chem1_11kb | 1,392,300 | 1,552,566 | 17.16 Gb | 111.72% | 12.17 Gb | 137.81% | 5.32 Gb | 217.55% | 219.39 |
| v1.2 | m64008_201124_002822 | chem2.2_15kb | 2,687,977 | 2,894,238 | 43.00 Gb | 108.55% | 33.06 Gb | 129.70% | 10.35 Gb | 259.46% | 532.03 |
| v1.2 | m64014_200920_132517 | chem2.2_24kb | 1,918,627 | 2,083,487 | 49.75 Gb | 109.96% | 32.92 Gb | 196.82% | 3.11 Gb | 1203.8% | 661.91 |

yield@emQ30/ccs or "Yield at empirical Q30 relative to CCS" is calculated as follows:

  1. Filter DeepConsensus output to predicted Q20.
  2. For each read, align it to the truth and calculate identity from that alignment: identity = # matches / (# matches + # mismatches + # insertions + # deletions).
  3. Take all the reads that have identity >= 0.999 (this is Q30).
  4. Because longer reads are more useful than shorter reads, we count the total bases and not just the number of reads.
  5. Next, we repeat the above for the original CCS reads (run with default parameters, i.e. Q20 filtered), then subtract and divide to get a percentage. For example, 40% means that DeepConsensus increased the yield of high-quality reads, in bases, by 40% over CCS.
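The steps above can be sketched in Python as follows. This is a minimal illustration, not the evaluation code used for the table: the `AlignedRead` record and all function names are hypothetical, alignment counts are assumed to be already extracted per read, and the relative-yield formula follows the "subtract and divide" wording of step 5 literally. The `identity_to_phred` helper only shows why an identity of 0.999 corresponds to Q30.

```python
import math
from dataclasses import dataclass
from typing import Iterable


@dataclass
class AlignedRead:
    """Hypothetical per-read alignment summary (not a DeepConsensus type)."""
    matches: int
    mismatches: int
    insertions: int
    deletions: int
    length: int  # number of bases in the read


def identity(read: AlignedRead) -> float:
    """identity = matches / (matches + mismatches + insertions + deletions)."""
    total = read.matches + read.mismatches + read.insertions + read.deletions
    return read.matches / total if total else 0.0


def identity_to_phred(ident: float) -> float:
    """Convert identity to a Phred-scaled quality: Q = -10 * log10(1 - identity)."""
    return -10.0 * math.log10(1.0 - ident)


def yield_bases(reads: Iterable[AlignedRead], min_identity: float = 0.999) -> int:
    """Total bases in reads whose identity meets the threshold (0.999 for Q30)."""
    return sum(r.length for r in reads if identity(r) >= min_identity)


def relative_yield_increase(dc_bases: float, ccs_bases: float) -> float:
    """Percentage increase over CCS, assuming (DeepConsensus - CCS) / CCS."""
    return 100.0 * (dc_bases - ccs_bases) / ccs_bases


# Toy reads for illustration only; these are not real data.
toy_reads = [
    AlignedRead(matches=9995, mismatches=2, insertions=2, deletions=1, length=9999),
    AlignedRead(matches=9900, mismatches=50, insertions=25, deletions=25, length=9950),
]
q30_bases = yield_bases(toy_reads)  # only the first read passes identity >= 0.999
```

Counting bases rather than reads (step 4) is why `yield_bases` sums `length` instead of counting the reads that pass.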

These were run on GCP n1-standard-16 machines with no GPU (in 500 shards, combined above), with --batch_zmws=100 --batch_size=1024. For recommendations on optimal runtime settings and compute setups, see the runtime metrics page.

Runtime-yield tradeoffs with --skip_windows_above

The --skip_windows_above option (introduced in v0.3) allows DeepConsensus to skip windows whose average CCS base qualities are already above a certain quality threshold. The windows that are skipped just adopt the CCS sequence without correction. This saves runtime, but there is a yield tradeoff, shown in this chart for m64014_200920_132517-chr20:
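As a rough sketch of the skip decision described above, the snippet below checks whether a window's average CCS base quality exceeds the threshold. The function names are hypothetical, and the way the average is taken is an assumption: here the Phred scores are averaged in error-probability space and converted back, which may differ from what DeepConsensus does internally.

```python
import math
from typing import Sequence


def avg_phred(base_qualities: Sequence[float]) -> float:
    """Average Phred quality, computed via mean error probability (an assumption)."""
    if not base_qualities:
        return 0.0
    error_probs = [10.0 ** (-q / 10.0) for q in base_qualities]
    mean_error = sum(error_probs) / len(error_probs)
    return -10.0 * math.log10(mean_error)


def should_skip_window(base_qualities: Sequence[float],
                       skip_windows_above: float = 45.0) -> bool:
    """True if the window would keep its CCS sequence without correction."""
    return avg_phred(base_qualities) > skip_windows_above


# A uniformly high-quality window is skipped; one low-quality base
# pulls the average down enough that the window is still processed.
high_quality_window = [50.0] * 100
mixed_window = [50.0] * 99 + [10.0]
skip_high = should_skip_window(high_quality_window)
skip_mixed = should_skip_window(mixed_window)
```

Averaging error probabilities makes a single poor base weigh heavily, so a window needs consistently high qualities to be skipped under this assumption.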

[Figure: runtime/yield tradeoff of --skip_windows_above]

The default in v1.2 is Q45, but you can adjust this level using --skip_windows_above.