Lossless compression for nanopore DNA signal data

sashajenner/honours

Lossless Nanopore Compression
=============================
This is a final year undergraduate Computer Science honours project for the
University of Sydney.

Objective
---------
Design lossless compression methods that save more space than the current
state of the art, zstd-svb-zd (a.k.a. VBZ).

Contributions
-------------
- First systematic analysis of nanopore signal data
- New state of the art in lossless compression of nanopore signal data
- First comprehensive benchmark of existing and novel methods

Thesis
------
Read the thesis: thesis/thesis_signed.pdf

Presentation
------------
Read the presentation slides: sent/final/pres.pdf

Data
----
A downsampled human DNA data set (NA12878) with 500 000 reads was used for
analysis and benchmarking.

Download: https://slow5.page.link/na12878_prom_sub_slow5

Benchmark
---------
Each read is compressed and then decompressed, one read at a time. To verify
that a method is lossless, the decompressed data is compared with the original
uncompressed data for equality.

The following metrics are recorded:
- Compressed size
- Compression time
- Decompression time
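
The round-trip check and the three metrics above can be sketched in a few
lines of Python. This is only an illustration, not the project's benchmark
code: the `benchmark` function, the synthetic reads, and the use of `zlib` as a
stand-in compressor are all assumptions made for the example.

```python
import time
import zlib
from array import array


def benchmark(reads, compress, decompress):
    """Compress and decompress each read in turn and record the metrics."""
    compressed_size = 0
    compression_time = 0.0
    decompression_time = 0.0

    for raw in reads:
        start = time.perf_counter()
        packed = compress(raw)
        compression_time += time.perf_counter() - start

        start = time.perf_counter()
        unpacked = decompress(packed)
        decompression_time += time.perf_counter() - start

        # Lossless check: the round trip must reproduce the input exactly.
        if unpacked != raw:
            raise ValueError("decompressed data differs from the original")

        compressed_size += len(packed)

    return {
        "compressed_size": compressed_size,
        "compression_time": compression_time,
        "decompression_time": decompression_time,
    }


# Hypothetical stand-in data: three short "reads" of 16-bit signal samples.
reads = [
    array("h", [500 + (i * k) % 40 for i in range(1000)]).tobytes()
    for k in (3, 7, 11)
]

# zlib is used only as a placeholder for the method under test.
result = benchmark(reads, zlib.compress, zlib.decompress)
print(result)
```

Any pair of compress and decompress functions that map bytes to bytes can be
passed in, so different methods can be compared on the same reads.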

1. Compile the benchmark.

	make -C press

2. Run it on a data set, replacing SLOW5_DATA with the path to the data file.

	cd press
	./test SLOW5_DATA

Alternatively, use the example data set of three reads.

	./test ../data/three-reads.blow5
