Lossless Nanopore Compression
=============================
This is a final year undergraduate Computer Science honours project for the
University of Sydney.
Objective
---------
Design lossless compression methods for nanopore data that save more space
than the current state of the art, zstd-svb-zd (a.k.a. VBZ).
Contributions
-------------
- First systematic analysis of nanopore data
- New state of the art for lossless nanopore compression
- First comprehensive benchmark of existing and novel methods
Thesis
------
Read the thesis: thesis/thesis_signed.pdf
Presentation
------------
Read the presentation slides: sent/final/pres.pdf
Data
----
A downsampled human DNA data set (NA12878) with 500 000 reads was used for
analysis and benchmarking.
Download: https://slow5.page.link/na12878_prom_sub_slow5
Benchmark
---------
Reads are compressed and decompressed sequentially. To confirm that a method
is lossless, the decompressed data is compared with the original uncompressed
data for equality.
The following metrics are recorded:
- Compressed size
- Compression time
- Decompression time
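The round-trip check and the three metrics above can be sketched as follows.
This is a minimal illustration only, not the code in press: it assumes
Python's standard zlib as a stand-in codec (neither VBZ nor any method from
this project), and the benchmark() function and the synthetic 16-bit reads
are hypothetical.

```python
import time
import zlib
from array import array


def benchmark(reads, compress=zlib.compress, decompress=zlib.decompress):
    """Compress and decompress each read in turn and record the metrics."""
    compressed_size = 0
    compression_time = 0.0
    decompression_time = 0.0
    for raw in reads:
        start = time.perf_counter()
        packed = compress(raw)
        compression_time += time.perf_counter() - start

        start = time.perf_counter()
        unpacked = decompress(packed)
        decompression_time += time.perf_counter() - start

        # Lossless check: the round trip must reproduce the input exactly.
        if unpacked != raw:
            raise ValueError("round trip changed the data: not lossless")
        compressed_size += len(packed)
    return {
        "compressed_size": compressed_size,
        "compression_time": compression_time,
        "decompression_time": decompression_time,
    }


# Hypothetical stand-in for raw signal: three reads of 16-bit samples.
reads = [
    array("h", [500 + (i * 7) % 40 for i in range(n)]).tobytes()
    for n in (1000, 2000, 3000)
]
result = benchmark(reads)
print(result)
```

Any pair of compress and decompress callables can be passed in, so different
methods can be compared on the same reads with the same check.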
1. Compile the benchmark.
    make -C press
2. Run it on a data set.
    cd press
    ./test SLOW5_DATA
   Or, use the example data set with 3 reads.
    ./test ../data/three-reads.blow5