How to use, and some performance results
=================
How to use
-----------
To encode a file, run the following command:
    ./vcompress [ID] [FILE]
[ID] must be one of the pre-defined encoder IDs, and [FILE] must
follow the specified input format. Both are described in the
sections below.
To decode, run:
    ./vcompress -d [FILE] <[OUT]>
When decoding finishes, it prints performance results such as the
compression ratio and the speed. To write the decoded integers to
the local filesystem, give an output filename in [OUT].
For example, to compress a file named XXX that follows the
specified format with N Gamma (ID: 0), type the following line:
    ./vcompress 0 XXX
This generates two files: XXX.gamma and XXX.pos.
The corresponding decoding command is:
    ./vcompress -d XXX
For a quick trial, sample data sets are available at the
following web site:
http://integerencoding.isti.cnr.it/
Implemented encoder IDs
-----------
The library uses the following IDs:
ID  Name
---
0   N Gamma
1   FU Gamma
2   F Gamma
3   N Delta
4   FU Delta
5   FG Delta
6   F Delta
7   Variable Byte
8   Binary Interpolative
9   Simple 9
10  Simple 16
11  PForDelta
12  OPTPForDelta
13  VSEncodingBlocks
14  VSE-R
15  VSEncodingRest
16  VSEncodingBlocksHybrid
17  VSEncodingSimple
Input/output file format
-----------
The encoders implemented in the library read and write files in
the following format:
[total of int., a first int., a second int., ..., a last int.]
[total of int., a first int., a second int., ..., a last int.]
...
Each list starts with the total number of integers it contains,
followed by the integers themselves. Each entry occupies a 32-bit
word, and the integers in a list MUST appear in ascending order.
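As an illustration of the format above, here is a minimal Python sketch that writes and reads such a file. The function names write_lists and read_lists are hypothetical and not part of the library. The sketch assumes unsigned 32-bit words in little-endian byte order, which this document does not specify, and it rejects only decreasing values, since the text does not say whether repeated values are allowed.

```python
import struct


def write_lists(path, lists):
    """Write each list as [count, v1, v2, ...] in 32-bit words.

    Assumption: unsigned, little-endian words ("<I"); the byte order
    is not stated in the format description.
    """
    # Validate everything first so a bad list never leaves a partial file.
    for values in lists:
        if any(a > b for a, b in zip(values, values[1:])):
            raise ValueError("integers must appear in ascending order")
    with open(path, "wb") as f:
        for values in lists:
            f.write(struct.pack("<I", len(values)))
            f.write(struct.pack(f"<{len(values)}I", *values))


def read_lists(path):
    """Read back every [count, v1, v2, ...] record from the file."""
    lists = []
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if not header:
                break  # clean end of file
            if len(header) < 4:
                raise ValueError("truncated count word")
            (count,) = struct.unpack("<I", header)
            body = f.read(4 * count)
            if len(body) < 4 * count:
                raise ValueError("truncated list body")
            lists.append(list(struct.unpack(f"<{count}I", body)))
    return lists
```

Under these assumptions, calling write_lists("sample.bin", [[1, 5, 9], [2, 3]]) produces a 28-byte file: two count words plus five integers, 4 bytes each.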
Performance
-----------
Some performance results of these encoders are shown here (the
left and right values are mis and bpi, respectively). The test set
was gov2, one of the TREC test collections used in the Terabyte
Track. The test was run on a machine with an Intel Core i5-2500
(3.3GHz) processor and 8GiB of memory
(git tag benchmark-20120928):
CoderName                   mis    bpi
---
N Gamma                   59.14   5.05
FU Gamma                  93.61   5.05
F Gamma                   89.43   5.05
N Delta                   68.82   4.70
FU Delta                  88.69   4.70
FG Delta                 103.06   4.70
F Delta                  122.26   4.70
Variable Byte            165.28   9.18
Binary Interpolative      71.71   3.95
Simple 9                 645.52   5.41
Simple 16                642.80   5.17
PForDelta               1043.99   6.00
OPTPForDelta             828.32   5.46
VSEncodingBlocks         973.65   4.51
VSE-R                    182.61   4.24
VSEncodingRest           520.72   4.34
VSEncodingBlocksHybrid   808.90   4.40
VSEncodingSimple        1712.34   6.13
Here, 'mis' is the average decompression speed in millions of
integers per second, and 'bpi' is the compression performance in
bits per integer.
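To make the two units concrete, the following Python sketch computes them from raw measurements. The function names and the sample numbers are hypothetical and are not taken from the benchmark. It assumes that the compressed size counts all bytes the encoder writes, which this document does not state.

```python
def bits_per_integer(compressed_bytes, num_integers):
    """bpi: compressed size in bits divided by the number of integers."""
    return compressed_bytes * 8 / num_integers


def millions_of_integers_per_second(num_integers, seconds):
    """mis: integers decoded per second, in millions."""
    return num_integers / seconds / 1e6


# Hypothetical measurements, for illustration only.
bpi = bits_per_integer(compressed_bytes=631_250, num_integers=1_000_000)
mis = millions_of_integers_per_second(num_integers=1_000_000, seconds=0.01)
print(f"{mis:.2f} mis, {bpi:.2f} bpi")
```

A lower bpi means better compression, and a higher mis means faster decoding, so the table trades one against the other.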