-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathReadme.txt
190 lines (146 loc) · 6.94 KB
/
Readme.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
--------------------------------------------------------------------------------
git clone https://github.com/benwills/MapV && cd MapV && make clean && make && make test
or, step by step:
git clone https://github.com/benwills/MapV
cd MapV
make clean
make
make test
--------------------------------------------------------------------------------
Now that _Delete() and _Destroy() have been created,
this is at an alpha(?) level of completion.
Additional work was done for:
- return codes where bool was insufficient
- splitting into a .h and .c file
- splitting the test code into its own .c file
- initial makefile
There is still some work to be done in terns of:
- creating tests and benchmarks
- improving the makefile
- turning it into a proper library structure
But I figured I'd publish it at this stage.
--------------------------------------------------------------------------------
performance:
10k short keys, 1k iterations
https://github.com/martinus/unordered_dense
lookups per second : 45,724,225.09
lookups per second : 45,586,745.62
lookups per second : 46,176,749.84
lookups per second : 45,053,718.42
lookups per second : 47,111,701.40
lookups per second : 43,749,540.65
lookups per second : 43,614,950.53
lookups per second : 46,957,599.19
lookups per second : 46,584,399.40
lookups per second : 48,132,160.20
average: 45,869,179.03
MapV:
lookups per second : 55,155,926.96
lookups per second : 55,046,138.41
lookups per second : 58,203,247.47
lookups per second : 61,124,753.78
lookups per second : 58,144,229.02
lookups per second : 59,746,045.25
lookups per second : 58,143,708.05
lookups per second : 62,624,697.12
lookups per second : 57,273,327.09
lookups per second : 54,008,191.17
average: 57,947,026.43
12MM urls, 10 iterations
ankerl:
lookups per second : 5,398,034.24
MapV:
lookups per second : 8,428,561.08
--------------------------------------------------------------------------------
@Requirements
- CPU supporting AVX2
- XXHash3
--------------------------------------------------------------------------------
@TODO: features/functionality
- creating tests and benchmarks
- improving the makefile
- turning it into a proper library structure
- cli testing, passing in a file for keys
- ability to set size of value different from 8 bytes
- store original key for exact lookup, removing probabilistic nature
- if all keys are 8/16/etc,bytes or less, store in the same memory segment
- else, store offset and len in 8 bytes; 48 + 16 (== 64k max key len)
- also consider 64 bit hash at that point
- also consider still generating 128 bit, but using the other 8bytes for slot id
- would require rehashing on resize
- would have to continue searching in case of hash collision
- hopefully resolved by using other 128-bit hash for slot id
- test performance
- check for empty slots on lookup; allow to return faster on no key
- generic c? c++ template?
--------------------------------------------------------------------------------
@TODO: variations
- MapVP: Persist to disk
- MapVS: Static/Unmodifiable after initial inserts
- MapVC: Compact+Static;
- alter hash (while maintaining integrity) to find smallest table size
- may have slightly slower lookups, due to:
- given non-power of two table size
- maximizing probe sequence length (though this can also be tuned)
- small sets may have faster lookups depending on if they fit in cpu cache
- allow to specify if all keys have the same len
- this allows xxhash optimization on smaller keys to run up to 2x as fast
--------------------------------------------------------------------------------
@DOCUMENTATION / readme
- probabilistic nature of 128-bit keys and dropping original keys
- benefits : memory access
- downsides: possible false positives
- slotId shift from top of hash means rebuilding table maintains order
- ie: early-loaded keys are found faster, regardless of rebuilds
- bucket structure = cache locality
- 8 byte values stored
- bucket/slotId/entry terminology
- configuration options
- distance (probe sequence length) is unsigned.
- means that all compares must compare high to low to avoid branches
- this is implemented, but worth noting.
- deleting does not adjust the number of buckets that must be scanned
in order to do this, the entire table would have to be scanned on delete.
a side effect of this is that a table will not shrink.
it would be possible to create a function like __Maybe_Shrink().
but my guess is that the use case for this is so rare that i won't be
implementing it
--------------------------------------------------------------------------------
@NOTES:
this is technically a probabilistic data structure, as the original
key is not compared. this is "okay" because 128 bits of a very high
quality hash is used for key lookups. there are multiple benefits:
primarily, cache locality of all lookup data.
secondarily, reduced memory usage from not storing original keys.
this is particularly useful with long keys (urls)
i'll be creating another variation that does use original key
comparisons, and will be able to use 64 bit hashes vs 128.
this will double the speed up hash lookups, but involves another
random memory access. so we'll see how it does.
it's also worth noting that this uses 256-bit vector instructions.
avx512 would double the speed of lookups, and modifying the code
to handled that would be trivial. i'll have to see if i have access
to a cpu that supports avx512
--------------------------------------------------------------------------------
@NOTE on SIMD for my future self:
apparently an equality comparison of certain floating point double values,
even though the bits are exactly the same, does not always return true with
certain vector instructions. eg: _mm256_movemask_pd()
the same 10 terms, out of 10,000 in my test set, were failing on lookups,
even though the data was right there.
that was a very long several hours. i honestly can't make sense of what was
happening; whether it's a compiler/cpu bug, or if there's something i just
don't understand about vector instructions (very possible).
anyway...the hash table is now properly resizing when hitting certain
thresholds. once deletes are done, it'll be a suitable alpha.
--------------------------------------------------------------------------------
@FUTURE: i'm on the fence about these, but leaving the notes
- variants
- 16/32-bit data stored (eg: array ids, types, etc)
- would require 2x/4x bucket size for 32-byte alignment
- 64-bit: save keys and compare vs 128-bit probabilistic
- avx512
- 256-bit hashes?
- static tables
- post-hash optimizer for compaction
- would require table sizes != power of 2