Commit
Jonas Andreas Sibbesen committed Feb 26, 2018
1 parent 71b9179, commit 4f6ddfe
Showing 211 changed files with 10,754 additions and 2,873 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
# Ignore Mac specific files | ||
.DS_Store |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
project(kmc) | ||
|
||
SET(LIBRARY_OUTPUT_PATH ${CMAKE_SOURCE_DIR}/lib) | ||
|
||
include_directories(${CMAKE_SOURCE_DIR}/external/kmc_api) | ||
|
||
add_library(${PROJECT_NAME} kmc_file.cpp kmer_api.cpp mmer.cpp) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,41 @@ | ||
--- | ||
AccessModifierOffset: -2 | ||
AlignEscapedNewlinesLeft: false | ||
AlignTrailingComments: true | ||
AllowAllParametersOfDeclarationOnNextLine: false | ||
AllowShortIfStatementsOnASingleLine: false | ||
AllowShortLoopsOnASingleLine: false | ||
AllowShortFunctionsOnASingleLine: false | ||
AlwaysBreakBeforeMultilineStrings: true | ||
AlwaysBreakTemplateDeclarations: true | ||
BinPackParameters: true | ||
BreakBeforeBinaryOperators: NonAssignment | ||
BreakBeforeBraces: Attach | ||
BreakBeforeTernaryOperators: false | ||
ColumnLimit: 80 | ||
ConstructorInitializerAllOnOneLineOrOnePerLine: true | ||
ConstructorInitializerIndentWidth: 4 | ||
ContinuationIndentWidth: 2 | ||
Cpp11BracedListStyle: true | ||
IndentCaseLabels: true | ||
IndentWidth: 2 | ||
MaxEmptyLinesToKeep: 1 | ||
NamespaceIndentation: None | ||
|
||
# Force pointers to the type | ||
DerivePointerAlignment: false | ||
PointerAlignment: Left | ||
|
||
# Put space after = and after control statements | ||
SpaceBeforeAssignmentOperators: true | ||
SpaceBeforeParens: ControlStatements | ||
|
||
SpaceInEmptyParentheses: false | ||
SpacesBeforeTrailingComments: 1 | ||
SpacesInAngles: false | ||
SpacesInCStyleCastParentheses: false | ||
SpacesInParentheses: false | ||
Standard: Cpp11 | ||
UseTab: Never | ||
BreakConstructorInitializersBeforeComma: false | ||
... |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
.*swp | ||
.*swo | ||
.DS_Store | ||
build | ||
doc/gh-pages | ||
Makefile |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
# -- Project Setup ------------------------------------------------------------ | ||
|
||
project(libbf) | ||
|
||
SET(LIBRARY_OUTPUT_PATH ${CMAKE_SOURCE_DIR}/lib) | ||
|
||
include_directories(${CMAKE_SOURCE_DIR}/external/libbf) | ||
|
||
set(libbf_sources | ||
src/bitvector.cpp | ||
src/counter_vector.cpp | ||
src/hash.cpp | ||
src/bloom_filter/a2.cpp | ||
src/bloom_filter/basic.cpp | ||
src/bloom_filter/bitwise.cpp | ||
src/bloom_filter/counting.cpp | ||
src/bloom_filter/stable.cpp | ||
) | ||
|
||
add_library(libbf ${libbf_sources}) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,28 @@ | ||
Copyright (c) 2016, Matthias Vallentin | ||
All rights reserved. | ||
|
||
Redistribution and use in source and binary forms, with or without | ||
modification, are permitted provided that the following conditions are met: | ||
|
||
1. Redistributions of source code must retain the above copyright | ||
notice, this list of conditions and the following disclaimer. | ||
|
||
2. Redistributions in binary form must reproduce the above copyright | ||
notice, this list of conditions and the following disclaimer in the | ||
documentation and/or other materials provided with the distribution. | ||
|
||
3. Neither the name of the copyright holder nor the names of its | ||
contributors may be used to endorse or promote products derived from | ||
this software without specific prior written permission. | ||
|
||
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" | ||
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE | ||
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE | ||
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE | ||
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR | ||
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF | ||
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS | ||
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN | ||
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) | ||
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE | ||
POSSIBILITY OF SUCH DAMAGE. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,168 @@ | ||
**libbf** is a C++11 library which implements [various Bloom | ||
filters][blog-post], including: | ||
|
||
- Basic | ||
- Counting | ||
- Spectral MI | ||
- Spectral RM | ||
- Bitwise | ||
- A^2 | ||
- Stable | ||
|
||
[blog-post]: http://matthias.vallentin.net/blog/2011/06/a-garden-variety-of-bloom-filters/ | ||
|
||
Synopsis | ||
======== | ||
|
||
#include <iostream> | ||
#include <bf.h> | ||
|
||
int main() | ||
{ | ||
bf::basic_bloom_filter b(0.8, 100); | ||
|
||
// Add two elements. | ||
b.add("foo"); | ||
b.add(42); | ||
|
||
// Test set membership | ||
std::cout << b.lookup("foo") << std::endl; // 1 | ||
std::cout << b.lookup("bar") << std::endl; // 0 | ||
std::cout << b.lookup(42) << std::endl; // 1 | ||
|
||
// Remove all elements. | ||
b.clear(); | ||
std::cout << b.lookup("foo") << std::endl; // 0 | ||
std::cout << b.lookup(42) << std::endl; // 0 | ||
|
||
return 0; | ||
} | ||
|
||
Requirements | ||
============ | ||
|
||
- A C++11 compiler (GCC >= 4.7 or Clang >= 3.2) | ||
- CMake (>= 2.8) | ||
|
||
Installation | ||
============ | ||
|
||
The build process uses CMake, wrapped in autotools-like scripts. The configure | ||
script honors the `CXX` environment variable to select a specific C++ compiler. | ||
For example, the following steps compile libbf with Clang and install it under | ||
`PREFIX`: | ||
|
||
CXX=clang++ ./configure --prefix=PREFIX | ||
make | ||
make test | ||
make install | ||
|
||
Documentation | ||
============= | ||
|
||
The most recent version of the Doxygen API documentation exists at | ||
<http://mavam.github.io/libbf/api>. Alternatively, you can build the | ||
documentation locally via `make doc` and then browse to | ||
`doc/gh-pages/api/index.html`. | ||
|
||
Usage | ||
===== | ||
|
||
After having installed libbf, you can use it in your application by including | ||
the header file `bf.h` and linking against the library. All data structures | ||
reside in the namespace `bf` and the following examples assume: | ||
|
||
using namespace bf; | ||
|
||
Each Bloom filter inherits from the abstract base class `bloom_filter`, which | ||
provides addition and lookup via the virtual functions `add` and `lookup`. | ||
These functions take an *object* as argument, which serves as a lightweight view | ||
over sequential data for hashing. | ||
|
||
For example, you can create a basic Bloom filter with a desired | ||
false-positive probability and capacity as follows: | ||
|
||
// Construction. | ||
bloom_filter* bf = new basic_bloom_filter(0.8, 100); | ||
|
||
// Addition. | ||
bf->add("foo"); | ||
bf->add(42); | ||
|
||
// Lookup. | ||
assert(bf->lookup("foo") == 1); | ||
assert(bf->lookup(42) == 1); | ||
|
||
// Remove all elements from the Bloom filter. | ||
bf->clear(); | ||
|
||
In this case, libbf computes the optimal number of hash functions needed to | ||
achieve the desired false-positive rate, which holds until the capacity has been | ||
reached (a rate of 0.8 and 100 distinct elements in the above example). Alternatively, | ||
you can construct a basic Bloom filter by specifying the number of hash | ||
functions and the number of cells in the underlying bit vector: | ||
|
||
bloom_filter* bf = new basic_bloom_filter(make_hasher(3), 1024); | ||
|
||
Since not all Bloom filter implementations come with closed-form solutions | ||
based on false-positive probabilities, most constructors use this latter form | ||
of explicit resource provisioning. | ||
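To make the closed-form case concrete, here is a minimal sketch of the textbook sizing rule for a basic Bloom filter: m = ceil(-n ln p / (ln 2)^2) cells and k = round((m / n) ln 2) hash functions, with at least one. The names `bloom_parameters` and `optimal_parameters` are hypothetical and not part of libbf; libbf's own computation may round differently.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical result type (not libbf API): cells in the bit vector and
// number of hash functions.
struct bloom_parameters {
  std::size_t cells;
  std::size_t hashes;
};

// Textbook sizing for a basic Bloom filter, given a target false-positive
// probability fp in (0, 1) and an expected number of distinct elements.
inline bloom_parameters optimal_parameters(double fp, std::size_t capacity) {
  assert(fp > 0.0 && fp < 1.0 && capacity > 0);
  const double ln2 = std::log(2.0);
  const double n = static_cast<double>(capacity);
  // m = ceil(-n * ln(p) / (ln 2)^2)
  const double m = std::ceil(-n * std::log(fp) / (ln2 * ln2));
  // k = round(m / n * ln 2), but never fewer than one hash function.
  const double k = std::round(m / n * ln2);
  return {static_cast<std::size_t>(m),
          k < 1.0 ? std::size_t{1} : static_cast<std::size_t>(k)};
}
```

Under this rule, `optimal_parameters(0.01, 1000)` yields 9586 cells and 7 hash functions. A permissive target such as 0.8 needs far fewer cells and only a single hash function.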
|
||
In the above example, the free function `make_hasher` constructs a *hasher*, an | ||
abstraction for hashing objects *k* times. There are currently two | ||
hashers, a `default_hasher` and a | ||
[`double_hasher`](http://www.eecs.harvard.edu/~kirsch/pubs/bbbf/rsa.pdf). The | ||
latter uses a linear combination of two pairwise-independent, universal hash | ||
functions to produce the *k* digests, whereas the former merely hashes the | ||
object *k* times. | ||
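The double-hashing idea can be sketched as follows: the i-th digest is derived as g_i(x) = h1(x) + i * h2(x), reduced modulo the number of cells. This is an illustration only and not libbf's `double_hasher`. The function name `double_hash_indices` is hypothetical, and `std::hash` with a salted input stands in for the two pairwise-independent universal hash functions, which it is not.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper (not libbf API): derive k cell indices in [0, cells)
// from two base hashes via g_i(x) = h1(x) + i * h2(x) mod cells.
inline std::vector<std::size_t> double_hash_indices(const std::string& x,
                                                    std::size_t k,
                                                    std::size_t cells) {
  assert(cells > 0);
  const std::uint64_t h1 = std::hash<std::string>{}(x);
  // Assumption for illustration: obtain a second hash by salting the input.
  // Forcing it odd guarantees a nonzero stride.
  const std::uint64_t h2 = std::hash<std::string>{}(x + '\x01') | 1u;
  std::vector<std::size_t> indices;
  indices.reserve(k);
  for (std::size_t i = 0; i < k; ++i) {
    // Unsigned arithmetic wraps on overflow, which is fine for hashing.
    indices.push_back(static_cast<std::size_t>((h1 + i * h2) % cells));
  }
  return indices;
}
```

Only two base hashes are computed regardless of *k*, which is the appeal of this scheme over hashing the object *k* times.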
|
||
Evaluation | ||
---------- | ||
|
||
libbf also ships with a small Bloom filter tool `bf` in the test directory. | ||
This program supports evaluating the accuracy of the different Bloom filter | ||
flavors with respect to their false-positive and false-negative rates. Have a | ||
look at the console help (`-h` or `--help`) for detailed usage instructions. | ||
|
||
The tool operates in two phases: | ||
|
||
1. Read input from a file and insert it into a Bloom filter | ||
2. Query the Bloom filter and compare the result to the ground truth | ||
|
||
For example, consider the following input file: | ||
|
||
foo | ||
bar | ||
baz | ||
baz | ||
foo | ||
|
||
From this input file, you can generate the real ground truth file as follows: | ||
|
||
sort input.txt | uniq -c | tee query.txt | ||
1 bar | ||
2 baz | ||
2 foo | ||
|
||
The tool `bf` will compute false-positive and false-negative counts for each | ||
element, based on the ground truth given. In the case of a simple counting | ||
Bloom filter, an invocation may look like this: | ||
|
||
bf -t counting -m 2 -k 3 -i input.txt -q query.txt | column -t | ||
|
||
Yielding the following output: | ||
|
||
TN TP FP FN G C E | ||
0 1 0 0 1 1 bar | ||
0 1 0 1 2 1 baz | ||
0 1 0 2 2 1 foo | ||
|
||
The column headings denote true negatives (`TN`), true positives (`TP`), false | ||
positives (`FP`), false negatives (`FN`), ground truth count (`G`), actual | ||
count (`C`), and the queried element (`E`). The counts are cumulative to support | ||
incremental evaluation. | ||
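One reading of the sample output above is that each query increments exactly one of the four counters by comparing the filter's count with the ground truth. The sketch below encodes that reading. It is an assumption inferred from the three rows shown, not a copy of the `bf` tool's source, and `tally`, `classify`, and `replay_sample` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Cumulative counters, matching the TN/TP/FP/FN columns.
struct tally {
  std::size_t tn = 0, tp = 0, fp = 0, fn = 0;
};

// Assumed rule: an exact match is a true positive (or a true negative when
// both are zero), an overcount is a false positive, an undercount is a
// false negative.
inline void classify(tally& t, std::size_t ground_truth, std::size_t count) {
  if (ground_truth == 0 && count == 0)
    ++t.tn;
  else if (count == ground_truth)
    ++t.tp;
  else if (count > ground_truth)
    ++t.fp;
  else
    ++t.fn;
}

// Replays the (G, C) pairs from the three rows of the sample output.
inline tally replay_sample() {
  tally t;
  classify(t, 1, 1);  // bar
  assert(t.tp == 1 && t.fn == 0);
  classify(t, 2, 1);  // baz
  assert(t.tp == 1 && t.fn == 1);
  classify(t, 2, 1);  // foo
  assert(t.tp == 1 && t.fn == 2);
  return t;
}
```

Because the counters accumulate across rows, the last row of the output carries the totals for the whole query file.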
|
||
License | ||
======== | ||
|
||
libbf comes with a BSD-style license (see [COPYING](COPYING) for details). |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
Add the line | ||
|
||
file:///path/to/this/directory | ||
|
||
to /opt/local/etc/macports/sources.conf *before* the rsync source and run | ||
|
||
sudo portindex | ||
|
||
in the same directory where this README is located. |