BayesTyper (v1.2)
Jonas Andreas Sibbesen committed Feb 26, 2018
1 parent 71b9179 commit 4f6ddfe
Showing 211 changed files with 10,754 additions and 2,873 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -0,0 +1,2 @@
# Ignore Mac specific files
.DS_Store
8 changes: 6 additions & 2 deletions CMakeLists.txt
@@ -14,11 +14,15 @@ execute_process(
OUTPUT_STRIP_TRAILING_WHITESPACE
)

-SET(CMAKE_CXX_FLAGS "--std=c++11 -g -Wall -O3 -DBT_VERSION='\"v1.1 ${GIT_LAST_COMMIT_HASH}\"' -lpthread")
+SET(CMAKE_CXX_FLAGS "--std=c++11 -g -Wall -O3 -DBT_VERSION='\"v1.2 ${GIT_LAST_COMMIT_HASH}\"' -lpthread")

FIND_PACKAGE(Boost COMPONENTS program_options system filesystem iostreams REQUIRED)
message(STATUS ${Boost_LIBRARIES})

add_subdirectory(${CMAKE_SOURCE_DIR}/external/kmc_api)
add_subdirectory(${CMAKE_SOURCE_DIR}/external/libbf)
add_subdirectory(${CMAKE_SOURCE_DIR}/src/vcf++)
add_subdirectory(${CMAKE_SOURCE_DIR}/src/kmerBloom)
add_subdirectory(${CMAKE_SOURCE_DIR}/src/bayesTyper)
add_subdirectory(${CMAKE_SOURCE_DIR}/src/bayesTyperTools)
add_subdirectory(${CMAKE_SOURCE_DIR}/src/bayesTyperTools/scripts)
add_subdirectory(${CMAKE_SOURCE_DIR}/src/bayesTyperTools/scripts)
7 changes: 7 additions & 0 deletions external/kmc_api/CMakeLists.txt
@@ -0,0 +1,7 @@
project(kmc)

SET(LIBRARY_OUTPUT_PATH ${CMAKE_SOURCE_DIR}/lib)

include_directories(${CMAKE_SOURCE_DIR}/external/kmc_api)

add_library(${PROJECT_NAME} kmc_file.cpp kmer_api.cpp mmer.cpp)
41 changes: 41 additions & 0 deletions external/libbf/.clang-format
@@ -0,0 +1,41 @@
---
AccessModifierOffset: -2
AlignEscapedNewlinesLeft: false
AlignTrailingComments: true
AllowAllParametersOfDeclarationOnNextLine: false
AllowShortIfStatementsOnASingleLine: false
AllowShortLoopsOnASingleLine: false
AllowShortFunctionsOnASingleLine: false
AlwaysBreakBeforeMultilineStrings: true
AlwaysBreakTemplateDeclarations: true
BinPackParameters: true
BreakBeforeBinaryOperators: NonAssignment
BreakBeforeBraces: Attach
BreakBeforeTernaryOperators: false
ColumnLimit: 80
ConstructorInitializerAllOnOneLineOrOnePerLine: true
ConstructorInitializerIndentWidth: 4
ContinuationIndentWidth: 2
Cpp11BracedListStyle: true
IndentCaseLabels: true
IndentWidth: 2
MaxEmptyLinesToKeep: 1
NamespaceIndentation: None

# Force pointers to the type
DerivePointerAlignment: false
PointerAlignment: Left

# Put space after = and after control statements
SpaceBeforeAssignmentOperators: true
SpaceBeforeParens: ControlStatements

SpaceInEmptyParentheses: false
SpacesBeforeTrailingComments: 1
SpacesInAngles: false
SpacesInCStyleCastParentheses: false
SpacesInParentheses: false
Standard: Cpp11
UseTab: Never
BreakConstructorInitializersBeforeComma: false
...
6 changes: 6 additions & 0 deletions external/libbf/.gitignore
@@ -0,0 +1,6 @@
.*swp
.*swo
.DS_Store
build
doc/gh-pages
Makefile
20 changes: 20 additions & 0 deletions external/libbf/CMakeLists.txt
@@ -0,0 +1,20 @@
# -- Project Setup ------------------------------------------------------------

project(libbf)

SET(LIBRARY_OUTPUT_PATH ${CMAKE_SOURCE_DIR}/lib)

include_directories(${CMAKE_SOURCE_DIR}/external/libbf)

set(libbf_sources
src/bitvector.cpp
src/counter_vector.cpp
src/hash.cpp
src/bloom_filter/a2.cpp
src/bloom_filter/basic.cpp
src/bloom_filter/bitwise.cpp
src/bloom_filter/counting.cpp
src/bloom_filter/stable.cpp
)

add_library(libbf ${libbf_sources})
28 changes: 28 additions & 0 deletions external/libbf/COPYING
@@ -0,0 +1,28 @@
Copyright (c) 2016, Matthias Vallentin
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.

3. Neither the name of the copyright holder nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
168 changes: 168 additions & 0 deletions external/libbf/README.md
@@ -0,0 +1,168 @@
**libbf** is a C++11 library which implements [various Bloom
filters][blog-post], including:

- Basic
- Counting
- Spectral MI
- Spectral RM
- Bitwise
- A^2
- Stable

[blog-post]: http://matthias.vallentin.net/blog/2011/06/a-garden-variety-of-bloom-filters/

Synopsis
========

    #include <iostream>
    #include <bf.h>

    int main()
    {
      bf::basic_bloom_filter b(0.8, 100);

      // Add two elements.
      b.add("foo");
      b.add(42);

      // Test set membership
      std::cout << b.lookup("foo") << std::endl;  // 1
      std::cout << b.lookup("bar") << std::endl;  // 0
      std::cout << b.lookup(42) << std::endl;     // 1

      // Remove all elements.
      b.clear();
      std::cout << b.lookup("foo") << std::endl;  // 0
      std::cout << b.lookup(42) << std::endl;     // 0

      return 0;
    }

Requirements
============

- A C++11 compiler (GCC >= 4.7 or Clang >= 3.2)
- CMake (>= 2.8)

Installation
============

The build process uses CMake, wrapped in autotools-like scripts. The configure
script honors the `CXX` environment variable to select a specific C++ compiler.
For example, the following steps compile libbf with Clang and install it under
`PREFIX`:

    CXX=clang++ ./configure --prefix=PREFIX
    make
    make test
    make install

Documentation
=============

The most recent version of the Doxygen API documentation exists at
<http://mavam.github.io/libbf/api>. Alternatively, you can build the
documentation locally via `make doc` and then browse to
`doc/gh-pages/api/index.html`.

Usage
=====

After having installed libbf, you can use it in your application by including
the header file `bf.h` and linking against the library. All data structures
reside in the namespace `bf` and the following examples assume:

    using namespace bf;

Each Bloom filter inherits from the abstract base class `bloom_filter`, which
provides addition and lookup via the virtual functions `add` and `lookup`.
These functions take an *object* as argument, which serves as a lightweight view
over sequential data for hashing.

For example, you can create a basic Bloom filter with a desired
false-positive probability and capacity as follows:

    // Construction.
    bloom_filter* bf = new basic_bloom_filter(0.8, 100);

    // Addition.
    bf->add("foo");
    bf->add(42);

    // Lookup.
    assert(bf->lookup("foo") == 1);
    assert(bf->lookup(42) == 1);

    // Remove all elements from the Bloom filter.
    bf->clear();

In this case, libbf computes the optimal number of hash functions needed to
achieve the desired false-positive rate, which holds until the capacity has been
reached (80% and 100 distinct elements, in the above example). Alternatively,
you can construct a basic Bloom filter by specifying the number of hash
functions and the number of cells in the underlying bit vector:

    bloom_filter* bf = new basic_bloom_filter(make_hasher(3), 1024);

Since not all Bloom filter implementations come with closed-form solutions
based on false-positive probabilities, most constructors use this latter form
of explicit resource provisioning.
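
The sizing behind the false-positive constructor can be sketched with the textbook Bloom filter formulas. This is a minimal illustration, not libbf's actual code: `bloom_params` and `size_bloom_filter` are hypothetical names, and libbf may round or compute these values differently.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper type for this sketch; not part of libbf.
struct bloom_params {
  std::size_t cells;   // m: number of bits in the underlying bit vector
  std::size_t hashes;  // k: number of hash functions
};

// Textbook sizing for a basic Bloom filter:
//   m = ceil(-n * ln(p) / (ln 2)^2)
//   k = ceil((m / n) * ln 2)
// where p is the target false-positive probability and n the capacity.
inline bloom_params size_bloom_filter(double fp, std::size_t capacity) {
  assert(fp > 0.0 && fp < 1.0 && capacity > 0);
  const double ln2 = std::log(2.0);
  const double n = static_cast<double>(capacity);
  const double m = std::ceil(-(n * std::log(fp)) / (ln2 * ln2));
  const double k = std::ceil(m / n * ln2);
  return {static_cast<std::size_t>(m), static_cast<std::size_t>(k)};
}
```

Under these formulas, a 1% false-positive target for 1000 elements needs roughly 9.6 bits per element and 7 hash functions.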

In the above example, the free function `make_hasher` constructs a *hasher*, an
abstraction for hashing objects *k* times. There are currently two different
hashers, a `default_hasher` and a
[`double_hasher`](http://www.eecs.harvard.edu/~kirsch/pubs/bbbf/rsa.pdf). The
latter uses a linear combination of two pairwise-independent, universal hash
functions to produce the *k* digests, whereas the former merely hashes the
object *k* times.
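
The double-hashing idea can be sketched as follows: the *i*-th digest is `h1(x) + i * h2(x)`, reduced modulo the number of cells. This is an illustration only. `double_hash_positions` is a hypothetical name, and the two base hashes are salted `std::hash` calls standing in for the pairwise-independent universal hash functions the scheme assumes; they are not libbf's hash functions.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Derive k cell positions from two base hashes:
//   g_i(x) = (h1(x) + i * h2(x)) mod m,  for i = 0 .. k-1
// Assumption: std::hash over the input and over a salted copy of the input
// plays the role of the two independent hash functions.
inline std::vector<std::size_t> double_hash_positions(const std::string& x,
                                                      std::size_t k,
                                                      std::size_t m) {
  assert(m > 0);
  const std::size_t h1 = std::hash<std::string>{}(x);
  const std::size_t h2 = std::hash<std::string>{}(x + '\x01');  // salted copy
  std::vector<std::size_t> positions;
  positions.reserve(k);
  for (std::size_t i = 0; i < k; ++i)
    positions.push_back((h1 + i * h2) % m);  // unsigned overflow wraps around
  return positions;
}
```

Only two hash evaluations are needed regardless of *k*, which is the appeal of this scheme compared with hashing the object *k* separate times.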

Evaluation
----------

libbf also ships with a small Bloom filter tool `bf` in the test directory.
This program supports evaluating the accuracy of the different Bloom filter
flavors with respect to their false-positive and false-negative rates. Have a
look at the console help (`-h` or `--help`) for detailed usage instructions.

The tool operates in two phases:

1. Read input from a file and insert it into a Bloom filter
2. Query the Bloom filter and compare the result to the ground truth

For example, consider the following input file:

    foo
    bar
    baz
    baz
    foo

From this input file, you can generate the real ground truth file as follows:

    sort input.txt | uniq -c | tee query.txt
       1 bar
       2 baz
       2 foo
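
The same ground-truth counts can also be computed in C++. The sketch below is not part of libbf or its `bf` tool; `ground_truth` is a hypothetical helper that mirrors `sort | uniq -c` by counting identical lines in an ordered map.

```cpp
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: count how often each non-empty line occurs.
// std::map keeps the keys sorted, matching the order of `sort | uniq -c`.
inline std::map<std::string, std::size_t> ground_truth(std::istream& in) {
  std::map<std::string, std::size_t> counts;
  std::string line;
  while (std::getline(in, line))
    if (!line.empty()) ++counts[line];
  return counts;
}
```

Feeding it the five-line input above gives the counts shown in `query.txt`: one `bar`, two `baz`, and two `foo`.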

The tool `bf` will compute false-positive and false-negative counts for each
element, based on the ground truth given. In the case of a simple counting
Bloom filter, an invocation may look like this:

    bf -t counting -m 2 -k 3 -i input.txt -q query.txt | column -t

Yielding the following output:

    TN  TP  FP  FN  G  C  E
    0   1   0   0   1  1  bar
    0   1   0   1   2  1  baz
    0   1   0   2   2  1  foo

The column headings denote true negatives (`TN`), true positives (`TP`), false
positives (`FP`), false negatives (`FN`), ground truth count (`G`), actual
count (`C`), and the queried element (`E`). The counts are cumulative to support
incremental evaluation.

License
========

libbf comes with a BSD-style license (see [COPYING](COPYING) for details).
9 changes: 9 additions & 0 deletions external/libbf/aux/macports/README
@@ -0,0 +1,9 @@
Add the line

file:///path/to/this/directory

to /opt/local/etc/macports/sources.conf *before* the rsync source and run

sudo portindex

in the same directory where this README is located.