viterbi search
iiSeymour committed Apr 17, 2020
1 parent 70a8553 commit 12bd41d
Showing 2 changed files with 24 additions and 12 deletions.
19 changes: 12 additions & 7 deletions README.md
@@ -2,7 +2,7 @@

![test-fast-ctc-decode](https://github.com/nanoporetech/fast-ctc-decode/workflows/test-fast-ctc-decode/badge.svg) [![PyPI version](https://badge.fury.io/py/fast-ctc-decode.svg)](https://badge.fury.io/py/fast-ctc-decode)

Blitzing fast beam search.
Blitzing fast CTC decoding library.

```
$ pip install fast-ctc-decode
@@ -11,14 +11,16 @@ $ pip install fast-ctc-decode
## Usage

```python
>>> from fast_ctc_decode import beam_search
>>> from fast_ctc_decode import beam_search, viterbi_search
>>>
>>> beam_size = 5
>>> alphabet = "NACGT"
>>> beam_prune_threshold = 0.1
>>> posteriors = np.random.rand(100, len(alphabet)).astype(np.float32)
>>>
>>> seq, path = beam_search(posteriors, alphabet, beam_size, beam_prune_threshold)
>>> seq, path = viterbi_search(posteriors, alphabet)
>>> seq
'ACACTCGCAGCGCGATACGACTGATCGAGATATACTCAGTGTACACAGT'
>>>
>>> seq, path = beam_search(posteriors, alphabet, beam_size=5, beam_prune_threshold=0.1)
>>> seq
'ACACTCGCAGCGCGATACGACTGATCGAGATATACTCAGTGTACACAGT'
```
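As a rough illustration of what a best-path decode produces, here is a minimal pure-Python sketch of the "argmax decoder with collapsing repeats" that this commit's benchmark compares against. It is not the library's Rust implementation: `greedy_ctc_decode` is a hypothetical helper, it assumes index 0 of the alphabet (`N`) is the blank, and it assumes the returned path holds the first frame index of each emitted label.

```python
from itertools import groupby


def greedy_ctc_decode(posteriors, alphabet):
    """Best-path decode: argmax per frame, collapse repeats, drop blanks.

    Hypothetical helper for illustration; assumes alphabet[0] is the blank.
    """
    # Index of the most probable label at each time step.
    best = [max(range(len(row)), key=row.__getitem__) for row in posteriors]

    seq, path = [], []
    frame = 0
    for label, run in groupby(best):
        run_length = len(list(run))
        if label != 0:  # skip blank runs
            seq.append(alphabet[label])
            path.append(frame)  # assumption: record the first frame of the run
        frame += run_length
    return "".join(seq), path


posteriors = [
    [0.10, 0.70, 0.10, 0.05, 0.05],  # A
    [0.10, 0.70, 0.10, 0.05, 0.05],  # A again, collapsed into the previous A
    [0.80, 0.05, 0.05, 0.05, 0.05],  # blank separates the two A calls
    [0.10, 0.60, 0.10, 0.10, 0.10],  # A
    [0.10, 0.10, 0.10, 0.10, 0.60],  # T
]
seq, path = greedy_ctc_decode(posteriors, "NACGT")
print(seq, path)
```

The blank frame is what lets the same base appear twice in a row; without it the two `A` runs would collapse into one.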
@@ -27,7 +29,8 @@ $ pip install fast-ctc-decode

| Implementation | Time (s) | URL |
| -------------------- | -------- | --- |
| Greedy (Python) | 0.0022 | |
| Viterbi (Rust) | 0.0003 | [nanoporetech/fast-ctc-decode](https://github.com/nanoporetech/fast-ctc-decode.git) |
| Viterbi (Python) | 0.0022 | |
| Beam Search (Rust) | 0.0033 | [nanoporetech/fast-ctc-decode](https://github.com/nanoporetech/fast-ctc-decode.git) |
| Beam Search (C++) | 0.1034 | [parlance/ctcdecode](https://github.com/parlance/ctcdecode) |
| Beam Search (Python) | 3.3337 | [githubharald/CTCDecoder](https://github.com/githubharald/CTCDecoder) |
@@ -50,7 +53,9 @@ accurate calculations but makes the 2D search take about twice as long.

## Credits

The original beam search implementation was developed by [@usamec](https://github.com/usamec) for [deepnano-blitz](https://github.com/fmfi-compbio/deepnano-blitz).
The original 1D beam search implementation was developed by [@usamec](https://github.com/usamec) for [deepnano-blitz](https://github.com/fmfi-compbio/deepnano-blitz).

The 2D beam search is based on the work of @jordisr and @ihh in their [pair consensus decoding](https://doi.org/10.1101/2020.02.25.956771) paper.

### Licence and Copyright
(c) 2019 Oxford Nanopore Technologies Ltd.
17 changes: 12 additions & 5 deletions tests/benchmark.py
@@ -3,16 +3,23 @@
import numpy as np
from itertools import groupby

from fast_ctc_decode import beam_search
from fast_ctc_decode import beam_search, viterbi_search

def decode_ctc_greedy(predictions, labels, *args):
def decode_ctc_greedy_py(predictions, labels, *args):
    """
    Argmax decoder with collapsing repeats
    Argmax decoder with collapsing repeats in Python
    """
    path = np.argmax(predictions, axis=1)
    return ''.join([labels[b] for b, g in groupby(path) if b])

TESTS = [decode_ctc_greedy, beam_search]
def decode_ctc_greedy_rust(predictions, labels, *args):
    """
    Argmax decoder with collapsing repeats in Rust
    """
    seq, path = viterbi_search(predictions, labels)
    return seq

TESTS = [decode_ctc_greedy_rust, decode_ctc_greedy_py, beam_search]

try:
    import torch
@@ -73,6 +80,6 @@ def benchmark(f, x, beam_size=5, beam_cut_threshold=0.1, labels='NACGT', limit=1
timings = []
for _ in range(n):
    timings.append(benchmark(f, x, beam_size, prune, limit=limit))
print('{:18s}: mean(sd) of {} runs: {:2.6f}({:2.6f})'.format(
print('{:23s}: mean(sd) of {} runs: {:2.6f}({:2.6f})'.format(
    f.__name__, n, np.mean(timings), np.std(timings))
)
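
The reporting loop above prints a mean and standard deviation over repeated runs. The body of `benchmark` is not visible in this hunk, so the following is only a stdlib sketch of that pattern under assumed names: `time_call` is a hypothetical helper, and `sorted` stands in for a decoder. `statistics.pstdev` is used because `np.std` computes the population standard deviation by default.

```python
import statistics
import time


def time_call(f, *args, repeats=10):
    """Return (mean, sd) wall-clock seconds over `repeats` calls of f(*args).

    Hypothetical helper; not the benchmark() function from tests/benchmark.py.
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        f(*args)
        timings.append(time.perf_counter() - start)
    # Population standard deviation, matching np.std's default (ddof=0).
    return statistics.mean(timings), statistics.pstdev(timings)


runs = 5
mean, sd = time_call(sorted, list(range(1000)), repeats=runs)
report = '{:23s}: mean(sd) of {} runs: {:2.6f}({:2.6f})'.format(
    sorted.__name__, runs, mean, sd)
print(report)
```

The `{:23s}` field width follows the updated format string, which leaves room for longer function names such as `decode_ctc_greedy_rust`.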
