
Code for Sampling from Stochastic Finite Automata with Applications to CTC Decoding

agutkin/ctc_sampling

Sampling from Stochastic Finite Automata with Applications to CTC Decoding

This repository contains code and data to accompany the paper Sampling from Stochastic Finite Automata with Applications to CTC Decoding, to appear in Interspeech 2019.

Abstract

Stochastic finite automata arise naturally in many language and speech processing tasks. They include stochastic acceptors, which represent certain probability distributions over random strings. We consider the problem of efficient sampling: drawing random string variates from the probability distribution represented by stochastic automata and transformations of those. We show that path-sampling is effective and can be efficient if the epsilon-graph of a finite automaton is acyclic. We provide an algorithm that ensures this by conflating epsilon-cycles within strongly connected components. Sampling is also effective in the presence of non-injective transformations of strings. We illustrate this in the context of decoding for Connectionist Temporal Classification (CTC), where the predictive probabilities yield auxiliary sequences which are transformed into shorter labeling strings. We can sample efficiently from the transformed labeling distribution and use this in two different strategies for finding the most probable CTC labeling.
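
A small Python sketch can illustrate the sampling idea from the abstract. It is not the repository's C++ implementation, and it does not cover epsilon-cycle conflation. It assumes the simplest CTC lattice: a chain in which each frame independently emits one symbol (blank included) with the network's per-frame probabilities, so sampling a path reduces to sampling one symbol per frame. The non-injective transformation is the standard CTC collapse (merge repeated symbols, then delete blanks). The most probable labeling is estimated by counting collapsed samples. This counting strategy is an illustrative assumption and not necessarily either of the two strategies in the paper. All function names (ctc_collapse, sample_path, sampled_best_labeling, exact_labeling_distribution, best_path_labeling) are hypothetical.

```python
import itertools
import random
from collections import Counter

BLANK = 0  # Assumption: symbol index 0 is the CTC blank.


def ctc_collapse(path, blank=BLANK):
    """Map a frame-level CTC path to its labeling: merge repeats, then drop blanks."""
    labeling = []
    prev = None
    for sym in path:
        if sym != prev and sym != blank:
            labeling.append(sym)
        prev = sym
    return tuple(labeling)


def sample_path(posteriors, rng):
    """Draw one path from an assumed chain-shaped acceptor with independent per-frame distributions."""
    return [rng.choices(range(len(frame)), weights=frame, k=1)[0] for frame in posteriors]


def sampled_best_labeling(posteriors, num_samples=20000, seed=0):
    """Estimate the most probable labeling by collapsing sampled paths and counting them."""
    rng = random.Random(seed)
    counts = Counter(ctc_collapse(sample_path(posteriors, rng)) for _ in range(num_samples))
    labeling, count = counts.most_common(1)[0]
    return labeling, count / num_samples


def exact_labeling_distribution(posteriors):
    """Brute-force reference: sum path probabilities per labeling (tiny inputs only)."""
    dist = Counter()
    num_symbols = len(posteriors[0])
    for path in itertools.product(range(num_symbols), repeat=len(posteriors)):
        prob = 1.0
        for frame, sym in zip(posteriors, path):
            prob *= frame[sym]
        dist[ctc_collapse(path)] += prob
    return dist


def best_path_labeling(posteriors):
    """Greedy (best-path) decoding: collapse the per-frame argmax path."""
    path = [max(range(len(frame)), key=frame.__getitem__) for frame in posteriors]
    return ctc_collapse(path)


if __name__ == "__main__":
    # Two frames over {blank=0, a=1}; blank is the most likely symbol in each frame.
    posteriors = [[0.6, 0.4], [0.6, 0.4]]
    print("best path labeling:", best_path_labeling(posteriors))
    print("exact distribution:", dict(exact_labeling_distribution(posteriors)))
    print("sampled estimate:  ", sampled_best_labeling(posteriors))
```

In this two-frame example, greedy best-path decoding picks (blank, blank) and returns the empty labeling, whose total probability is 0.6 × 0.6 = 0.36. The labeling (a) collects 0.4 × 0.4 + 0.4 × 0.6 + 0.6 × 0.4 = 0.64 from three distinct paths. The sampled estimate should therefore report (a) with a frequency close to 0.64. This shows why sampling from the transformed labeling distribution can find a better labeling than the single most probable path.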

Usage example

Compile the code with Bazel:

bazel build -c opt src/...

Decode a CTC phoneme lattice, generating verbose output:

bazel-bin/src/best-labeling -v=1 data/fsts/esw_04310_01381679842.fst

License

Code and data are Copyright 2018-2019 Google LLC and originate from google/language-resources. They were originally made available, and are redistributed here, under the following licenses:

Languages

  • C++ 84.7%
  • Python 15.0%
  • Shell 0.3%