Kaldi Active Grammar

Python Kaldi speech recognition with grammars that can be set active/inactive dynamically at decode-time

A Python package that enables context-based command & control of computer applications, as in the Dragonfly speech recognition framework, using the Kaldi automatic speech recognition engine.

BETA RELEASE

Normally, Kaldi decoding graphs are monolithic, require expensive up-front off-line compilation, and are static during decoding. Kaldi's new grammar framework allows multiple independent grammars with nonterminals to be compiled separately and stitched together dynamically at decode-time, but all the grammars are always active and capable of being recognized.

This project extends that to allow each grammar/rule to be independently marked as active/inactive dynamically on a per-utterance basis (set at the beginning of each utterance). Dragonfly is then capable of activating only the appropriate grammars for the current environment, resulting in increased accuracy due to fewer possible recognitions. Furthermore, the dictation grammar can be shared between all the command grammars, which can be compiled quickly without needing to include large-vocabulary dictation directly.
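The per-utterance activation idea can be illustrated with a small, self-contained sketch. This is not the package's API: `RuleSet`, `set_active`, `activity_mask`, and `rules_for_context` are hypothetical names invented for illustration, and the "context" here is just a keyword match on a window title. It only shows the bookkeeping: rules are registered once, and a fresh active/inactive mask is chosen before each utterance.

```python
class RuleSet(object):
    """Toy stand-in for a set of independently compiled grammar rules.

    Hypothetical illustration only; not the kaldi-active-grammar API.
    """

    def __init__(self, rule_names):
        self.rule_names = list(rule_names)
        # All rules start inactive; activation is decided per utterance.
        self.active = dict((name, False) for name in self.rule_names)

    def set_active(self, active_names):
        """Activate exactly the named rules and deactivate all others."""
        unknown = set(active_names) - set(self.rule_names)
        if unknown:
            raise KeyError("unknown rules: %s" % sorted(unknown))
        for name in self.rule_names:
            self.active[name] = name in active_names

    def activity_mask(self):
        """Return one boolean per rule, in registration order."""
        return [self.active[name] for name in self.rule_names]


def rules_for_context(context_rules, window_title):
    """Select rules whose context keyword occurs in the window title.

    An empty keyword matches every title, so such a rule is always active.
    """
    title = window_title.lower()
    return set(rule for rule, keyword in context_rules.items()
               if keyword in title)


# Example: choose the mask at the start of an utterance.
rules = RuleSet(["editor_commands", "browser_commands", "global_commands"])
context_rules = {
    "editor_commands": "vim",
    "browser_commands": "firefox",
    "global_commands": "",
}
rules.set_active(rules_for_context(context_rules, "README.md - Vim"))
print(rules.activity_mask())  # [True, False, True]
```

With only the editor and global rules active, the browser commands cannot be recognized for this utterance, which is the source of the accuracy gain described above.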

- The Python package includes all necessary binaries for decoding on Linux or Windows.
- A compatible general English Kaldi nnet3 chain model is available under releases.
- A compatible backend for Dragonfly is under development, currently in the kaldi branch of my fork.

Donations are appreciated to encourage development.

Setup

Requirements:

- Python 2.7 (3.x support planned); 64-bit required!
  - Microphone support provided by pyaudio package
- OS: Linux or Windows; macOS planned if there is interest
- Only supports Kaldi left-biphone models, specifically nnet3 chain models

Install Python package, which includes necessary Kaldi binaries:

pip install kaldi-active-grammar

Download the compatible generic English Kaldi nnet3 chain model from the project releases. Unzip the model and pass the directory path to the kaldi-active-grammar constructor.
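If you prefer to unzip the model from a script, a minimal sketch using only the standard library follows. The helper name `extract_model` and the paths in the usage comment are hypothetical. The archive may unpack into a subdirectory, so check the extracted layout before passing a path on.

```python
import os
import zipfile


def extract_model(zip_path, dest_dir):
    """Unzip a downloaded model archive and return the absolute destination."""
    if not zipfile.is_zipfile(zip_path):
        raise ValueError("not a zip archive: %s" % zip_path)
    with zipfile.ZipFile(zip_path) as archive:
        # Creates dest_dir (and any subdirectories) as needed.
        archive.extractall(dest_dir)
    return os.path.abspath(dest_dir)


# Usage (hypothetical paths):
# model_root = extract_model("downloaded_model.zip", "kaldi_model")
```

The returned path is the extraction root, not necessarily the model directory itself.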

Or use your own model. Standard Kaldi models must be converted to be usable. Conversion can be performed automatically, but this hasn't been fully implemented yet.

Troubleshooting

Errors installing
- Make sure you are using a 64-bit Python.
- Update your pip by executing pip install --upgrade pip.
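To confirm the first point, you can check the interpreter from Python itself. The sketch below uses only the standard library; the helper names `python_bitness` and `check_environment` are made up for this example, and the OS check simply mirrors the requirements listed under Setup.

```python
import platform
import struct


def python_bitness():
    """Return the pointer size of the running interpreter in bits."""
    return struct.calcsize("P") * 8


def check_environment():
    """Return a list of human-readable problems; empty if none were found."""
    problems = []
    bits = python_bitness()
    if bits != 64:
        problems.append("64-bit Python is required, found %d-bit" % bits)
    system = platform.system()
    if system not in ("Linux", "Windows"):
        problems.append("unsupported OS: %s" % system)
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```

Run it with the same interpreter that you use for pip, since a machine can have both 32-bit and 64-bit Pythons installed.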

Contributing

Please feel free to submit issues, suggestions, and feature requests. Pull requests are considered, but project structure is in flux.


Author

David Zurow (@daanzu)

License

This project is licensed under the GNU Affero General Public License v3 (AGPL-3.0), with the exception of the associated binaries, whose source is currently unreleased and which are only to be used by this project. See the LICENSE.txt file for details.

If this license is problematic for you, please contact me.

Acknowledgments

Based on and including code from Kaldi ASR, under the Apache-2.0 license.
Code from OpenFST and OpenFST port for Windows, under the Apache-2.0 license.
Intel Math Kernel Library, copyright (c) 2018 Intel Corporation, under the Intel Simplified Software License.
Modified generic English Kaldi nnet3 chain model from Zamia Speech, under the LGPL-3.0 license.
