Use ASpIRE Chain Model (By Dan Povey) #50

Open
adx349 opened this issue Apr 20, 2017 · 9 comments

adx349 commented Apr 20, 2017

Thank you for your work on Kaldi; it has been very helpful to me.
I was wondering what changes I have to make to use the latest ASpIRE chain model.
I tried setting nnet-mode=3 and replacing the fst, mdl, and conf files with those from the new model, but it is not giving me any output.
What do you think the issue is?

arawind commented Apr 29, 2017

@adx349 Try setting frame-subsampling-factor to 3 and acoustic-scale to 1.

Take a look at this thread for more details

fanskyer commented May 9, 2017

I tried that too, and it did not work. My guess is that the ASpIRE model uses a BLSTM, which is not supported by this online decoding.

tshastry commented May 9, 2017

@fanskyer @adx349 I actually think the issue is that the new Kaldi looped decoding is not working properly. If you roll back Kaldi to commit bcc71b67d489a1766922c9caf2a54306755f1861 and gst-kaldi-nnet2-online to commit 63b2cfd, then the ASpIRE model works. You will still need to set nnet-mode to 3, acoustic-scale to 1, and frame-subsampling-factor to 3.

maxhawkins commented May 28, 2017

Were you able to get this working? I tried rolling back to 63b2cfd and setting those options in my config. No luck, it just returns "yeah yeah yeah" over and over again.

Here's my config: https://gist.github.com/maxhawkins/24edbd87be0aa1601da5034acc27d7ee

I'm using the ASpIRE chain model from kaldi-asr.org with an HCLG.fst created using the documentation.

@maxhawkins

Never mind. I was using the client incorrectly. When I converted my wav file to raw PCM it started working fine.

For anyone who encounters this in the future, here are the steps I took:

  1. Compile Kaldi at kaldi-asr/kaldi@bcc71b6 and gst-kaldi-nnet2-online at 63b2cfd.
  2. Compile the ASpIRE HCLG.fst and point worker.yaml to it.
  3. Start the server and pass it raw audio using client.py:

         python kaldigstserver/master_server.py --port=8888 &
         env GST_PLUGIN_PATH=.. python kaldigstserver/worker.py -u ws://localhost:8888/worker/ws/speech -c worker.yaml &
         sox audio.wav -r 8000 -e signed -b 16 -c 1 -t raw audio.raw remix 1
         python kaldigstserver/client.py -r 16000 audio.raw
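
A note on the two rates in the commands above: as far as I understand the client, its `-r` option is a send rate in bytes per second, not a sample rate, so 8000 Hz 16-bit mono audio corresponds to 16000. The sketch below is only an illustration of that arithmetic and of what "raw PCM" means here. `wav_to_raw` is a hypothetical helper, not part of the server, and unlike the `sox` command it does no resampling or channel mixing; it just strips the WAV header.

```python
import wave


def wav_to_raw(wav_path, raw_path):
    """Write the bare PCM samples of a WAV file to raw_path.

    Returns the byte rate (bytes per second) of the audio, which is
    sample_rate * bytes_per_sample * channels. Hypothetical helper:
    it copies samples as-is and does not resample or remix.
    """
    with wave.open(wav_path, "rb") as wav:
        sample_rate = wav.getframerate()   # samples per second
        sample_width = wav.getsampwidth()  # bytes per sample
        channels = wav.getnchannels()
        frames = wav.readframes(wav.getnframes())

    # The raw file is just the sample data with no header.
    with open(raw_path, "wb") as raw:
        raw.write(frames)

    return sample_rate * sample_width * channels
```

For a file that is already 8000 Hz, 16-bit, mono, the returned value is 16000, which matches the `-r 16000` passed to client.py above.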

@tshastry

Just an update to this: I did some testing on my side, and the ASpIRE model will work with the latest commits if frame-subsampling-factor is set to 1 instead of 3. This seems to be necessary for the most recent "looped decoding" implementation in Kaldi. However, the accuracy appears to be worse than when both repositories are rolled back to the earlier commits.

@maxhawkins

Thanks, I'll give that a shot.

I'm also seeing some errors with word-level alignment (subtle drift noticeable on long recordings) with the ASPIRE model at 63b2cfd, but I think that's a separate issue. I'll keep troubleshooting and file another bug if I can't resolve it.

@suhel-jaber

It works for me, but it keeps outputting "mhm" every few seconds, while TEDLIUM didn't. Has anyone experienced the same issue?

@maxhawkins

I've had that issue before. Usually it means your settings are wrong. Check the acoustic-scale and frame-subsampling-factor.
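
For anyone comparing their config against the values reported in this thread, here is a rough sketch of such a check. The values are only what commenters above reported (nnet-mode 3, acoustic-scale 1, and frame-subsampling-factor 3 with the rolled-back commits or 1 with the latest ones); they are not taken from official documentation. `check_decoder_settings` is a hypothetical helper, and it assumes the decoder options have already been loaded into a plain dict keyed by option name.

```python
# Values reported by commenters in this thread, not official defaults.
REPORTED = {"nnet-mode": 3, "acoustic-scale": 1.0}


def check_decoder_settings(decoder, rolled_back=True):
    """Return a list of mismatches between `decoder` and the reported values.

    decoder: dict of decoder options (assumed layout, option name -> value).
    rolled_back: True for the rolled-back commits (factor 3),
                 False for the latest commits (factor 1).
    """
    expected = dict(REPORTED)
    expected["frame-subsampling-factor"] = 3 if rolled_back else 1

    problems = []
    for key, want in expected.items():
        got = decoder.get(key)
        # Compare numerically so that 1 and 1.0 count as equal.
        if got is None or float(got) != float(want):
            problems.append(f"{key}: expected {want}, found {got}")
    return problems
```

An empty list means the three options match what worked for people here; it says nothing about whether the rest of the config is correct.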
