Mic and Azure Speech to Text #35

Open
lucasctd opened this issue Jun 29, 2019 · 2 comments

@lucasctd

I am trying to recognize the user's voice continuously, but I always get wrong results. Has anybody done something like this?

I will add some parts of my code so you can understand.

Here is how I create an instance of pushStream (MS Speech SDK)

this.pushStream = AudioInputStream.createPushStream(AudioStreamFormat.getWaveFormatPCM(16, 16000, 1));

Here is the method I call to recognize the user voice

    recognizeAsync() {
        this.audioConfig = AudioConfig.fromStreamInput(this.pushStream);
        this.recognizer = new SpeechRecognizer(this.speechConfig, this.audioConfig);
        this.subject = new Observable(subs => {
            this.subscription = subs;
            this.recognizer.startContinuousRecognitionAsync();
            this.recognizer.recognizing = (rec, {result}) => {
                subs.next(result);
            };
            this.recognizer.recognized = (rec, {result}) => {
                subs.next(result);
            };
        });
        return this.subject;
    }

And here is where I use the mic package to get the user voice data

speech = new Speech(language, subscriptionKey, region);
speech.recognizeAsync().subscribe(result => {
    console.log('result', result);
});
var micInstance = mic({
    rate: '16000',
    channels: '1',
    debug: false,
    exitOnSilence: 6,
    fileType: 'wav' // have also tried with raw type
});
const micInputStream = micInstance.getAudioStream();

micInputStream.on('data', function(data) {
    speech.pushStream.write(data);
    //console.log("Received Input Stream: ", data);
});
@rhurey

rhurey commented Jul 25, 2019

The root cause here looks to be the stdio redirection, which results in twice the expected amount of data being available.

I tried to manually call sox to see how it was producing audio.
Experiment results:
sox.exe -c 1 -b 16 -e signed-integer -r 16000 -t waveaudio default -p > redirect.wav
Ran for 10s.
redirect.wav is 655,408 bytes.

Had Sox write the file directly:
sox.exe -c 1 -b 16 -e signed-integer -r 16000 -t waveaudio default redirect2.wav
Ran for 10s.
This produced a redirect2.wav of 327,724 bytes.

That tells me the doubling of the data is happening as a result of the stdio redirect. It's not clear why that's happening, but the possibility that the doubling is platform-specific raises fragility concerns. Plus, who knows what extra data is winding up in the audio.
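
As a rough sanity check on those numbers, the sketch below computes how many bytes 10 seconds of headerless PCM should occupy and compares the two file sizes reported above. It assumes the sizes are in bytes and that each run captured roughly 10 seconds; `expectedPcmBytes` is a hypothetical helper, not part of sox or the Speech SDK.

```javascript
// Expected size of headerless PCM audio, in bytes.
// Hypothetical helper for illustration only.
function expectedPcmBytes(sampleRate, bitsPerSample, channels, seconds) {
    return sampleRate * (bitsPerSample / 8) * channels * seconds;
}

// 16 kHz mono for 10 seconds, at 16 and 32 bits per sample.
const expected16 = expectedPcmBytes(16000, 16, 1, 10);
const expected32 = expectedPcmBytes(16000, 32, 1, 10);

// File sizes reported in the experiment above (assumed to be bytes).
const redirected = 655408; // redirect.wav, written through the stdio redirect
const direct = 327724;     // redirect2.wav, written directly by sox

const ratio = redirected / direct;

console.log('expected 16-bit bytes:', expected16);
console.log('expected 32-bit bytes:', expected32);
console.log('redirected / direct:', ratio.toFixed(2));
```

The directly written file is close to the 16-bit estimate and the redirected file is close to the 32-bit estimate, so the redirected stream behaves as if each sample took 4 bytes instead of 2.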

@UCABJDP

UCABJDP commented Aug 26, 2020

#40 may be the root cause here: piping audio out of sox forces the format to 32-bit audio, which may give the appearance of it generating double the data when set to 16 bit.
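
If the piped data really is 32-bit, one possible workaround is to narrow each sample to 16 bits before writing it to the push stream. This is a minimal sketch, assuming the chunk holds raw signed 32-bit little-endian PCM with no header and that its length is a multiple of 4 bytes (stream chunks are not guaranteed to be aligned, so real code would need to buffer any leftover bytes). `pcm32To16` is a hypothetical helper, not part of the mic package or the Speech SDK.

```javascript
// Convert signed 32-bit little-endian PCM samples to signed 16-bit
// little-endian PCM by keeping the top 16 bits of each sample.
// Assumption: `input` is a Node.js Buffer of raw samples (no header).
// Any trailing bytes that do not form a whole 32-bit sample are ignored.
function pcm32To16(input) {
    const sampleCount = Math.floor(input.length / 4);
    const output = Buffer.alloc(sampleCount * 2);
    for (let i = 0; i < sampleCount; i++) {
        const sample32 = input.readInt32LE(i * 4);
        // Arithmetic shift preserves the sign while dropping the low 16 bits.
        output.writeInt16LE(sample32 >> 16, i * 2);
    }
    return output;
}

// Small demonstration with two hand-built samples.
const demoInput = Buffer.alloc(8);
demoInput.writeInt32LE(0x7fff0000, 0); // near full-scale positive
demoInput.writeInt32LE(-65536, 4);     // small negative value
const demoOutput = pcm32To16(demoInput);
console.log(demoOutput.length, demoOutput.readInt16LE(0), demoOutput.readInt16LE(2));
```

The output is half the size of the input, which matches the 16-bit, 16 kHz, mono format declared when the push stream was created.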
