Mic and Azure Speech to Text #35

lucasctd · 2019-06-29T18:04:15Z

I am trying to recognize the user's voice continuously, but I always get wrong results. Has anybody done something like this?

Here are the relevant parts of my code.

Here is how I create the push stream (Microsoft Speech SDK):

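    // Raw 16 kHz, 16-bit, mono PCM. The SDK signature is
    // getWaveFormatPCM(samplesPerSecond, bitsPerSample, channels).
    this.pushStream = AudioInputStream.createPushStream(AudioStreamFormat.getWaveFormatPCM(16000, 16, 1));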

Here is the method I call to recognize the user's voice:

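    // Assumed imports for the class that owns this method:
    // const { AudioConfig, SpeechRecognizer } = require('microsoft-cognitiveservices-speech-sdk');
    // const { Observable } = require('rxjs');

    recognizeAsync() {
        this.audioConfig = AudioConfig.fromStreamInput(this.pushStream);
        this.recognizer = new SpeechRecognizer(this.speechConfig, this.audioConfig);
        this.subject = new Observable(subs => {
            this.subscription = subs;
            // Attach the handlers before starting so no early events are missed.
            // Interim hypotheses while the user is still speaking:
            this.recognizer.recognizing = (_sender, { result }) => subs.next(result);
            // Final result for each recognized phrase:
            this.recognizer.recognized = (_sender, { result }) => subs.next(result);
            this.recognizer.startContinuousRecognitionAsync();
            // Teardown: stop recognition when the subscriber unsubscribes.
            return () => this.recognizer.stopContinuousRecognitionAsync();
        });
        return this.subject;
    }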

And here is where I use the mic package to capture the user's voice:

    const mic = require('mic');

    // Speech is the wrapper class shown above.
    const speech = new Speech(language, subscriptionKey, region);
    speech.recognizeAsync().subscribe(result => {
        console.log('result', result);
    });

    const micInstance = mic({
        rate: '16000',
        channels: '1',
        debug: false,
        exitOnSilence: 6,
        fileType: 'wav' // have also tried with raw type
    });
    const micInputStream = micInstance.getAudioStream();

    // Forward every captured audio chunk to the SDK push stream.
    micInputStream.on('data', data => {
        speech.pushStream.write(data);
        // console.log('Received input stream: ', data);
    });

    // The mic package only begins recording once start() is called.
    micInstance.start();
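With `fileType: 'wav'` the captured stream presumably begins with a RIFF/WAVE header, while the push stream was created for raw PCM, so those header bytes would reach the recognizer as if they were audio. A minimal sketch of stripping that header, assuming the whole header arrives in the first `data` chunk; `stripWavHeader` is a hypothetical helper, not part of the mic package or the Speech SDK:

```javascript
// Hypothetical helper (not part of mic or the Speech SDK).
// Assumes the complete RIFF/WAVE header is contained in `buf`.
function stripWavHeader(buf) {
  // A WAV stream starts with "RIFF" <size:u32> "WAVE".
  if (buf.length < 12 ||
      buf.toString('ascii', 0, 4) !== 'RIFF' ||
      buf.toString('ascii', 8, 12) !== 'WAVE') {
    return buf; // no header: treat the chunk as raw PCM already
  }
  let offset = 12;
  // Walk the sub-chunks ("fmt ", "LIST", ...) until the "data" chunk.
  while (offset + 8 <= buf.length) {
    const id = buf.toString('ascii', offset, offset + 4);
    const size = buf.readUInt32LE(offset + 4);
    if (id === 'data') {
      // When writing to a pipe the declared data size is often unknown,
      // so return everything after the chunk header instead of trusting it.
      return buf.subarray(offset + 8);
    }
    // Chunks are padded to an even number of bytes.
    offset += 8 + size + (size % 2);
  }
  // The header continues past this chunk; this sketch does not handle that.
  return Buffer.alloc(0);
}
```

It would only be applied to the first chunk, e.g. by keeping a `first` flag in the `data` handler and writing `first ? stripWavHeader(data) : data`. With `fileType: 'raw'` no stripping should be needed.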


rhurey · 2019-07-25T20:03:39Z

The root cause appears to be the stdio redirection, which results in twice the expected amount of data being available.

I ran sox manually to see how it was producing audio.

Experiment 1: pipe the output through stdout (ran for 10 s):

    sox.exe -c 1 -b 16 -e signed-integer -r 16000 -t waveaudio default -p > redirect.wav

redirect.wav is 655,408 bytes.

Experiment 2: have sox write the file directly (ran for 10 s):

    sox.exe -c 1 -b 16 -e signed-integer -r 16000 -t waveaudio default redirect2.wav

redirect2.wav is 327,724 bytes.

That tells me the doubling happens as a result of the stdio redirect. It's not clear why, but if the doubling is platform-specific, this approach is fragile. There is also no telling what extra data is ending up in the audio.
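The two file sizes can be checked against the expected PCM byte rate (sample rate × bytes per sample × channels × seconds). A small sketch of that arithmetic; `pcmBytes` is an illustrative helper and the two sizes are the ones reported above:

```javascript
// Expected size of uncompressed PCM audio, excluding any file header.
function pcmBytes(sampleRate, bitsPerSample, channels, seconds) {
  return sampleRate * (bitsPerSample / 8) * channels * seconds;
}

// 16 kHz, 16-bit, mono for 10 s, as requested on the sox command line.
const expected16 = pcmBytes(16000, 16, 1, 10); // 320,000 bytes
// The same audio carried as 32-bit samples.
const expected32 = pcmBytes(16000, 32, 1, 10); // 640,000 bytes

// File sizes reported for the two experiments.
const direct = 327724;
const piped = 655408;

console.log((piped / direct).toFixed(2)); // prints "2.00"
console.log((expected32 / expected16).toFixed(2)); // prints "2.00"
```

The directly written file is close to the 320,000 bytes expected for 10 s of 16-bit mono audio (the small surplus presumably being the header plus a recording slightly longer than 10 s), and the piped file is almost exactly twice that, which is what 32-bit samples would produce.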

UCABJDP · 2020-08-26T17:50:27Z

#40 may be the root cause here: piping audio out of sox forces the format to 32-bit audio, which may give the appearance of sox generating double the data when 16-bit is requested.
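If #40 is right and the piped stream carries 32-bit samples, one workaround (other than changing how the audio is captured) would be to narrow each sample back to 16 bits before calling `pushStream.write`. A minimal sketch, assuming headerless little-endian signed 32-bit PCM; `pcm32To16` is a hypothetical helper:

```javascript
// Hypothetical helper. Assumes headerless little-endian signed 32-bit PCM.
// Keeps the most significant 16 bits of each sample.
function pcm32To16(buf) {
  // Ignore a trailing partial sample; a real stream handler would carry it over.
  const samples = Math.floor(buf.length / 4);
  const out = Buffer.alloc(samples * 2);
  for (let i = 0; i < samples; i++) {
    // Arithmetic right shift preserves the sign; the result fits in int16.
    out.writeInt16LE(buf.readInt32LE(i * 4) >> 16, i * 2);
  }
  return out;
}
```

A real `data` handler would also need to keep leftover bytes for the next chunk, since stream chunks are not guaranteed to end on a sample boundary, and would have to skip any header the piped format carries.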

lucasctd mentioned this issue on Jul 9, 2019 in microsoft/cognitive-services-speech-sdk-js#72, "Continuous Recognition from Microphone" (Closed).
