Normalize and chunk audio inputs in wav2vec2 example #400

Improve the wav2vec2 example by:

- Normalizing audio samples to zero mean and unit variance, as described in the wav2vec2 paper and the linked issues. The original fairseq source does the same thing with PyTorch's `layer_norm` function.
- Chunking the audio to improve performance when transcribing long inputs.
- Improving the readability of the output by lower-casing the raw character predictions and replacing "|" with spaces.
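A minimal sketch of the normalization step in plain Python, assuming the input is a flat list of float samples. It normalizes over the whole waveform using the biased variance and `eps=1e-5`. This matches what `torch.nn.functional.layer_norm(x, x.shape)` computes without affine weights. The epsilon the example itself uses is an assumption here.

```python
import math
from typing import List


def normalize(samples: List[float], eps: float = 1e-5) -> List[float]:
    """Shift and scale samples to zero mean and unit variance.

    Uses the biased (population) variance with eps added inside the square
    root, as layer_norm does; eps=1e-5 is an assumed value.
    """
    if not samples:
        raise ValueError("cannot normalize an empty waveform")
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(x - mean) * inv_std for x in samples]
```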
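A minimal sketch of the chunking step. It assumes 16 kHz mono input, the rate wav2vec2 models are trained on, and uses a hypothetical `chunk_seconds` parameter. The chunk length the example actually uses is not stated in this PR. The chunks here do not overlap, so a word that straddles a boundary may be split.

```python
from typing import Iterator, Sequence

SAMPLE_RATE = 16_000  # wav2vec2 models expect 16 kHz audio


def chunk_audio(
    samples: Sequence[float],
    chunk_seconds: float = 10.0,  # hypothetical default, not taken from the PR
    sample_rate: int = SAMPLE_RATE,
) -> Iterator[Sequence[float]]:
    """Yield consecutive, non-overlapping chunks of at most chunk_seconds each."""
    chunk_len = int(chunk_seconds * sample_rate)
    if chunk_len <= 0:
        raise ValueError("chunk_seconds * sample_rate must be positive")
    for start in range(0, len(samples), chunk_len):
        # The final chunk may be shorter than chunk_len.
        yield samples[start:start + chunk_len]
```

Each chunk can then go through the model on its own and the per-chunk transcripts can be joined. This bounds the sequence length of any single forward pass on long recordings.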
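A sketch of the output clean-up. It assumes the decoded predictions arrive as one upper-case string in which "|" marks word boundaries, the convention in wav2vec2 character vocabularies. `format_transcript` is a hypothetical helper name.

```python
def format_transcript(raw_chars: str) -> str:
    """Lower-case raw character predictions and turn "|" delimiters into spaces."""
    # strip() removes leading/trailing delimiters; this is an added assumption.
    return raw_chars.lower().replace("|", " ").strip()
```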



Commits on Nov 7, 2024