Problem with VAD streaming inference #2228

Open
TungyuYoung opened this issue Nov 22, 2024 · 0 comments
Labels
question Further information is requested

Comments

@TungyuYoung

Notice: In order to resolve issues more efficiently, please raise issue following the template.

❓ Questions and Help

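Following the code on Hugging Face, I run into a problem: when the current chunk is fed into the VAD model and a value result comes back, for example a start result, the start timestamp actually lies 3 to 4 chunks before the current one. Is there any way to improve this? I expect that feeding in the current chunk returns the state of that same chunk: either the whole chunk contains no speech, or it contains speech, or I get the start / end timestamps.

I also tried setting is_final to True in model.generate for every chunk during streaming, but the results are worse than with non-real-time (offline) processing. Is there a better solution?

Many thanks!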

import soundfile
from funasr import AutoModel

chunk_size = 200  # chunk length in ms
model = AutoModel(model="fsmn-vad", model_revision="v2.0.4")

wav_file = f"{model.model_path}/example/vad_example.wav"
speech, sample_rate = soundfile.read(wav_file)
chunk_stride = int(chunk_size * sample_rate / 1000)  # samples per chunk

cache = {}  # streaming state carried across generate() calls
# Number of chunks, rounded up so a trailing partial chunk is included
total_chunk_num = int((len(speech) - 1) / chunk_stride + 1)
for i in range(total_chunk_num):
    speech_chunk = speech[i*chunk_stride:(i+1)*chunk_stride]
    is_final = i == total_chunk_num - 1  # only the last chunk flushes the model
    res = model.generate(input=speech_chunk, cache=cache, is_final=is_final, chunk_size=chunk_size)
    if len(res[0]["value"]):
        print(res)
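
To make the lag concrete, here is a minimal sketch of how the reported timestamps could be mapped back onto chunk indices. It is not part of FunASR: ChunkStateTracker, update, lag_in_chunks and has_speech are hypothetical names introduced only for illustration. It also assumes that the streaming result in res[0]["value"] is a list of [beg, end] pairs in milliseconds, where -1 marks a boundary that is not known yet ([[beg, -1]] for a start, [[-1, end]] for an end). The sketch does not remove the model's delay; it only labels earlier chunks retroactively once an event arrives.

```python
from typing import List, Optional, Tuple


class ChunkStateTracker:
    """Hypothetical helper: maps streaming VAD events onto chunk indices."""

    def __init__(self, chunk_size_ms: int) -> None:
        self.chunk_size_ms = chunk_size_ms
        self.open_start_ms: Optional[int] = None   # segment started, end unknown
        self.segments: List[Tuple[int, int]] = []  # closed (start_ms, end_ms)

    def update(self, value: List[List[int]]) -> None:
        """Consume one res[0]["value"] list (assumed [beg, end] pairs, -1 = unknown)."""
        for beg, end in value:
            if beg != -1:
                self.open_start_ms = beg
            if end != -1 and self.open_start_ms is not None:
                self.segments.append((self.open_start_ms, end))
                self.open_start_ms = None

    def lag_in_chunks(self, event_ms: int, current_chunk: int) -> int:
        """How many chunks ago the reported timestamp occurred."""
        return current_chunk - event_ms // self.chunk_size_ms

    def has_speech(self, chunk_idx: int) -> bool:
        """Whether a chunk that was already fed overlaps any known speech."""
        chunk_start = chunk_idx * self.chunk_size_ms
        chunk_end = chunk_start + self.chunk_size_ms
        if self.open_start_ms is not None and self.open_start_ms < chunk_end:
            return True
        return any(s < chunk_end and e > chunk_start for s, e in self.segments)


tracker = ChunkStateTracker(chunk_size_ms=200)
# Made-up event for illustration: while feeding chunk 5, a start at 370 ms arrives.
tracker.update([[370, -1]])
lag = tracker.lag_in_chunks(370, current_chunk=5)
print(f"start event lags by {lag} chunks; chunk 1 has speech: {tracker.has_speech(1)}")
```

In the loop above, tracker.update(res[0]["value"]) would be called after each model.generate call, and has_speech(i) queried only for chunks that have already been fed. The per-chunk labels still become available only after the delay described in the question.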

@TungyuYoung TungyuYoung added the question Further information is requested label Nov 22, 2024