Problems using vad_model, punc_model and spk_model with streaming speech. How can these 3 models be loaded correctly for streaming processing? #2231

Open
1113200320 opened this issue Nov 25, 2024 · 0 comments
Labels: question (Further information is requested)

Comments

@1113200320

Following readme.md, I tried loading multiple models for streaming processing and ran into problems.

Keywords: fsmn-vad, ct-punc, cam++, is_final

What have you tried?

step1: Copied the code from the Speech Recognition (Streaming) section of readme.md, where model = AutoModel(model="paraformer-zh-streaming", ...); it runs normally.
(The full code is identical to readme.md and is pasted in the last section for easy reading.)
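
(For reference, in step 1 the generate call also passed the streaming chunk parameters as in the readme example; if I am reading the readme correctly, it looks roughly like this:)

model = AutoModel(model="paraformer-zh-streaming")
...
res = model.generate(input=speech_chunk, cache=cache, is_final=is_final,
                     chunk_size=chunk_size,
                     encoder_chunk_look_back=encoder_chunk_look_back,
                     decoder_chunk_look_back=decoder_chunk_look_back)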

step2: Changed the model in the example code to:

model = AutoModel(model="paraformer-zh-streaming",
                  vad_model="fsmn-vad",  
                  punc_model="ct-punc", 
                  spk_model="cam++",
                  )
...
model.generate(input=speech_chunk, cache=cache, is_final=is_final) # keep only these 3 arguments

Result:

Every chunk returns an empty result (e.g. [{'key': 'rand_key_2yW4Acq9GFz6Y', 'text': '', 'timestamp': []}])

step3: Changed the model in the example code to:

model = AutoModel(model="paraformer-zh",
                  vad_model="fsmn-vad",  
                  punc_model="ct-punc", 
                  spk_model="cam++",
                  )
...
model.generate(input=speech_chunk, cache=cache, is_final=is_final) # keep only these 3 arguments

Result:

Compared with step 2, the earlier chunks still return empty results, but the last chunk, where is_final=True, returns the text for that chunk, e.g. [{'key': 'rand_key_2yW4Acq9GFz6Y', 'text': '模型', 'timestamp': ...}]

step4: Building on step 3, changed model.generate to model.generate(input=speech_chunk, cache=cache, is_final=True)

Result:

Every chunk now returns text, but because is_final=True is always set, this cannot satisfy the streaming requirements of concatenating the dialogue and distinguishing speakers.

What's your environment?

  • OS: Windows 11, not using Docker
  • PyTorch Version (e.g., 2.0.0): 2.5.1
  • How you installed funasr (pip, source): pip
  • Python version: 3.12.3

My Question:

Why does recognition fail whenever is_final=False? Could someone provide a code example that includes vad_model, punc_model and spk_model and works for streaming processing? Many thanks!
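
For reference, the kind of structure I am imagining is something like the sketch below. This is only my own hypothetical two-pass idea, not taken from the readme: stream partial text with paraformer-zh-streaming alone, buffer the raw audio, and run the offline vad/punc/spk pipeline once on the buffered audio when the utterance ends.

# Hypothetical two-pass sketch (my own assumption, not from the readme)
from funasr import AutoModel
import numpy as np

streaming_model = AutoModel(model="paraformer-zh-streaming")
offline_model = AutoModel(model="paraformer-zh",
                          vad_model="fsmn-vad",
                          punc_model="ct-punc",
                          spk_model="cam++",
                          )

cache = {}
buffered_chunks = []

def on_chunk(speech_chunk, is_final):
    buffered_chunks.append(speech_chunk)
    # live partial result from the streaming model only
    partial = streaming_model.generate(input=speech_chunk, cache=cache, is_final=is_final)
    print("partial:", partial)
    if is_final:
        # final punctuated / speaker-labelled result from the offline pipeline
        audio = np.concatenate(buffered_chunks)
        final = offline_model.generate(input=audio)
        print("final:", final)

Of course this only produces punctuation and speaker labels at the end of each utterance, so a genuinely streaming solution for all three models would still be better.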


Attached: the full original code

from funasr import AutoModel

chunk_size = [0, 10, 5] #[0, 10, 5] 600ms, [0, 8, 4] 480ms
encoder_chunk_look_back = 4 #number of chunks to lookback for encoder self-attention
decoder_chunk_look_back = 1 #number of encoder chunks to lookback for decoder cross-attention

model = AutoModel(model="paraformer-zh-streaming",
                  vad_model="fsmn-vad",  
                  punc_model="ct-punc", 
                  spk_model="cam++",
                  )

import soundfile
import os

wav_file = os.path.join(model.model_path, "example/asr_example.wav")
speech, sample_rate = soundfile.read(wav_file)
chunk_stride = chunk_size[1] * 960 # 600ms

cache = {}
total_chunk_num = int((len(speech) - 1) / chunk_stride + 1)
for i in range(total_chunk_num):
    speech_chunk = speech[i*chunk_stride:(i+1)*chunk_stride]
    is_final = i == total_chunk_num - 1
    res = model.generate(input=speech_chunk, cache=cache, is_final=is_final)
    print(res)
1113200320 added the question label on Nov 25, 2024