Tried to load multiple models for streaming processing following readme.md; running into problems

Keywords: fsmn-vad, ct-punc, cam++, is_final

What have you tried?
step1: Copied the code from the "Speech Recognition (Streaming)" section of readme.md, which uses

```python
model = AutoModel(model="paraformer-zh-streaming", ...)
```

It runs fine (the complete code, identical to readme.md, is pasted in the last section for easy reading).
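Before the failing steps, a quick sanity check of the readme's chunk arithmetic, since the stride constant `960` matters later. This is my own reading of the example (each unit of `chunk_size` being a 60 ms encoder frame at 16 kHz is an assumption inferred from the readme, not an official spec):

```python
# Verify that chunk_size = [0, 10, 5] with stride chunk_size[1] * 960
# corresponds to 600 ms chunks at a 16 kHz sample rate.
SAMPLE_RATE = 16000   # Hz, the model's expected sample rate
FRAME_MS = 60         # assumed duration of one encoder frame in the example
chunk_frames = 10     # chunk_size[1] from the readme example

chunk_stride = chunk_frames * SAMPLE_RATE * FRAME_MS // 1000  # samples per chunk
chunk_ms = chunk_frames * FRAME_MS                            # chunk duration in ms

print(chunk_stride, chunk_ms)  # 9600 600
```

This matches `chunk_stride = chunk_size[1] * 960` in the readme code (960 samples = 60 ms at 16 kHz).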
step2: Changed the model setup in the example code to:

```python
model = AutoModel(model="paraformer-zh-streaming", vad_model="fsmn-vad", punc_model="ct-punc", spk_model="cam++")
...
model.generate(input=speech_chunk, cache=cache, is_final=is_final)  # keep only these 3 parameters
```

Result: every call returns empty, e.g. [{'key': 'rand_key_2yW4Acq9GFz6Y', 'text': '', 'timestamp': []}].
step3: Changed the model setup in the example code to:

```python
model = AutoModel(model="paraformer-zh", vad_model="fsmn-vad", punc_model="ct-punc", spk_model="cam++")
...
model.generate(input=speech_chunk, cache=cache, is_final=is_final)  # keep only these 3 parameters
```

Result: compared with step2, the earlier chunks still return empty, but the last chunk (where is_final=True) returns the text for that chunk, e.g. [{'key': 'rand_key_2yW4Acq9GFz6Y', 'text': '模型', 'timestamp': ...}].

step4: Building on step3, changed the model.generate call to:

```python
model.generate(input=speech_chunk, cache=cache, is_final=True)
```

Result: every chunk is recognized, but because is_final is always True, this cannot meet the streaming requirements of stitching the dialogue together and distinguishing speakers.
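A side note on a direction I am considering: if the fix turns out to be driving fsmn-vad in streaming mode myself, then, as I understand the FunASR readme's streaming VAD description, its output uses -1 for a segment boundary that has not been observed yet ([beg, -1] for a start, [-1, end] for the matching end). A small helper of my own (a sketch based on that assumed convention, not verified against the library) to close such segments:

```python
def close_segments(stream_results):
    """Merge streaming VAD events into closed [beg, end] segments (ms).

    Assumes the streaming fsmn-vad convention described in the FunASR
    readme: [beg, -1] marks a segment start, [-1, end] marks the
    matching end, and [beg, end] is an already-closed segment.
    """
    closed, open_beg = [], None
    for result in stream_results:      # one result per processed chunk
        for beg, end in result:
            if beg != -1 and end != -1:
                closed.append([beg, end])        # already complete
            elif beg != -1:
                open_beg = beg                   # segment opened
            elif open_beg is not None:
                closed.append([open_beg, end])   # segment closed
                open_beg = None
    return closed

# Example: a start event, its end event, then a complete segment.
print(close_segments([[[0, -1]], [[-1, 520]], [[600, 900]]]))
# [[0, 520], [600, 900]]
```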
What's your environment?

Installed via pip.

My Question:

Why does recognition fail when is_final=False? Could anyone share a code example that includes vad_model, punc_model and spk_model and supports streaming? Many thanks!
Appendix: the original complete code

```python
from funasr import AutoModel

chunk_size = [0, 10, 5]  # [0, 10, 5] = 600 ms, [0, 8, 4] = 480 ms
encoder_chunk_look_back = 4  # number of chunks to look back for encoder self-attention
decoder_chunk_look_back = 1  # number of encoder chunks to look back for decoder cross-attention

model = AutoModel(
    model="paraformer-zh-streaming",
    vad_model="fsmn-vad",
    punc_model="ct-punc",
    spk_model="cam++",
)

import soundfile
import os

wav_file = os.path.join(model.model_path, "example/asr_example.wav")
speech, sample_rate = soundfile.read(wav_file)
chunk_stride = chunk_size[1] * 960  # 600 ms

cache = {}
total_chunk_num = int((len(speech) - 1) / chunk_stride + 1)
for i in range(total_chunk_num):
    speech_chunk = speech[i * chunk_stride:(i + 1) * chunk_stride]
    is_final = i == total_chunk_num - 1
    res = model.generate(input=speech_chunk, cache=cache, is_final=is_final)
    print(res)
```
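For reference, the workaround I am currently considering, sketched below. It is only a sketch under my own assumptions, not a verified recipe: run the streaming model alone for low-latency partial results, then run the offline paraformer-zh pipeline with vad/punc/spk over the full audio once the utterance ends. The model names and generate() parameters are the ones from the readme example above; everything else is hypothetical.

```python
def stitch_partials(partials):
    """Concatenate non-empty partial texts from streaming results."""
    return "".join(p for p in partials if p)

def streaming_then_offline(wav_path):
    # Imports kept inside the function so the sketch can be read
    # without funasr installed; model names are from the issue above.
    from funasr import AutoModel
    import soundfile

    stream_model = AutoModel(model="paraformer-zh-streaming")  # ASR only, no extras
    offline_model = AutoModel(
        model="paraformer-zh",   # offline model carries the extra stages
        vad_model="fsmn-vad",
        punc_model="ct-punc",
        spk_model="cam++",
    )

    speech, _ = soundfile.read(wav_path)
    chunk_stride = 10 * 960  # 600 ms chunks, as in the readme example
    cache, partials = {}, []
    total = int((len(speech) - 1) / chunk_stride + 1)
    for i in range(total):
        chunk = speech[i * chunk_stride:(i + 1) * chunk_stride]
        res = stream_model.generate(
            input=chunk, cache=cache, is_final=(i == total - 1)
        )
        partials.append(res[0]["text"])  # low-latency partial text

    # Final pass: punctuation + speaker labels over the whole utterance.
    final = offline_model.generate(input=speech)
    return stitch_partials(partials), final

print(stitch_partials(["模型", "", "训练"]))  # 模型训练
```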