Is there a C# implementation? #3

Open
dfengpo opened this issue Dec 27, 2024 · 1 comment

Comments

dfengpo commented Dec 27, 2024

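The iic/speech_zipenhancer_ans_multiloss_16k_base model does a good job at noise reduction.
Is there a C# implementation, or could you share some pointers on how to implement one?
I called model.onnx (downloaded from https://www.modelscope.cn/models/iic/speech_zipenhancer_ans_multiloss_16k_base/file/view/master?fileName=README.md&status=1) through ONNX Runtime in C#, and it fails because it tries to allocate an enormous amount of memory. How did you solve this in Python?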
[E:onnxruntime:, sequential_executor.cc:514 ExecuteKernel] Non-zero status code returned while running MatMul node. Name:'/model/TSConformer/0/0/self_attn_weights_1/MatMul' Status Message: /onnxruntime_src/onnxruntime/core/framework/bfc_arena.cc:376 void* onnxruntime::BFCArena::AllocateRawInternal(size_t, bool, onnxruntime::Stream*, bool, onnxruntime::WaitNotificationFn) Failed to allocate memory for requested buffer of size 515293323776,

DakeQQ (Owner) commented Dec 27, 2024

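We use a static input shape, for example an int16 array of 5120 samples, which corresponds to 320 ms of audio at 16 kHz. The process slices the raw audio into 320 ms segments and passes each one to the chip. Parallel computation is optional.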
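As a quick check of these numbers, the sketch below converts a window length in samples to milliseconds. The 16 kHz sample rate is an assumption taken from the `16k` in the model name, and `window_duration_ms` is an illustrative helper, not part of the repository.

```python
SAMPLE_RATE = 16_000        # assumed from the "16k" in the model name
INPUT_AUDIO_LENGTH = 5120   # static window length in samples, as stated in the reply


def window_duration_ms(num_samples: int, sample_rate: int) -> float:
    """Return the duration of num_samples at sample_rate, in milliseconds."""
    return 1000.0 * num_samples / sample_rate


print(window_duration_ms(INPUT_AUDIO_LENGTH, SAMPLE_RATE))  # 320.0
```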

Audio slicing code

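import numpy as np

# audio has shape (1, 1, audio_len). stride_step is assumed to be initialized earlier
# in the demo script; it is only overridden here when the model's output length differs.
if audio_len > INPUT_AUDIO_LENGTH:
    if isinstance(shape_value_in, int) and isinstance(shape_value_out, int) and shape_value_in != shape_value_out:
        stride_step = shape_value_out
    # Pad so that a whole number of windows covers the signal.
    num_windows = int(np.ceil((audio_len - INPUT_AUDIO_LENGTH) / stride_step)) + 1
    total_length_needed = (num_windows - 1) * stride_step + INPUT_AUDIO_LENGTH
    pad_amount = total_length_needed - audio_len
    if pad_amount > 0:
        # White noise scaled to the RMS of the last pad_amount samples.
        # Casting to float32 avoids overflow when squaring int16 samples.
        final_slice = audio[:, :, -pad_amount:].astype(np.float32)
        white_noise = (np.sqrt(np.mean(final_slice * final_slice)) * np.random.normal(loc=0.0, scale=1.0, size=(1, 1, pad_amount))).astype(audio.dtype)
        audio = np.concatenate((audio, white_noise), axis=-1)
elif audio_len < INPUT_AUDIO_LENGTH:
    # Short clip: pad up to one full window with white noise scaled to the clip's RMS.
    audio_f32 = audio.astype(np.float32)
    white_noise = (np.sqrt(np.mean(audio_f32 * audio_f32)) * np.random.normal(loc=0.0, scale=1.0, size=(1, 1, INPUT_AUDIO_LENGTH - audio_len))).astype(audio.dtype)
    audio = np.concatenate((audio, white_noise), axis=-1)

aligned_len = audio.shape[-1]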
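To make the padding arithmetic easier to follow (and to port), here is a minimal sketch that isolates the window-count and pad-length calculation from the snippet above. The function name `padding_plan` and the example lengths are illustrative assumptions, not code from the repository.

```python
import math


def padding_plan(audio_len: int, window: int, stride: int) -> tuple[int, int]:
    """Return (num_windows, pad_amount) so that windows of `window` samples,
    spaced `stride` samples apart, exactly cover the padded signal."""
    if audio_len <= window:
        # A short clip is padded up to a single full window.
        return 1, window - audio_len
    num_windows = math.ceil((audio_len - window) / stride) + 1
    total_length_needed = (num_windows - 1) * stride + window
    return num_windows, total_length_needed - audio_len


# One second of 16 kHz audio with a 5120-sample window and equal stride.
print(padding_plan(16000, 5120, 5120))  # (4, 4480)
```

With these inputs, four windows cover 20480 samples, so 4480 samples of noise are appended to the original 16000.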

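Explanation

  • If the audio is longer than the static input shape, the code pads it to a whole number of windows and then slices it.
  • If the audio is shorter than the static input shape, the code pads it with white noise.
  • This is how the demo in our repository works.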
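The slicing step described above can be sketched as follows, assuming non-overlapping windows (stride equal to the window length) and audio that has already been padded. `enhance_window` is a hypothetical stand-in for the model call and simply returns its input; `process_in_windows` and `original_len` are likewise illustrative names, not the repository's API.

```python
import numpy as np

INPUT_AUDIO_LENGTH = 5120  # static window length in samples, as stated in the reply


def enhance_window(window: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the model call; returns the input unchanged."""
    return window


def process_in_windows(audio: np.ndarray, original_len: int,
                       window: int = INPUT_AUDIO_LENGTH) -> np.ndarray:
    """Run enhance_window over consecutive, non-overlapping windows of a padded
    (1, 1, N) array, then trim the result back to original_len samples."""
    aligned_len = audio.shape[-1]
    if aligned_len % window != 0:
        raise ValueError("audio must be padded to a multiple of the window length")
    outputs = []
    for start in range(0, aligned_len, window):
        # Each slice has the fixed shape (1, 1, window) that a static model expects.
        outputs.append(enhance_window(audio[:, :, start:start + window]))
    return np.concatenate(outputs, axis=-1)[:, :, :original_len]


padded = np.zeros((1, 1, 3 * INPUT_AUDIO_LENGTH), dtype=np.int16)
print(process_in_windows(padded, original_len=15000).shape)  # (1, 1, 15000)
```

Because every call sees the same fixed shape, the loop carries over to other languages: a C# port would presumably build one fixed-size tensor per window and reuse the same session, though that part is not shown here.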

Note

We are not very familiar with C#, so to convert the Python code to C# you may want to use GPT or another code-conversion tool.
