From 0846a0dcbce37c729236f5a989d33b147cb76365 Mon Sep 17 00:00:00 2001
From: devilran6 <3470826156@qq.com>
Date: Mon, 26 Aug 2024 15:15:32 +0800
Subject: [PATCH] update

---
 index.html | 38 +++++++++++++++++---------------------
 1 file changed, 17 insertions(+), 21 deletions(-)

diff --git a/index.html b/index.html
index a9f7817..3f3fdf3 100644
--- a/index.html
+++ b/index.html
@@ -131,11 +131,7 @@

Abstract

VC-S2E (Ours): We propose a new model that leverages audio-visual modalities to improve speech quality and intelligibility.
-  • VC-S2E (w/o Ea, Ev): A version of the model with only
-    audio input.
-        Here, Ea and Ev represent Scenario-Aware Audio
-        Embedding and Visual Embedding, respectively. w/o means without.
-  •
+  • Conformer: (TO ADD)
  • Noisy video: The video source of noise, used as our second modality input source.
  • Gradcam: Displays the middle frame of the video with the corresponding Grad-CAM heatmap, highlighting key noise areas identified by the video encoder.
  •
@@ -166,6 +162,21 @@

    Demos

+    Noisy video
+    Gradcam Image
+    Gradcam
-    VC-S\(^{2}\)E(w/o Ea, Ev)
-    Noisy video
-    Gradcam Image
-    Gradcam
+    Conformer