From f51d85a607572a4677c8cdd73561c8da9b7a1160 Mon Sep 17 00:00:00 2001
From: devilran6 <3470826156@qq.com>
Date: Mon, 26 Aug 2024 00:32:34 +0800
Subject: [PATCH] fix math formula rendering

---
 index.html | 35 ++++++++++++++++++-----------------
 1 file changed, 18 insertions(+), 17 deletions(-)

diff --git a/index.html b/index.html
index 81117e5..1a57c08 100644
--- a/index.html
+++ b/index.html
@@ -55,23 +55,24 @@ <h2 class="title is-2 has-text-centered">Abstract</h2>
             </div>
             <div class="markdown has-text-centered">
                 <div align="left" style="font-size:20px; text-align:justify">
-                    Speech enhancement plays an essential role in various applications, and the integration of visual information
-                    has been demonstrated to bring substantial advantages.
-                    However, existing works mainly focus on the analysis of facial and lip movements, whereas contextual visual
-                    cues from the surrounding environment have been overlooked:
-                    for example, when we see a dog bark, our brain has the innate ability to discern and filter out the barking
-                    noise.
-                    To this end, in this paper, we introduce a novel task, i.e. Scene-aware Audio-Visual Speech Enhancement.
-                    To our best knowledge, this is the first proposal to use rich contextual information from synchronized video
-                    as auxiliary cues to indicate the type of noise,
-                    which eventually improves the speech enhancement performance.
-                    Specifically, we propose the VC-S$^2$E method, which incorporates the Conformer and Mamba modules for their
-                    complementary strengths.
-                    Extensive experiments are conducted on public MUSIC, AVSpeech and AudioSet datasets, where the results
-                    demonstrate the superiority of VC-S$^2$E over other competitive methods.
-                    We will make the source code publicly available.
-                    Project demo page: https://AVSEPage.github.io/
-                    <br>
+                    <p>Speech enhancement plays an essential role in various applications, and the integration of
+                        visual information has been demonstrated to bring substantial advantages.
+                        However, existing works mainly focus on the analysis of facial and lip movements, whereas
+                        contextual visual cues from the surrounding environment have been overlooked:
+                        for example, when we see a dog bark, our brain has the innate ability to discern and
+                        filter out the barking noise.
+                        To this end, in this paper, we introduce a novel task, i.e., Scene-aware Audio-Visual
+                        Speech Enhancement.
+                        To the best of our knowledge, this is the first proposal to use rich contextual
+                        information from synchronized video as auxiliary cues to indicate the type of noise,
+                        which eventually improves the speech enhancement performance.
+                        Specifically, we propose the \(\text{VC-S}^{2}\text{E}\) method, which incorporates the
+                        Conformer and Mamba modules for their complementary strengths.
+                        Extensive experiments are conducted on the public MUSIC, AVSpeech and AudioSet datasets,
+                        where the results demonstrate the superiority of \(\text{VC-S}^{2}\text{E}\) over other
+                        competitive methods.
+                        We will make the source code publicly available.
+                        Project demo page: https://AVSEPage.github.io/</p>
                 </div>
             </div>
         </div>