update

AVSEPage · Aug 28, 2024 · b49636a
1 parent 7e0c192
Showing 1 changed file with 14 additions and 11 deletions.
diff --git a/index.html b/index.html
@@ -206,18 +206,17 @@ <h2 class="title is-2 has-text-centered">Abstract</h2>
   <!-- Modified here: added the left-align class -->
   <div class="markdown left-align">
     <ul>
-      <li><b>Noisy</b>: Displays the spectrogram of audio generated by mixing clean speech and noise audio.</li>
-      <li><b>Clean</b>: Clean speech, serving as the source for mixing noisy audio and as the ground truth for
-        comparison after training.</li>
+      <li><b>Noisy Video</b>: The video source that contains noise and serves as the second modality input.</li>
+      <li><b>Grad-CAM Image</b>: Displays the middle frame of the video overlaid with the Grad-CAM heatmap, highlighting
+        key noise areas identified by the video encoder.</li>
+      <li><b>Noisy Speech</b>: Shows the spectrogram of audio generated by mixing clean speech with noise.</li>
+      <li><b>Clean Speech</b>: Shows the spectrogram of the clean speech, which is mixed with noise to produce the noisy speech and serves as the ground truth for evaluation.</li>
+      <li><b>Conformer</b>: A hybrid model that combines convolutional neural networks and transformers, designed for
+        speech recognition and related tasks.</li>
       <li><b>
-          <font color=#0000FF>VC-S<sup>2</sup>E(Our)</font>
-        </b>: We propose a new model that leverages audio-visual modalities to improve speech quality and
+          <font color=#0000FF>VC-S<sup>2</sup>E (Ours)</font>
+        </b>: We propose a novel model that leverages audio-visual modalities to enhance speech quality and
         intelligibility.</li>
-      <li><b>Conformer</b>: A hybrid model that combines convolutional neural networks and transformers, specifically
-        designed for speech recognition and related tasks.</li>
-      <li><b>Noisy video</b>: The video source of noise, used as our second modality input source.</li>
-      <li><b>Gradcam</b>: Displays the middle frame of the video with the corresponding Grad-CAM heatmap, highlighting
-        key noise areas identified by the video encoder.</li>
     </ul>
   </div>
 
@@ -235,7 +234,7 @@ <h2 class="title is-2 has-text-centered">Demos</h2>
             </video>
           </div>
           <div class="media-item">
-            <div class="title-item">Gradcam Image</div>
+            <div class="title-item">Grad-CAM Image</div>
             <img src="gradcam/0.jpg" alt="0">
           </div>
           <div class="media-item">
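The Grad-CAM Image column overlays a class-activation heatmap on the middle video frame. The following is a minimal sketch of standard Grad-CAM. It assumes that the feature maps of some video-encoder layer and the gradients of a target score with respect to them are already available; the page does not say which layer or which score is used. Each channel weight is the spatial mean of that channel's gradient, and the map is the ReLU of the weighted sum of the feature maps. The names `grad_cam` and `overlay` are hypothetical. The sketch also assumes the heatmap has already been resized to the frame, and it blends in grayscale rather than through a color map.

```python
from typing import List

Matrix = List[List[float]]


def grad_cam(activations: List[Matrix], gradients: List[Matrix]) -> Matrix:
    """Grad-CAM map from K feature maps A^k and the gradients dy/dA^k of a target score y.

    Hypothetical helper: which encoder layer and which score y are used is an assumption.
    """
    num_maps = len(activations)
    height, width = len(activations[0]), len(activations[0][0])
    # Channel weight alpha_k: global average of the gradients over all spatial positions.
    alphas = [sum(sum(row) for row in gradients[k]) / (height * width) for k in range(num_maps)]
    # Weighted sum of the feature maps, followed by ReLU to keep positive evidence only.
    cam = [[max(0.0, sum(alphas[k] * activations[k][i][j] for k in range(num_maps)))
            for j in range(width)] for i in range(height)]
    # Normalize to [0, 1] so the map can be rendered as a heatmap (skip if all zeros).
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[value / peak for value in row] for row in cam]
    return cam


def overlay(gray_frame: Matrix, cam: Matrix, alpha: float = 0.5) -> Matrix:
    """Alpha-blend a [0, 1] heatmap onto a [0, 1] grayscale frame.

    Assumes cam was already upsampled to the frame size; a real page would apply a color map.
    """
    return [[(1 - alpha) * f + alpha * h for f, h in zip(frame_row, cam_row)]
            for frame_row, cam_row in zip(gray_frame, cam)]
```

For example, with two 2×2 feature maps whose gradients average to 0.5 and -0.25, `grad_cam` returns `[[0.5, 0.0], [0.0, 1.0]]`. Negative weighted sums are clipped to zero before normalization.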
@@ -421,6 +420,9 @@ <h2 class="title is-2 has-text-centered">Demos</h2>
           </div>
         </div>
 
+        <div class="text-below-image">
+          <p>These rows display the spectrograms of the demo audio samples; the buttons below switch between audio types such as noisy and clean speech.</p>
+        </div>
         <!-- Image Switcher Section -->
         <div class="image-row" id="imageRow1">
           <img src="noisy_speech/0_spectrum.png" alt="0">
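The Noisy Speech spectrograms shown in these rows involve two steps: mixing clean speech with noise, then computing a time-frequency magnitude. The sketch below assumes an additive mix scaled to a chosen signal-to-noise ratio, followed by a Hann-windowed short-time Fourier transform. The page does not state the SNR, sample rate, frame length, or hop used to render `noisy_speech/*_spectrum.png`. The names `mix_at_snr` and `magnitude_spectrogram` are hypothetical. A naive DFT keeps the example dependency-free; a real pipeline would use an FFT such as `numpy.fft.rfft`.

```python
import cmath
import math
from typing import List


def mix_at_snr(clean: List[float], noise: List[float], snr_db: float) -> List[float]:
    """Add `noise` to `clean`, scaled so the clean-to-noise power ratio is `snr_db` decibels.

    Hypothetical helper. Inputs are truncated to the shorter length; `noise` must not be all zeros.
    """
    n = min(len(clean), len(noise))
    p_clean = sum(x * x for x in clean[:n]) / n
    p_noise = sum(x * x for x in noise[:n]) / n
    # Solve 10 * log10(p_clean / (gain**2 * p_noise)) = snr_db for the noise gain.
    gain = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [c + gain * v for c, v in zip(clean[:n], noise[:n])]


def magnitude_spectrogram(signal: List[float], frame_len: int, hop: int) -> List[List[float]]:
    """Magnitude STFT as a list of frames, each holding frame_len // 2 + 1 frequency bins.

    Hypothetical helper: periodic Hann window, naive O(frame_len**2) DFT per frame for clarity.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len) for i in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        segment = [signal[start + i] * window[i] for i in range(frame_len)]
        bins = []
        for k in range(frame_len // 2 + 1):  # non-negative frequencies of a real signal
            coeff = sum(segment[i] * cmath.exp(-2j * math.pi * k * i / frame_len)
                        for i in range(frame_len))
            bins.append(abs(coeff))
        frames.append(bins)
    return frames
```

Before plotting, spectrogram magnitudes are usually converted to decibels, for example as 20·log10 of the magnitude. The mapping to colors used for the images on this page is not specified.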
@@ -434,6 +436,7 @@ <h2 class="title is-2 has-text-centered">Demos</h2>
           <img src="noisy_speech/5_spectrum.png" alt="5">
         </div>
 
+
         <div class="buttons">
           <button onclick="switchImages('noisy_speech', 'Noisy speech')">Noisy speech</button>
           <button onclick="switchImages('clean_speech', 'Clean speech')">Clean speech</button>