add conformer
devilran6 committed Aug 28, 2024
1 parent 13c8444 commit e8f4d1c
Showing 1 changed file with 146 additions and 76 deletions.
222 changes: 146 additions & 76 deletions index.html
@@ -107,6 +107,13 @@
margin-bottom: 5px;
/* Adjust spacing as needed */
}

.left-align ul,
.left-align ol {
text-align: left;
margin-left: 40px;
/* Adjust left margin as needed */
}
</style>
</head>

Expand Down Expand Up @@ -194,20 +201,23 @@ <h2 class="title is-2 has-text-centered">Abstract</h2>
<hr>
</div>

<ul>
<li><b>Noisy</b>: Displays the spectrogram of audio generated by mixing clean speech and noise audio.</li>
<li><b>Clean</b>: Clean speech, serving as the source for mixing noisy audio and as the ground truth for comparison
after training.</li>
<li>
<b>
<font color=#0000FF>VC-S<sup>2</sup>E(Our)</font>
</b>: We propose a new model that leverages audio-visual modalities to improve speech quality and intelligibility.
</li>
<li><b>Conformer</b>: (TO ADD)</li>
<li><b>Noisy video</b>: The video source of noise, used as our second modality input source.</li>
<li><b>Gradcam</b>: Displays the middle frame of the video with the corresponding Grad-CAM heatmap, highlighting key
noise areas identified by the video encoder.</li>
</ul>
<!-- Changed here: added the left-align class -->
<div class="markdown left-align">
<ul>
<li><b>Noisy</b>: Displays the spectrogram of audio generated by mixing clean speech and noise audio.</li>
<li><b>Clean</b>: Clean speech, serving as the source for mixing noisy audio and as the ground truth for
comparison after training.</li>
<li><b>
<font color=#0000FF>VC-S<sup>2</sup>E (Ours)</font>
</b>: We propose a new model that leverages audio-visual modalities to improve speech quality and
intelligibility.</li>
<li><b>Conformer</b>: A hybrid model that combines convolutional neural networks and transformers, specifically
designed for speech recognition and related tasks.</li>
<li><b>Noisy video</b>: The video of the noise source, used as the input for our second modality.</li>
<li><b>Gradcam</b>: Displays the middle frame of the video with the corresponding Grad-CAM heatmap, highlighting
key noise areas identified by the video encoder.</li>
</ul>
</div>

<div class="section" id="demos">
<div class="container">
Expand Down Expand Up @@ -253,98 +263,158 @@ <h2 class="title is-2 has-text-centered">Demos</h2>
</div>

<div class="media-row">
<div class="media-item"><video controls>
<div class="media-item">
<video controls>
<source src="new_noise_video/1.mp4" type="video/mp4">
</video></div>
<div class="media-item"><img src="gradcam/1.jpg" alt="1"></div>
<div class="media-item"><video controls>
</video>
</div>
<div class="media-item">
<img src="gradcam/1.jpg" alt="1">
</div>
<div class="media-item">
<video controls>
<source src="noisy_speech/1.mp4" type="video/mp4">
</video></div>
<div class="media-item"><video controls>
</video>
</div>
<div class="media-item">
<video controls>
<source src="clean_speech/1.mp4" type="video/mp4">
</video></div>
<div class="media-item"><video controls>
</video>
</div>
<div class="media-item">
<video controls>
<source src="enhanced_audio_only/1.mp4" type="video/mp4">
</video></div>
<div class="media-item"><video controls>
</video>
</div>
<div class="media-item">
<video controls>
<source src="enhanced_audio_video/1.mp4" type="video/mp4">
</video></div>
</video>
</div>
</div>

<div class="media-row">
<div class="media-item"><video controls>
<div class="media-item">
<video controls>
<source src="new_noise_video/2.mp4" type="video/mp4">
</video></div>
<div class="media-item"><img src="gradcam/2.jpg" alt="2"></div>
<div class="media-item"><video controls>
</video>
</div>
<div class="media-item">
<img src="gradcam/2.jpg" alt="2">
</div>
<div class="media-item">
<video controls>
<source src="noisy_speech/2.mp4" type="video/mp4">
</video></div>
<div class="media-item"><video controls>
</video>
</div>
<div class="media-item">
<video controls>
<source src="clean_speech/2.mp4" type="video/mp4">
</video></div>
<div class="media-item"><video controls>
</video>
</div>
<div class="media-item">
<video controls>
<source src="enhanced_audio_only/2.mp4" type="video/mp4">
</video></div>
<div class="media-item"><video controls>
</video>
</div>
<div class="media-item">
<video controls>
<source src="enhanced_audio_video/2.mp4" type="video/mp4">
</video></div>
</video>
</div>
</div>

<div class="media-row">
<div class="media-item"><video controls>
<div class="media-item">
<video controls>
<source src="new_noise_video/3.mp4" type="video/mp4">
</video></div>
<div class="media-item"><img src="gradcam/3.jpg" alt="3"></div>
<div class="media-item"><video controls>
</video>
</div>
<div class="media-item">
<img src="gradcam/3.jpg" alt="3">
</div>
<div class="media-item">
<video controls>
<source src="noisy_speech/3.mp4" type="video/mp4">
</video></div>
<div class="media-item"><video controls>
</video>
</div>
<div class="media-item">
<video controls>
<source src="clean_speech/3.mp4" type="video/mp4">
</video></div>
<div class="media-item"><video controls>
</video>
</div>
<div class="media-item">
<video controls>
<source src="enhanced_audio_only/3.mp4" type="video/mp4">
</video></div>
<div class="media-item"><video controls>
</video>
</div>
<div class="media-item">
<video controls>
<source src="enhanced_audio_video/3.mp4" type="video/mp4">
</video></div>
</video>
</div>
</div>

<div class="media-row">
<div class="media-item"><video controls>
<div class="media-item">
<video controls>
<source src="new_noise_video/4.mp4" type="video/mp4">
</video></div>
<div class="media-item"><img src="gradcam/4.jpg" alt="4"></div>
<div class="media-item"><video controls>
</video>
</div>
<div class="media-item">
<img src="gradcam/4.jpg" alt="4">
</div>
<div class="media-item">
<video controls>
<source src="noisy_speech/4.mp4" type="video/mp4">
</video></div>
<div class="media-item"><video controls>
</video>
</div>
<div class="media-item">
<video controls>
<source src="clean_speech/4.mp4" type="video/mp4">
</video></div>
<div class="media-item"><video controls>
</video>
</div>
<div class="media-item">
<video controls>
<source src="enhanced_audio_only/4.mp4" type="video/mp4">
</video></div>
<div class="media-item"><video controls>
</video>
</div>
<div class="media-item">
<video controls>
<source src="enhanced_audio_video/4.mp4" type="video/mp4">
</video></div>
</video>
</div>
</div>

<div class="media-row">
<div class="media-item"><video controls>
<div class="media-item">
<video controls>
<source src="new_noise_video/5.mp4" type="video/mp4">
</video></div>
<div class="media-item"><img src="gradcam/5.jpg" alt="5"></div>
<div class="media-item"><video controls>
</video>
</div>
<div class="media-item">
<img src="gradcam/5.jpg" alt="5">
</div>
<div class="media-item">
<video controls>
<source src="noisy_speech/5.mp4" type="video/mp4">
</video></div>
<div class="media-item"><video controls>
</video>
</div>
<div class="media-item">
<video controls>
<source src="clean_speech/5.mp4" type="video/mp4">
</video></div>
<div class="media-item"><video controls>
</video>
</div>
<div class="media-item">
<video controls>
<source src="enhanced_audio_only/5.mp4" type="video/mp4">
</video></div>
<div class="media-item"><video controls>
</video>
</div>
<div class="media-item">
<video controls>
<source src="enhanced_audio_video/5.mp4" type="video/mp4">
</video></div>
</video>
</div>
</div>

<!-- Image Switcher Section -->
Expand Down Expand Up @@ -380,13 +450,13 @@ <h2 class="title is-2 has-text-centered">Demos</h2>
<div class="container">
<hr>
</div>
<h2 class="title is-2 has-text-centered">References</h2>
<p><br>
<ol>
<li>conformer?
</li>
</ol>
</p>
<div class="markdown left-align">
<h2 class="title is-2 has-text-centered">References</h2>
<ol>
<li>Gulati, Anmol, et al. "Conformer: Convolution-augmented transformer for speech recognition." arXiv preprint
arXiv:2005.08100 (2020).</li>
</ol>
</div>
<br>

<script>