diff --git a/.gitignore b/.gitignore
index 8b13789..ee28272 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1 +1,2 @@
+*.DS_Store
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..6b97a88
--- /dev/null
+++ b/README.md
@@ -0,0 +1,7 @@
+# [Motion Consistency Model](https://yhzhai.github.io/mcm/)
+
+## Acknowledgments
+Parts of this project page were adapted from the [Nerfies](https://nerfies.github.io/) page.
+
+## Website License
+This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
diff --git a/index.html b/index.html
index a0026e5..df2d743 100644
--- a/index.html
+++ b/index.html
@@ -1,4 +1,599 @@
-
+TL;DR: Our motion consistency model not only accelerates the text-to-video diffusion model sampling process, but also benefits from an additional high-quality image dataset to improve the frame quality of generated videos.
+| Prompt | Teacher (ModelScopeT2V), 50 steps | Ours+WebVid, 4 steps | Ours+LAION-aesthetic, 4 steps | Ours+Anime, 4 steps | Ours+Realistic, 4 steps | Ours+3D Cartoon, 4 steps |
+|---|---|---|---|---|---|---|
+| Aerial uhd 4k view. mid-air flight over fresh and clean mountain river at sunny summer morning. Green trees and sun rays on horizon. Direct on sun. | (video) | (video) | (video) | (video) | (video) | (video) |
+| Back of woman in shorts going near pure creek in beautiful mountains. | (video) | (video) | (video) | (video) | (video) | (video) |
+| A rotating pandoro (a traditional italian sweet yeast bread, most popular around christmas and new year) being eaten in time-lapse. | (video) | (video) | (video) | (video) | (video) | (video) |
+| Slow motion avocado with a stone falls and breaks into 2 parts with splashes. | (video) | (video) | (video) | (video) | (video) | (video) |
+*For the anime, realistic, and 3D cartoon styles, we leverage 500k image-caption datasets generated using the fine-tuned Stable Diffusion models ToonYou beta 6, RealisticVision v6, and Disney Pixar cartoon, respectively.
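A styled image-caption set of this kind can be produced offline by rendering captions with a fine-tuned checkpoint. Below is a minimal sketch using Hugging Face diffusers; the checkpoint identifier, caption list, and output layout are illustrative placeholders, not the authors' exact pipeline.

```python
# Sketch: build a styled image-caption dataset from a fine-tuned SD checkpoint.
# MODEL_ID and the caption list are placeholders; swap in the real checkpoint
# (e.g. a ToonYou / RealisticVision fine-tune) and your caption source.
import json
from pathlib import Path

import torch
from diffusers import StableDiffusionPipeline

MODEL_ID = "path-or-hub-id-of-finetuned-checkpoint"  # placeholder
OUT_DIR = Path("styled_dataset")
OUT_DIR.mkdir(exist_ok=True)

pipe = StableDiffusionPipeline.from_pretrained(MODEL_ID, torch_dtype=torch.float16).to("cuda")

captions = [
    "a girl walking on a beach at sunset",  # stand-in captions; the real set has ~500k pairs
    "a cat sitting on a windowsill",
]

with open(OUT_DIR / "metadata.jsonl", "w") as meta:
    for i, caption in enumerate(captions):
        image = pipe(caption, num_inference_steps=30, guidance_scale=7.5).images[0]
        name = f"{i:08d}.png"
        image.save(OUT_DIR / name)
        meta.write(json.dumps({"file_name": name, "text": caption}) + "\n")
```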
+Image diffusion distillation achieves high-fidelity generation with very few sampling steps. However, directly applying these techniques to video models results in unsatisfactory frame quality. This issue arises from the limited frame appearance quality in public video datasets, affecting the performance of both teacher and student video diffusion models. Our study aims to improve video diffusion distillation while enabling the student model to improve frame appearance using abundant high-quality image data. To this end, we propose motion consistency models (MCM), a single-stage video diffusion distillation method that disentangles motion and appearance learning. Specifically, MCM involves a video consistency model that distills motion from the video teacher model, and an image discriminator that boosts frame appearance to match high-quality image data. However, directly combining these components leads to two significant challenges: a conflict in frame learning objectives, where video distillation learns from low-quality video frames while the image discriminator targets high-quality images, and training-inference discrepancies due to the differing quality of video samples used during training and inference. To address these challenges, we introduce disentangled motion distillation and mixed trajectory distillation. The former applies the distillation objective solely to the motion representation, while the latter mitigates training-inference discrepancies by mixing distillation trajectories from both the low- and high-quality video domains. Extensive experiments show that our MCM achieves state-of-the-art video diffusion distillation performance. Additionally, our method can enhance frame quality in video diffusion models, producing frames with high aesthetic value or specific styles.
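As a reading aid, the student objective implied by this description can be written roughly as follows. This is an assumed paraphrase, not an equation from the paper: \(\lambda_{\text{adv}}\) and the motion representation \(m(\cdot)\) are notation introduced here, while \(\mathcal{L}_{\text{MCD}}\) and \(\mathcal{L}_{\text{adv}}^{\text{G}}\) are the losses named in the framework figure below.

\[
\mathcal{L}_{\text{student}} \approx \underbrace{\mathcal{L}_{\text{MCD}}\big(m(\hat{x})\big)}_{\text{motion distilled from the video teacher}} \;+\; \lambda_{\text{adv}}\, \underbrace{\mathcal{L}_{\text{adv}}^{\text{G}}\big(\hat{x}_{1:N}\big)}_{\text{frame appearance vs. high-quality images}}
\]

where \(\hat{x}\) is the student's few-step output, \(m(\hat{x})\) its motion representation, and \(\hat{x}_{1:N}\) its individual frames scored by the image discriminator.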
+Our motion consistency model not only distills the motion prior from the teacher to accelerate sampling, but also benefits from an additional high-quality image dataset to improve the frame quality of generated videos.
+Left: framework overview. Our motion consistency model features disentangled motion-appearance distillation, where motion is learned via the motion consistency distillation loss \(\mathcal{L}_{\text{MCD}}\), and the appearance is learned with the frame adversarial objective \(\mathcal{L}_{\text{adv}}^{\text{G}}\).
+Right: mixed trajectory distillation. We simulate the inference-time ODE trajectory using student-generated video (bottom green line), which is mixed with the real video ODE trajectory (top green line) for consistency distillation training.
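A toy sketch of where this trajectory mixing enters a consistency-distillation training step is given below. Everything in it (the tiny 3D denoiser, latent shapes, noise schedule, and the simplified self-consistency target without a teacher ODE-solver step) is an illustrative assumption, not the authors' implementation; only the `p_real` branch reflects the mixing described above.

```python
# Minimal sketch of mixed trajectory distillation (all components are toy stand-ins).
import torch
import torch.nn as nn
import torch.nn.functional as F

T = 1000
betas = torch.linspace(1e-4, 2e-2, T)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

class TinyVideoDenoiser(nn.Module):
    """Toy stand-in for a latent video diffusion model: predicts x0 from (x_t, t)."""
    def __init__(self, channels=4):
        super().__init__()
        self.net = nn.Conv3d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x_t, t):  # x_t: (B, C, frames, H, W)
        return self.net(x_t)

def add_noise(x0, noise, t):
    """Forward-diffuse clean latents x0 to timestep t (DDPM-style)."""
    a = alphas_cumprod.to(x0.device)[t].view(-1, 1, 1, 1, 1)
    return a.sqrt() * x0 + (1.0 - a).sqrt() * noise

@torch.no_grad()
def student_sample(student, shape, steps=4, device="cpu"):
    """Few-step sampling with the current student, simulating inference-time videos."""
    x = torch.randn(shape, device=device)
    for t in torch.linspace(T - 1, 0, steps).long():
        tb = t.expand(shape[0])
        x0_pred = student(x, tb)
        x = add_noise(x0_pred, torch.randn_like(x), tb - 1) if t > 0 else x0_pred
    return x

def mixed_trajectory_step(student, ema_student, real_latents, p_real=0.5):
    """One distillation step whose ODE trajectory is anchored either at a real
    (low-quality) video or at a student-generated video, chosen at random."""
    if torch.rand(()) < p_real:
        x0 = real_latents                                # real-video trajectory
    else:
        x0 = student_sample(student, real_latents.shape,
                            device=real_latents.device)  # simulated inference trajectory
    b = x0.shape[0]
    t = torch.randint(1, T, (b,), device=x0.device)
    x_t = add_noise(x0, torch.randn_like(x0), t)
    x_s = add_noise(x0, torch.randn_like(x0), t - 1)     # earlier point on the same trajectory
    with torch.no_grad():
        target = ema_student(x_s, t - 1)                 # EMA student as the consistency target
    return F.mse_loss(student(x_t, t), target)

# Usage with toy tensors: 2 videos, 4 latent channels, 8 frames, 16x16 latents.
student, ema_student = TinyVideoDenoiser(), TinyVideoDenoiser()
loss = mixed_trajectory_step(student, ema_student, torch.randn(2, 4, 8, 16, 16))
loss.backward()
```

The `p_real` ratio controls how often training sees a trajectory anchored where inference actually starts; the rest of the step is ordinary consistency-style distillation.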
+@article{zhai2024motion,
+ title={Motion Consistency Model: Accelerating Video Diffusion with Disentangled
+ Motion-Appearance Distillation},
+ author={Zhai, Yuanhao and Lin, Kevin and Yang, Zhengyuan and Li, Linjie and Wang, Jianfeng and Lin, Chung-Ching and Doermann, David and Yuan, Junsong and Wang, Lijuan},
+ year={2024},
+ website={https://yhzhai.github.io/mcm/},
+}
+