From 107173a5b5734543d0579a62064649884e643cc0 Mon Sep 17 00:00:00 2001
From: HannahBenita <77296142+HannahBenita@users.noreply.github.com>
Date: Fri, 1 Dec 2023 16:36:27 +0100
Subject: [PATCH] Update index.html

---
 index.html | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/index.html b/index.html
index c16f51e..d1dacde 100644
--- a/index.html
+++ b/index.html
@@ -246,8 +246,9 @@

Evaluation

+

In the following section, we provide a concise overview of the quantitative and qualitative evaluation of MultiFusion.

Image Fidelity and Text-to-Image Alignment

-

We meassure image fidelity and image-text-alignment using the standard metrics FID-30K and Clip Scores. We find that MultiFusion prompted with text only performs on par with Stable Diffusion despite extension of the Encoder to support multiple languages and modalities.

+

First, we measure image fidelity and image-text alignment using the standard metrics FID-30K and CLIP score. We find that MultiFusion prompted with text only performs on par with Stable Diffusion, despite the extension of the encoder to support multiple languages and modalities.


Compositional Robustness

@@ -255,7 +256,8 @@

Compositional Robustness

method
-

Image Composition is a known limitation of Diffusion Models. Through evaluation of our new benchmark MCC-250 we show that multimodal prompting leads to more compositional robustness as judged by humans.

+

Image composition is a known limitation of diffusion models. Through evaluation on our new benchmark MCC-250, we show that multimodal prompting leads to greater compositional robustness, as judged by humans. Each prompt is a complex conjunction of two different objects with different colors, with multimodal prompts containing one visual reference for each object interleaved with the text input.

Multilinguality