diff --git a/index.html b/index.html
index e83c4fb..3bfed15 100644
--- a/index.html
+++ b/index.html
@@ -34,7 +34,12 @@
First, we measure image fidelity and image-text alignment using the standard metrics FID-30K and CLIP score. We find that MultiFusion prompted with text only performs on par with Stable Diffusion, despite the extension of the encoder to support multiple languages and modalities.
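As a rough illustration of the alignment metric, the following is a minimal sketch of how a CLIP score for a single image-prompt pair might be computed; the checkpoint name and the cosine-similarity convention are assumptions here, not the paper's exact evaluation pipeline.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Assumed checkpoint; the paper's evaluation may use a different CLIP variant.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_score(image: Image.Image, prompt: str) -> float:
    """Cosine similarity between CLIP image and text embeddings."""
    inputs = processor(text=[prompt], images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # Embeddings returned by CLIPModel are L2-normalized, so the dot
    # product is the cosine similarity.
    return (outputs.image_embeds @ outputs.text_embeds.T).item()
```

In practice the score is averaged over the generated images for a full prompt set, with FID-30K computed separately over 30K samples.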
Image composition is a known limitation of diffusion models. Through evaluation on our new benchmark MCC-250, we show that multimodal prompting leads to greater compositional robustness, as judged by humans. Each prompt is a complex conjunction of two different objects with different colors; multimodal prompts contain one visual reference for each object, interleaved with the text input.
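To make the prompt structure concrete, below is a hypothetical sketch of one interleaved MCC-250-style prompt. The `MultimodalPrompt` container, the file paths, and the example objects are illustrative assumptions, not the released benchmark format or the MultiFusion API.

```python
from dataclasses import dataclass
from typing import List, Union
from PIL import Image

@dataclass
class MultimodalPrompt:
    # Ordered sequence of text fragments and visual references,
    # consumed left to right by a multimodal encoder.
    parts: List[Union[str, Image.Image]]

# One benchmark-style prompt: a conjunction of two objects with
# distinct colors, each text mention paired with a visual reference.
prompt = MultimodalPrompt(parts=[
    "a photo of a red apple",
    Image.open("refs/red_apple.png"),    # visual reference for object 1
    "next to a blue vase",
    Image.open("refs/blue_vase.png"),    # visual reference for object 2
])
```

The text-only variant of the same prompt would simply drop the two images, which is what the human evaluation compares against.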