<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<script type="text/javascript" src="//cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<title>UP-NeRF</title>
<!-- <link rel="icon" type="image/x-icon" href="/imgs/nope1.png"> -->
<!-- Bootstrap core CSS -->
<!-- Custom styles for this template -->
<link href="assets/css/academicons.css" rel="stylesheet">
<link href="assets/css/footer.css" rel="stylesheet">
<link href="assets/css/style.css" rel="stylesheet">
<link rel="stylesheet" href="assets/original/dics.original.css">
<link rel='stylesheet' href="https://cdn.jsdelivr.net/npm/bulma/css/bulma.min.css">
<script src="assets/original/dics.original.js"></script>
<link
rel="stylesheet"
href="https://stackpath.bootstrapcdn.com/bootstrap/4.5.2/css/bootstrap.min.css"
integrity="sha384-JcKb8q3iqJ61gNV9KGb8thSsNjpSL0n8PARn9HuZOnIxN0hoP+VmmDGMN5t9UJ0Z"
crossorigin="anonymous">
<!-- Custom fonts for this template -->
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css">
<script>
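// Initialize Dics image-comparison sliders for any .b-dics containers on the page.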
document.addEventListener('DOMContentLoaded', domReady);
function domReady() {
// Guard: skip the slider setup when no .b-dics containers are present, to avoid runtime errors.
if (document.querySelectorAll('.b-dics').length === 0) return;
new Dics({
container: document.querySelectorAll('.b-dics')[0],
textPosition: 'top'
});
new Dics({
container: document.querySelectorAll('.b-dics')[1],
hideTexts: true,
textPosition: 'center'
});
new Dics({
container: document.querySelectorAll('.b-dics')[2]
});
new Dics({
container: document.querySelectorAll('.b-dics')[3],
linesOrientation: 'vertical',
textPosition: 'left',
arrayBackgroundColorText: ['#000000', '#FFFFFF'],
arrayColorText: ['#FFFFFF', '#000000'],
linesColor: 'rgb(0,0,0)'
});
new Dics({
container: document.querySelectorAll('.b-dics')[4],
linesOrientation: 'vertical',
textPosition: 'right'
});
new Dics({
container: document.querySelectorAll('.b-dics')[5],
textPosition: 'bottom'
});
new Dics({
container: document.querySelectorAll('.b-dics')[6],
filters: ['blur(3px)', 'grayscale(1)', 'sepia(1)', 'saturate(3)']
});
new Dics({
container: document.querySelectorAll('.b-dics')[7],
rotate: '45deg'
});
}
</script>
</head>
<body>
<main role="main" class="container", style="max-width: 1000px;">
<h1 class="title is-3 publication-title">UP-NeRF: Unconstrained Pose-Prior-Free Neural Radiance Fields</h1>
<!-- 🚫 🤚 🙅 🙅‍♀️ 🚫-->
<div class="col text-center">
<p class="authors">
Injae Kim,
Minhyuk Choi,
<a href="https://pages.cs.wisc.edu/~hwkim/">Hyunwoo J.Kim</a><br>
<a href="https://mlv.korea.ac.kr/home"> MLV Lab</a>,
Korea University<br>
NeurIPS 2023
</p>
</div>
<div class="col text-center", style="padding: 20px 0px;">
<a class="btn btn-dark", href="https://arxiv.org/abs/2311.03784" role="button">Arxiv</a>
<a class="btn btn-dark" href="#bibtex" role="button">Bibtex</a>
<a class="btn btn-dark" href="https://github.com/mlvlab/UP-NeRF" role="button">Code</a>
<!-- <button type="button" class="btn btn-dark" disabled>Code</button> -->
</div>
<img style="margin-bottom: 30px;" src="assets/imgs/qualitative.png" alt="comparison">
<div style="display: inline-block; padding-left: 10%;">
<div style="float:left; width:25%; padding-top: 20px;">
<h5 style="text-align: center;">GT RGB</h5>
<img style="display:inline; margin-bottom: 20px;" src="assets/imgs/main_gt.jpg">
<h5 style="text-align: center;">GT Feature</h5>
<img style="display:inline" src="assets/imgs/main_gt_feature.png">
</div>
<div style="float:left; width:3%; border-left: 3px solid black;height: 420px; margin-left:22px"></div>
<div style="float:left; width:60%; text-align: center; font-size: 13px;">
<div style="display:inline-block; margin-bottom: 15px;">
<h5 style="text-align: center;">Visualization of BARF</h5>
<video id="video1" autoplay muted width="650px" style="float:left">
<source src="assets/video/barf.webm"></source>
</video>
</div>
<div style="display:inline-block; margin-bottom: 10px;">
<h5 style="text-align: center;">Visualization of UP-NeRF</h5>
<video id="video2" autoplay muted width="650px" style="float:left">
<source src="assets/video/ours.webm"></source>
</video>
</div>
The colored frustums represent predicted poses and the black frustums represent ground-truth poses.
</div>
</div>
<hr>
<h3>Abstract</h3>
<p>
Neural Radiance Field (NeRF) has enabled novel view synthesis with high fidelity given images and camera poses.
Subsequent works even succeeded in eliminating the necessity of pose priors by jointly optimizing NeRF and camera pose.
However, these works are limited to relatively simple settings, such as photometrically consistent, occluder-free image collections with restricted camera poses or a sequence of images from a video.
Consequently, they cannot handle unconstrained images with varying illumination and transient occluders.
In this paper, we propose UP-NeRF (Unconstrained Pose-prior-free Neural Radiance Field) to optimize NeRF with unconstrained image collections without camera pose prior.
We tackle these challenges with surrogate tasks that optimize color-insensitive feature fields and with a separate module for transient occluders that blocks their influence on pose estimation.
In addition, we introduce a candidate head to enable more robust pose estimation and transient-aware depth supervision to minimize the effect of incorrect priors.
Our experiments verify the superior performance of our method over baselines, including BARF and its variants, in challenging settings on the Phototourism dataset of internet photo collections.
</p>
<hr>
<h3>Method Overview</h3>
<div style="padding: 20px 0px;", class="center">
<img src="assets/imgs/model.png" alt="Model Overview", class="center">
</div>
<hr>
<h3>Candidate head</h3>
<p>
We visualize the progress of pose estimation with and without our candidate head.
<b style="color: #d62728"> Red frustums </b> represent estimated poses and <b style="color: #000000"> black frustums</b> represent ground-truth poses.
</p>
<div class="center"></div>
<img src="assets/imgs/candidate_ablation.png" alt="Model Overview", class="center">
</div>
<div style="display: inline-block; margin-top: 30px; padding-left: 12%;">
<div style="float:left; width:25%; padding-top: 60px;">
<h5 style="text-align: center;">GT RGB</h5>
<img style="display:inline; margin-bottom: 20px;" src="assets/imgs/candidate_gt.jpg">
<h5 style="text-align: center;">GT Feature</h5>
<img style="display:inline" src="assets/imgs/candidate_gt_feature.png">
</div>
<div style="float:left; width:3%; border-left: 3px solid black;height: 480px; margin-left:22px"></div>
<div style="float:left; margin-left: 0px; width:60%">
<div style="display:inline-block; margin-bottom: 15px;">
<h5 style="text-align: center;">Without candidate head</h5>
<video id="video3" autoplay muted width="500px" style="float:left">
<source src="assets/video/nocandid.webm"></source>
</video>
</div>
<div style="display:inline-block;">
<h5 style="text-align: center;">With candidate head</h5>
<video id="video4" autoplay muted width="500px" style="float:left">
<source src="assets/video/candid.webm"></source>
</video>
</div>
</div>
</div>
<hr>
<h3>Transient-aware depth prior</h3>
<p>
We visualize the depth prior weight to show that our model can discriminate between static and transient objects when imposing the depth prior. The effect of the depth prior becomes diluted as the weight increases (white areas).
</p>
<div style="padding: 0px;", class="center"></div>
<img src="assets/imgs/depth.png" alt="Model Overview", class="center" style="width:50%">
</div>
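<p>
To make this concrete, below is a minimal sketch of transient-aware depth supervision. It assumes per-ray tensors for the rendered depth, the depth prior, and a transient weight in [0, 1]; the tensor names and the L1 penalty are illustrative assumptions, not the exact loss used in the paper.
</p>
<code class="codebox">
<pre>
# Illustrative sketch only (PyTorch); not UP-NeRF's exact formulation.
import torch

def transient_aware_depth_loss(rendered_depth, prior_depth, transient_weight):
    # A larger transient weight (white areas in the visualization above) dilutes
    # the depth prior, so rays likely to hit transient occluders are supervised less.
    confidence = 1.0 - transient_weight
    return (confidence * torch.abs(rendered_depth - prior_depth)).mean()
</pre>
</code>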
<hr>
<h3>Additional Scenes of Phototourism dataset</h3>
<p>
We believe that presenting Phototourism scenes beyond the original four further highlights the robustness of our model on in-the-wild scenes. Thus, we pick several additional scenes (<em>British Museum, Lincoln Memorial statue, Pantheon Exterior, St. Paul’s Cathedral</em>) for further experiments.
</p>
<div style="padding: 0px;", class="center"></div>
<img src="assets/imgs/additional.png" alt="Model Overview", class="center">
</div>
<hr>
<h3>Full Video</h3>
<div style="text-align:center;">
<iframe width="560" height="315" src="https://www.youtube.com/embed/XqEfwh8eQFU?si=nFGQvto6_Hrxesh5" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
</div>
<hr>
<h3>BibTeX</h3>
<code class="codebox" id="bibtex">
<pre>
@inproceedings{kim2023upnerf,
title={UP-NeRF: Unconstrained Pose-Prior-Free Neural Radiance Fields},
author={Kim, Injae and Choi, Minhyuk and Kim, Hyunwoo J},
booktitle={Advances in Neural Information Processing Systems},
year={2023}
}
</pre>
</code>
</main>
<script type="text/javascript">
var videos = ["video1", "video2", "video3", "video4"];
var listen = (id_)=>{
document.getElementById(id_).playbackRate = 0.5;
document.getElementById(id_).addEventListener('ended',(e)=>{
setTimeout(function(){
document.getElementById(id_).play();
}, 3000);
},false);
}
for (var i = 0; i < videos.length; i++) {
var id_ = videos[i];
listen(id_);
}
</script>
</body>
</html>