# awesome_3dgs_papers.yaml
- id: shin2025localityaware
title: Locality-aware Gaussian Compression for Fast and High-quality Rendering
authors: Seungjoo Shin, Jaesik Park, Sunghyun Cho
year: '2025'
abstract: 'We present LocoGS, a locality-aware 3D Gaussian Splatting (3DGS) framework
that exploits the spatial coherence of 3D Gaussians for compact modeling of volumetric
scenes. To this end, we first analyze the local coherence of 3D Gaussian attributes,
and propose a novel locality-aware 3D Gaussian representation that effectively
encodes locally-coherent Gaussian attributes using a neural field representation
with a minimal storage requirement. On top of the novel representation, LocoGS
is carefully designed with additional components such as dense initialization,
an adaptive spherical harmonics bandwidth scheme and different encoding schemes
for different Gaussian attributes to maximize compression performance. Experimental
results demonstrate that our approach outperforms existing compact Gaussian representations
in rendering quality on representative real-world 3D datasets, while achieving
54.6$\times$ to 96.6$\times$ smaller storage and 2.1$\times$ to 2.4$\times$ faster
rendering than 3DGS. Our approach also demonstrates an average 2.4$\times$ higher
rendering speed than the state-of-the-art compression method with comparable compression
performance.
'
project_page: null
paper: https://arxiv.org/pdf/2501.05757.pdf
code: null
video: null
tags:
- Compression
thumbnail: assets/thumbnails/shin2025localityaware.jpg
publication_date: '2025-01-10T07:19:41+00:00'
date_source: arxiv
- id: meng2025zero1tog
title: 'Zero-1-to-G: Taming Pretrained 2D Diffusion Model for Direct 3D Generation'
authors: Xuyi Meng, Chen Wang, Jiahui Lei, Kostas Daniilidis, Jiatao Gu, Lingjie
Liu
year: '2025'
abstract: 'Recent advances in 2D image generation have achieved remarkable quality, largely
driven by the capacity of diffusion models and the availability of large-scale
datasets. However, direct 3D generation is still constrained by the scarcity and
lower fidelity of 3D datasets. In this paper, we introduce Zero-1-to-G, a novel
approach that addresses this problem by enabling direct single-view generation
on Gaussian splats using pretrained 2D diffusion models. Our key insight is that
Gaussian splats, a 3D representation, can be decomposed into multi-view images
encoding different attributes. This reframes the challenging task of direct 3D
generation within a 2D diffusion framework, allowing us to leverage the rich priors
of pretrained 2D diffusion models. To incorporate 3D awareness, we introduce cross-view
and cross-attribute attention layers, which capture complex correlations and enforce
3D consistency across generated splats. This makes Zero-1-to-G the first direct
image-to-3D generative model to effectively utilize pretrained 2D diffusion priors,
enabling efficient training and improved generalization to unseen objects. Extensive
experiments on both synthetic and in-the-wild datasets demonstrate superior performance
in 3D object generation, offering a new approach to high-quality 3D generation.
'
project_page: https://mengxuyigit.github.io/projects/zero-1-to-G/
paper: https://arxiv.org/pdf/2501.05427.pdf
code: null
video: null
tags:
- Diffusion
- Project
thumbnail: assets/thumbnails/meng2025zero1tog.jpg
publication_date: '2025-01-09T18:37:35+00:00'
date_source: arxiv
- id: gerogiannis2025arc2avatar
title: 'Arc2Avatar: Generating Expressive 3D Avatars from a Single Image via ID
Guidance'
authors: Dimitrios Gerogiannis, Foivos Paraperas Papantoniou, Rolandos Alexandros
Potamias, Alexandros Lattas, Stefanos Zafeiriou
year: '2025'
abstract: 'Inspired by the effectiveness of 3D Gaussian Splatting (3DGS) in reconstructing
detailed 3D scenes within multi-view setups and the emergence of large 2D human
foundation models, we introduce Arc2Avatar, the first SDS-based method utilizing
a human face foundation model as guidance with just a single image as input. To
achieve that, we extend such a model for diverse-view human head generation by
fine-tuning on synthetic data and modifying its conditioning. Our avatars maintain
a dense correspondence with a human face mesh template, allowing blendshape-based
expression generation. This is achieved through a modified 3DGS approach, connectivity
regularizers, and a strategic initialization tailored for our task. Additionally,
we propose an optional efficient SDS-based correction step to refine the blendshape
expressions, enhancing realism and diversity. Experiments demonstrate that Arc2Avatar
achieves state-of-the-art realism and identity preservation, effectively addressing
color issues by allowing the use of very low guidance, enabled by our strong identity
prior and initialization strategy, without compromising detail.
'
project_page: null
paper: https://arxiv.org/pdf/2501.05379.pdf
code: null
video: null
tags:
- Avatar
- Diffusion
thumbnail: assets/thumbnails/gerogiannis2025arc2avatar.jpg
publication_date: '2025-01-09T17:04:33+00:00'
date_source: arxiv
- id: tianci2025scaffoldslam
title: 'Scaffold-SLAM: Structured 3D Gaussians for Simultaneous Localization and
Photorealistic Mapping'
authors: Wen Tianci, Liu Zhiang, Lu Biao, Fang Yongchun
year: '2025'
abstract: '3D Gaussian Splatting (3DGS) has recently revolutionized novel view synthesis
in Simultaneous Localization and Mapping (SLAM). However, existing SLAM methods
utilizing 3DGS have failed to provide high-quality novel view rendering for monocular,
stereo, and RGB-D cameras simultaneously. Notably, some methods perform well for
RGB-D cameras but suffer significant degradation in rendering quality for monocular
cameras. In this paper, we present Scaffold-SLAM, which delivers simultaneous
localization and high-quality photorealistic mapping across monocular, stereo,
and RGB-D cameras. We introduce two key innovations to achieve this state-of-the-art
visual quality. First, we propose Appearance-from-Motion embedding, enabling 3D
Gaussians to better model image appearance variations across different camera
poses. Second, we introduce a frequency regularization pyramid to guide the distribution
of Gaussians, allowing the model to effectively capture finer details in the scene.
Extensive experiments on monocular, stereo, and RGB-D datasets demonstrate that
Scaffold-SLAM significantly outperforms state-of-the-art methods in photorealistic
mapping quality, e.g., PSNR is 16.76% higher on the TUM RGB-D datasets for monocular
cameras.
'
project_page: null
paper: https://arxiv.org/pdf/2501.05242.pdf
code: null
video: null
tags:
- SLAM
thumbnail: assets/thumbnails/tianci2025scaffoldslam.jpg
publication_date: '2025-01-09T13:50:26+00:00'
date_source: arxiv
- id: bond2025gaussianvideo
title: 'GaussianVideo: Efficient Video Representation via Hierarchical Gaussian
Splatting'
authors: Andrew Bond, Jui-Hsien Wang, Long Mai, Erkut Erdem, Aykut Erdem
year: '2025'
abstract: 'Efficient neural representations for dynamic video scenes are critical
for applications ranging from video compression to interactive simulations. Yet,
existing methods often face challenges related to high memory usage, lengthy training
times, and temporal consistency. To address these issues, we introduce a novel
neural video representation that combines 3D Gaussian splatting with continuous
camera motion modeling. By leveraging Neural ODEs, our approach learns smooth
camera trajectories while maintaining an explicit 3D scene representation through
Gaussians. Additionally, we introduce a spatiotemporal hierarchical learning strategy,
progressively refining spatial and temporal features to enhance reconstruction
quality and accelerate convergence. This memory-efficient approach achieves high-quality
rendering at impressive speeds. Experimental results show that our hierarchical
learning, combined with robust camera motion modeling, captures complex dynamic
scenes with strong temporal consistency, achieving state-of-the-art performance
across diverse video datasets in both high- and low-motion scenarios.
'
project_page: https://cyberiada.github.io/GaussianVideo/
paper: https://arxiv.org/pdf/2501.04782.pdf
code: null
video: null
tags:
- Project
- Video
thumbnail: assets/thumbnails/bond2025gaussianvideo.jpg
publication_date: '2025-01-08T19:01:12+00:00'
date_source: arxiv
- id: huang2025fatesgs
title: 'FatesGS: Fast and Accurate Sparse-View Surface Reconstruction using Gaussian
Splatting with Depth-Feature Consistency'
authors: Han Huang, Yulun Wu, Chao Deng, Ge Gao, Ming Gu, Yu-Shen Liu
year: '2025'
abstract: 'Recently, Gaussian Splatting has sparked a new trend in the field of
computer vision. Apart from novel view synthesis, it has also been extended to
the area of multi-view reconstruction. The latest methods facilitate complete,
detailed surface reconstruction while ensuring fast training speed. However, these
methods still require dense input views, and their output quality significantly
degrades with sparse views. We observed that the Gaussian primitives tend to overfit
the few training views, leading to noisy floaters and incomplete reconstructed
surfaces. In this paper, we present an innovative sparse-view reconstruction framework
that leverages intra-view depth and multi-view feature consistency to achieve
remarkably accurate surface reconstruction. Specifically, we utilize monocular
depth ranking information to supervise the consistency of depth distribution within
patches and employ a smoothness loss to enhance the continuity of the distribution.
To achieve finer surface reconstruction, we optimize the absolute position of
depth through multi-view projection features. Extensive experiments on DTU and
BlendedMVS demonstrate that our method outperforms state-of-the-art methods with
a speedup of 60x to 200x, achieving swift and fine-grained mesh reconstruction
without the need for costly pre-training.
'
project_page: https://alvin528.github.io/FatesGS/
paper: https://arxiv.org/pdf/2501.04628.pdf
code: null
video: null
tags:
- Meshing
- Project
- Sparse
thumbnail: assets/thumbnails/huang2025fatesgs.jpg
publication_date: '2025-01-08T17:19:35+00:00'
date_source: arxiv
- id: kwak2025modecgs
title: 'MoDec-GS: Global-to-Local Motion Decomposition and Temporal Interval Adjustment
for Compact Dynamic 3D Gaussian Splatting'
authors: Sangwoon Kwak, Joonsoo Kim, Jun Young Jeong, Won-Sik Cheong, Jihyong Oh,
Munchurl Kim
year: '2025'
abstract: '3D Gaussian Splatting (3DGS) has made significant strides in scene representation
and neural rendering, with intense efforts focused on adapting it for dynamic
scenes. Despite delivering remarkable rendering quality and speed, existing methods
struggle with storage demands and representing complex real-world motions. To
tackle these issues, we propose MoDec-GS, a memory-efficient Gaussian splatting
framework designed for reconstructing novel views in challenging scenarios with
complex motions. We introduce Global-to-Local Motion Decomposition (GLMD) to effectively
capture dynamic motions in a coarse-to-fine manner. This approach leverages Global
Canonical Scaffolds (Global CS) and Local Canonical Scaffolds (Local CS), extending
static Scaffold representation to dynamic video reconstruction. For Global CS,
we propose Global Anchor Deformation (GAD) to efficiently represent global dynamics
along complex motions, by directly deforming the implicit Scaffold attributes
which are anchor position, offset, and local context features. Next, we finely
adjust local motions via the Local Gaussian Deformation (LGD) of Local CS explicitly.
Additionally, we introduce Temporal Interval Adjustment (TIA) to automatically
control the temporal coverage of each Local CS during training, allowing MoDec-GS
to find optimal interval assignments based on the specified number of temporal
segments. Extensive evaluations demonstrate that MoDec-GS achieves an average 70%
reduction in model size over state-of-the-art methods for dynamic 3D Gaussians from
real-world dynamic videos while maintaining or even improving rendering quality.
'
project_page: null
paper: https://arxiv.org/pdf/2501.03714.pdf
code: null
video: https://youtu.be/5L6gzc5-cw8?si=L6v6XLZFQrYK50iV
tags:
- Compression
- Dynamic
- Project
- Video
thumbnail: assets/thumbnails/kwak2025modecgs.jpg
publication_date: '2025-01-07T11:43:13+00:00'
date_source: arxiv
- id: yu2025dehazegs
title: 'DehazeGS: Seeing Through Fog with 3D Gaussian Splatting'
authors: Jinze Yu, Yiqun Wang, Zhengda Lu, Jianwei Guo, Yong Li, Hongxing Qin, Xiaopeng
Zhang
year: '2025'
abstract: 'Current novel view synthesis tasks primarily rely on high-quality and
clear images. However, in foggy scenes, scattering and attenuation can significantly
degrade the reconstruction and rendering quality. Although NeRF-based dehazing
reconstruction algorithms have been developed, their use of deep fully connected
neural networks and per-ray sampling strategies leads to high computational costs.
Moreover, NeRF''s implicit representation struggles to recover fine details from
hazy scenes. In contrast, recent advancements in 3D Gaussian Splatting achieve
high-quality 3D scene reconstruction by explicitly modeling point clouds into
3D Gaussians. In this paper, we propose leveraging the explicit Gaussian representation
to explain the foggy image formation process through a physically accurate forward
rendering process. We introduce DehazeGS, a method capable of decomposing and
rendering a fog-free background from participating media using only multi-view
foggy images as input. We model the transmission within each Gaussian distribution
to simulate the formation of fog. During this process, we jointly learn the atmospheric
light and scattering coefficient while optimizing the Gaussian representation
of the hazy scene. In the inference stage, we eliminate the effects of scattering
and attenuation on the Gaussians and directly project them onto a 2D plane to
obtain a clear view. Experiments on both synthetic and real-world foggy datasets
demonstrate that DehazeGS achieves state-of-the-art performance in terms of both
rendering quality and computational efficiency.
'
project_page: null
paper: https://arxiv.org/pdf/2501.03659.pdf
code: null
video: null
tags:
- In the Wild
- Rendering
thumbnail: assets/thumbnails/yu2025dehazegs.jpg
publication_date: '2025-01-07T09:47:46+00:00'
date_source: arxiv
- id: lee2025compression
title: Compression of 3D Gaussian Splatting with Optimized Feature Planes and Standard
Video Codecs
authors: Soonbin Lee, Fangwen Shu, Yago Sanchez, Thomas Schierl, Cornelius Hellge
year: '2025'
abstract: '3D Gaussian Splatting is a recognized method for 3D scene representation,
known for its high rendering quality and speed. However, its substantial data
requirements present challenges for practical applications. In this paper, we
introduce an efficient compression technique that significantly reduces storage
overhead by using compact representation. We propose a unified architecture that
combines point cloud data and feature planes through a progressive tri-plane structure.
Our method utilizes 2D feature planes, enabling continuous spatial representation.
To further optimize these representations, we incorporate entropy modeling in
the frequency domain, specifically designed for standard video codecs. We also
propose channel-wise bit allocation to achieve a better trade-off between bitrate
consumption and feature plane representation. Consequently, our model effectively
leverages spatial correlations within the feature planes to enhance rate-distortion
performance using standard, non-differentiable video codecs. Experimental results
demonstrate that our method outperforms existing methods in data compactness while
maintaining high rendering quality. Our project page is available at https://fraunhoferhhi.github.io/CodecGS
'
project_page: https://fraunhoferhhi.github.io/CodecGS
paper: https://arxiv.org/pdf/2501.03399.pdf
code: null
video: null
tags:
- Compression
thumbnail: assets/thumbnails/lee2025compression.jpg
publication_date: '2025-01-06T21:37:30+00:00'
date_source: arxiv
- id: rajasegaran2025gaussian
title: Gaussian Masked Autoencoders
authors: Jathushan Rajasegaran, Xinlei Chen, Rulilong Li, Christoph Feichtenhofer,
Jitendra Malik, Shiry Ginosar
year: '2025'
abstract: 'This paper explores Masked Autoencoders (MAE) with Gaussian Splatting.
While reconstructive self-supervised learning frameworks such as MAE learn good
semantic abstractions, they are not trained for explicit spatial awareness. Our approach,
named Gaussian Masked Autoencoder, or GMAE, aims to learn semantic abstractions
and spatial understanding jointly. Like MAE, it reconstructs the image end-to-end
in the pixel space, but beyond MAE, it also introduces an intermediate, 3D Gaussian-based
representation and renders images via splatting. We show that GMAE can enable
various zero-shot learning capabilities of spatial understanding (e.g., figure-ground
segmentation, image layering, edge detection, etc.) while preserving the high-level
semantics of self-supervised representation quality from MAE. To our knowledge,
we are the first to employ Gaussian primitives in an image representation learning
framework beyond optimization-based single-scene reconstructions. We believe GMAE
will inspire further research in this direction and contribute to developing next-generation
techniques for modeling high-fidelity visual data. More details at https://brjathu.github.io/gmae
'
project_page: https://brjathu.github.io/gmae
paper: https://arxiv.org/pdf/2501.03229.pdf
code: null
video: null
tags:
- Transformer
thumbnail: assets/thumbnails/rajasegaran2025gaussian.jpg
publication_date: '2025-01-06T18:59:57+00:00'
date_source: arxiv
- id: nguyen2025pointmapconditioned
title: Pointmap-Conditioned Diffusion for Consistent Novel View Synthesis
authors: Thang-Anh-Quan Nguyen, Nathan Piasco, Luis Roldão, Moussab Bennehar, Dzmitry
Tsishkou, Laurent Caraffa, Jean-Philippe Tarel, Roland Brémond
year: '2025'
abstract: 'In this paper, we present PointmapDiffusion, a novel framework for single-image
novel view synthesis (NVS) that utilizes pre-trained 2D diffusion models. Our
method is the first to leverage pointmaps (i.e. rasterized 3D scene coordinates)
as a conditioning signal, capturing geometric prior from the reference images
to guide the diffusion process. By embedding reference attention blocks and a
ControlNet for pointmap features, our model balances between generative capability
and geometric consistency, enabling accurate view synthesis across varying viewpoints.
Extensive experiments on diverse real-world datasets demonstrate that PointmapDiffusion
achieves high-quality, multi-view consistent results with significantly fewer
trainable parameters compared to other baselines for single-image NVS tasks.
'
project_page: null
paper: https://arxiv.org/pdf/2501.02913.pdf
code: null
video: null
tags:
- Diffusion
thumbnail: assets/thumbnails/nguyen2025pointmapconditioned.jpg
publication_date: '2025-01-06T10:48:31+00:00'
date_source: arxiv
- id: bian2025gsdit
title: 'GS-DiT: Advancing Video Generation with Pseudo 4D Gaussian Fields through
Efficient Dense 3D Point Tracking'
authors: Weikang Bian, Zhaoyang Huang, Xiaoyu Shi, Yijin Li, Fu-Yun Wang, Hongsheng
Li
year: '2025'
abstract: '4D video control is essential in video generation as it enables the use
of sophisticated lens techniques, such as multi-camera shooting and dolly zoom,
which are currently unsupported by existing methods. Training a video Diffusion
Transformer (DiT) directly to control 4D content requires expensive multi-view
videos. Inspired by Monocular Dynamic novel View Synthesis (MDVS) that optimizes
a 4D representation and renders videos according to different 4D elements, such
as camera pose and object motion editing, we bring pseudo 4D Gaussian fields to
video generation. Specifically, we propose a novel framework that constructs a
pseudo 4D Gaussian field with dense 3D point tracking and renders the Gaussian
field for all video frames. Then we finetune a pretrained DiT to generate videos
following the guidance of the rendered video, dubbed as GS-DiT. To boost the training
of the GS-DiT, we also propose an efficient Dense 3D Point Tracking (D3D-PT) method
for the pseudo 4D Gaussian field construction. Our D3D-PT outperforms SpatialTracker,
the state-of-the-art sparse 3D point tracking method, in accuracy and accelerates
the inference speed by two orders of magnitude. During the inference stage, GS-DiT
can generate videos with the same dynamic content while adhering to different
camera parameters, addressing a significant limitation of current video generation
models. GS-DiT demonstrates strong generalization capabilities and extends the
4D controllability of Gaussian splatting to video generation beyond just camera
poses. It supports advanced cinematic effects through the manipulation of the
Gaussian field and camera intrinsics, making it a powerful tool for creative video
production. Demos are available at https://wkbian.github.io/Projects/GS-DiT/.
'
project_page: https://wkbian.github.io/Projects/GS-DiT/
paper: https://arxiv.org/pdf/2501.02690.pdf
code: null
video: null
tags:
- Year 2025
thumbnail: assets/thumbnails/bian2025gsdit.jpg
publication_date: '2025-01-05T23:55:33+00:00'
date_source: arxiv
- id: cong2025videolifter
title: 'VideoLifter: Lifting Videos to 3D with Fast Hierarchical Stereo Alignment'
authors: Wenyan Cong, Kevin Wang, Jiahui Lei, Colton Stearns, Yuanhao Cai, Dilin
Wang, Rakesh Ranjan, Matt Feiszli, Leonidas Guibas, Zhangyang Wang, Weiyao Wang,
Zhiwen Fan
year: '2025'
abstract: 'Efficiently reconstructing accurate 3D models from monocular video is
a key challenge in computer vision, critical for advancing applications in virtual
reality, robotics, and scene understanding. Existing approaches typically require
pre-computed camera parameters and frame-by-frame reconstruction pipelines, which
are prone to error accumulation and entail significant computational overhead.
To address these limitations, we introduce VideoLifter, a novel framework that
leverages geometric priors from a learnable model to incrementally optimize a
globally sparse to dense 3D representation directly from video sequences. VideoLifter
segments the video sequence into local windows, where it matches and registers
frames, constructs consistent fragments, and aligns them hierarchically to produce
a unified 3D model. By tracking and propagating sparse point correspondences across
frames and fragments, VideoLifter incrementally refines camera poses and 3D structure,
minimizing reprojection error for improved accuracy and robustness. This approach
significantly accelerates the reconstruction process, reducing training time by
over 82% while surpassing current state-of-the-art methods in visual fidelity
and computational efficiency.
'
project_page: https://videolifter.github.io/
paper: https://arxiv.org/pdf/2501.01949.pdf
code: null
video: null
tags:
- Acceleration
- Diffusion
- Project
thumbnail: assets/thumbnails/cong2025videolifter.jpg
publication_date: '2025-01-03T18:52:36+00:00'
date_source: arxiv
- id: huang2025enerverse
title: 'EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation'
authors: Siyuan Huang, Liliang Chen, Pengfei Zhou, Shengcong Chen, Zhengkai Jiang,
Yue Hu, Peng Gao, Hongsheng Li, Maoqing Yao, Guanghui Ren
year: '2025'
abstract: 'We introduce EnerVerse, a comprehensive framework for embodied future
space generation specifically designed for robotic manipulation tasks. EnerVerse
seamlessly integrates convolutional and bidirectional attention mechanisms for
inner-chunk space modeling, ensuring low-level consistency and continuity. Recognizing
the inherent redundancy in video data, we propose a sparse memory context combined
with a chunkwise unidirectional generative paradigm to enable the generation of
infinitely long sequences. To further augment robotic capabilities, we introduce
the Free Anchor View (FAV) space, which provides flexible perspectives to enhance
observation and analysis. The FAV space mitigates motion modeling ambiguity, removes
physical constraints in confined environments, and significantly improves the
robot''s generalization and adaptability across various tasks and settings. To
address the prohibitive costs and labor intensity of acquiring multi-camera observations,
we present a data engine pipeline that integrates a generative model with 4D Gaussian
Splatting (4DGS). This pipeline leverages the generative model''s robust generalization
capabilities and the spatial constraints provided by 4DGS, enabling an iterative
enhancement of data quality and diversity, thus creating a data flywheel effect
that effectively narrows the sim-to-real gap. Finally, our experiments demonstrate
that the embodied future space generation prior substantially enhances policy
predictive capabilities, resulting in improved overall performance, particularly
in long-range robotic manipulation tasks.
'
project_page: https://sites.google.com/view/enerverse
paper: https://arxiv.org/pdf/2501.01895.pdf
code: null
video: null
tags:
- Dynamic
- Project
- Robotics
thumbnail: assets/thumbnails/huang2025enerverse.jpg
publication_date: '2025-01-03T17:00:33+00:00'
- id: longhini2024clothsplatting
title: 'Cloth-Splatting: 3D Cloth State Estimation from RGB Supervision'
authors: Alberta Longhini, Marcel Büsching, Bardienus Pieter Duisterhof, Jens Lundell,
Jeffrey Ichnowski, Mårten Björkman, Danica Kragic
year: '2024'
abstract: Recently, 3D Gaussian Splatting (3DGS) has revolutionized radiance field
reconstruction, manifesting efficient and high-fidelity novel view synthesis.
However, accurately estimating the state of cloth remains challenging. We introduce
Cloth-Splatting, a method for estimating 3D states
of cloth from RGB images through a prediction-update framework. Cloth-Splatting
leverages an action-conditioned dynamics model for predicting future states and
uses 3D Gaussian Splatting to update the predicted states. Our key insight is
that coupling a 3D mesh-based representation with Gaussian Splatting allows us
to define a differentiable map between the cloth's state space and the image space.
This enables the use of gradient-based optimization techniques to refine inaccurate
state estimates using only RGB supervision. Our experiments demonstrate that Cloth-Splatting
not only improves state estimation accuracy over current baselines but also reduces
convergence time by ~85%.
project_page: https://kth-rpl.github.io/cloth-splatting/
paper: https://arxiv.org/pdf/2501.01715.pdf
code: https://github.com/KTH-RPL/cloth-splatting
video: null
tags:
- Code
- Meshing
- Project
- Rendering
thumbnail: assets/thumbnails/longhini2024clothsplatting.jpg
publication_date: '2025-01-03T09:17:30+00:00'
date_source: arxiv
- id: zhang2025crossviewgs
title: 'CrossView-GS: Cross-view Gaussian Splatting For Large-scale Scene Reconstruction'
authors: Chenhao Zhang, Yuanping Cao, Lei Zhang
year: '2025'
abstract: 3D Gaussian Splatting (3DGS) has emerged as a prominent method for scene
representation and reconstruction, leveraging densely distributed Gaussian primitives
to enable real-time rendering of high-resolution images. While existing 3DGS methods
perform well in scenes with minor view variation, large view changes in cross-view
scenes pose optimization challenges for these methods. To address these issues,
we propose a novel cross-view Gaussian Splatting method for large-scale scene
reconstruction, based on dual-branch fusion. Our method independently reconstructs
models from aerial and ground views as two independent branches to establish the
baselines of Gaussian distribution, providing reliable priors for cross-view reconstruction
during both initialization and densification. Specifically, a gradient-aware regularization
strategy is introduced to mitigate smoothing issues caused by significant view
disparities. Additionally, a unique Gaussian supplementation strategy is utilized
to incorporate complementary information from both branches into the cross-view model.
Extensive experiments on benchmark datasets demonstrate that our method achieves
superior performance in novel view synthesis compared to state-of-the-art methods.
project_page: null
paper: https://arxiv.org/pdf/2501.01695.pdf
code: null
video: null
tags:
- Large-Scale
- Optimization
thumbnail: assets/thumbnails/zhang2025crossviewgs.jpg
publication_date: '2025-01-03T08:24:59+00:00'
- id: wang2025pgsag
title: 'PG-SAG: Parallel Gaussian Splatting for Fine-Grained Large-Scale Urban Buildings
Reconstruction via Semantic-Aware Grouping'
authors: Tengfei Wang, Xin Wang, Yongmao Hou, Yiwei Xu, Wendi Zhang, Zongqian Zhan
year: '2025'
abstract: 3D Gaussian Splatting (3DGS) has emerged as a transformative method in
the field of real-time novel view synthesis. Based on 3DGS, recent advancements cope
with large-scale scenes via spatial-based partition strategy to reduce video memory
and optimization time costs. In this work, we introduce a parallel Gaussian splatting
method, termed PG-SAG, which fully exploits semantic cues for both partitioning
and Gaussian kernel optimization, enabling fine-grained building surface reconstruction
of large-scale urban areas without downsampling the original image resolution.
First, the Cross-modal model - Language Segment Anything is leveraged to segment
building masks. Then, the segmented building regions are grouped into sub-regions
according to the visibility check across registered images. The Gaussian kernels
for these sub-regions are optimized in parallel with masked pixels. In addition,
the normal loss is re-formulated for the detected edges of masks to alleviate
the ambiguities in normal vectors on edges. Finally, to improve the optimization
of 3D Gaussians, we introduce a gradient-constrained balance-load loss that accounts
for the complexity of the corresponding scenes, effectively minimizing the thread
waiting time in the pixel-parallel rendering stage as well as the reconstruction
loss. Extensive experiments on various urban datasets demonstrate the superior
performance of our PG-SAG on building surface reconstruction,
compared to several state-of-the-art 3DGS-based methods.
project_page: null
paper: https://arxiv.org/pdf/2501.01677.pdf
code: null
video: null
tags:
- Large-Scale
- Meshing
- Optimization
thumbnail: assets/thumbnails/wang2025pgsag.jpg
publication_date: '2025-01-03T07:40:16+00:00'
- id: gao2025easysplat
title: 'EasySplat: View-Adaptive Learning makes 3D Gaussian Splatting Easy'
authors: Ao Gao, Luosong Guo, Tao Chen, Zhao Wang, Ying Tai, Jian Yang, Zhenyu Zhang
year: '2025'
abstract: 3D Gaussian Splatting (3DGS) techniques have achieved satisfactory 3D
scene representation. Despite their impressive performance, they confront challenges
due to the limitation of structure-from-motion (SfM) methods on acquiring accurate
scene initialization, or the inefficiency of densification strategy. In this paper,
we introduce a novel framework EasySplat to achieve high-quality 3DGS modeling.
Instead of using SfM for scene initialization, we employ a novel method to release
the power of large-scale pointmap approaches. Specifically, we propose an efficient
grouping strategy based on view similarity, and use robust pointmap priors to
obtain high-quality point clouds and camera poses for 3D scene initialization.
After obtaining a reliable scene structure, we propose a novel densification approach
that adaptively splits Gaussian primitives based on the average shape of neighboring
Gaussian ellipsoids, utilizing a KNN scheme. In this way, the proposed method tackles
the limitations of initialization and optimization, leading to an efficient and
accurate 3DGS modeling. Extensive experiments demonstrate that EasySplat outperforms
the current state-of-the-art (SOTA) in handling novel view synthesis.
project_page: null
paper: https://arxiv.org/pdf/2501.01003.pdf
code: null
video: null
tags:
- Dust3r-based
- Acceleration
- Densification
- Rendering
thumbnail: assets/thumbnails/gao2025easysplat.jpg
publication_date: '2025-01-02T01:56:58+00:00'
- id: yang2024storm
title: 'STORM: Spatio-Temporal Reconstruction Model for Large-Scale Outdoor Scenes'
authors: Jiawei Yang, Jiahui Huang, Yuxiao Chen, Yan Wang, Boyi Li, Yurong You,
Apoorva Sharma, Maximilian Igl, Peter Karkus, Danfei Xu, Boris Ivanovic, Yue Wang,
Marco Pavone
year: '2024'
abstract: We present STORM, a spatio-temporal reconstruction model designed for
reconstructing dynamic outdoor scenes from sparse observations. Existing dynamic
reconstruction methods often rely on per-scene optimization, dense observations
across space and time, and strong motion supervision, resulting in lengthy optimization
times, limited generalization to novel views or scenes, and degenerated quality
caused by noisy pseudo-labels for dynamics. To address these challenges, STORM
leverages a data-driven Transformer architecture that directly infers dynamic
3D scene representations--parameterized by 3D Gaussians and their velocities--in
a single forward pass. Our key design is to aggregate 3D Gaussians from all frames
using self-supervised scene flows, transforming them to the target timestep to
enable complete (i.e., "amodal") reconstructions from arbitrary viewpoints at
any moment in time. As an emergent property, STORM automatically captures dynamic
instances and generates high-quality masks using only reconstruction losses. Extensive
experiments on public datasets show that STORM achieves precise dynamic scene
reconstruction, surpassing state-of-the-art per-scene optimization methods (+4.3
to 6.6 PSNR) and existing feed-forward approaches (+2.1 to 4.7 PSNR) in dynamic
regions. STORM reconstructs large-scale outdoor scenes in 200ms, supports real-time
rendering, and outperforms competitors in scene flow estimation, improving 3D
EPE by 0.422m and Acc5 by 28.02%. Beyond reconstruction, we showcase four additional
applications of our model, illustrating the potential of self-supervised learning
for broader dynamic scene understanding.
project_page: https://jiawei-yang.github.io/STORM/
paper: https://arxiv.org/pdf/2501.00602.pdf
code: null
video: null
tags:
- Autonomous Driving
- Dynamic
- Large-Scale
- Video
thumbnail: assets/thumbnails/yang2024storm.jpg
publication_date: '2024-12-31T18:59:58+00:00'
- id: mao2024dreamdrive
title: 'DreamDrive: Generative 4D Scene Modeling from Street View Images'
authors: Jiageng Mao, Boyi Li, Boris Ivanovic, Yuxiao Chen, Yan Wang, Yurong You,
Chaowei Xiao, Danfei Xu, Marco Pavone, Yue Wang
year: '2024'
abstract: Synthesizing photo-realistic visual observations from an ego vehicle's
driving trajectory is a critical step towards scalable training of self-driving
models. Reconstruction-based methods create 3D scenes from driving logs and synthesize
geometry-consistent driving videos through neural rendering, but their dependence
on costly object annotations limits their ability to generalize to in-the-wild
driving scenarios. On the other hand, generative models can synthesize action-conditioned
driving videos in a more generalizable way but often struggle with maintaining
3D visual consistency. In this paper, we present DreamDrive, a 4D spatial-temporal
scene generation approach that combines the merits of generation and reconstruction,
to synthesize generalizable 4D driving scenes and dynamic driving videos with
3D consistency. Specifically, we leverage the generative power of video diffusion
models to synthesize a sequence of visual references and further elevate them
to 4D with a novel hybrid Gaussian representation. Given a driving trajectory,
we then render 3D-consistent driving videos via Gaussian splatting. The use of
generative priors allows our method to produce high-quality 4D scenes from in-the-wild
driving data, while neural rendering ensures 3D-consistent video generation from
the 4D scenes. Extensive experiments on nuScenes and street view images demonstrate
that DreamDrive can generate controllable and generalizable 4D driving scenes,
synthesize novel views of driving videos with high fidelity and 3D consistency,
decompose static and dynamic elements in a self-supervised manner, and enhance
perception and planning tasks for autonomous driving.
project_page: null
paper: https://arxiv.org/pdf/2501.00601.pdf
code: null
video: null
tags:
- Autonomous Driving
- Dynamic
- Feed-Forward
thumbnail: assets/thumbnails/mao2024dreamdrive.jpg
publication_date: '2024-12-31T18:59:57+00:00'
- id: wang2024sgsplatting
title: 'SG-Splatting: Accelerating 3D Gaussian Splatting with Spherical Gaussians'
authors: Yiwen Wang, Siyuan Chen, Ran Yi
year: '2024'
abstract: '3D Gaussian Splatting is emerging as a state-of-the-art technique in
novel view synthesis, recognized for its impressive balance between visual quality,
speed, and rendering efficiency. However, reliance on third-degree spherical harmonics
for color representation introduces significant storage demands and computational
overhead, resulting in a large memory footprint and slower rendering speed. We
introduce SG-Splatting with Spherical Gaussians based color representation, a
novel approach to enhance rendering speed and quality in novel view synthesis.
Our method first represents view-dependent color using Spherical Gaussians, instead
of third-degree spherical harmonics, which largely reduces the number of parameters
used for color representation, and significantly accelerates the rendering process.
We then develop an efficient strategy for organizing multiple Spherical Gaussians,
optimizing their arrangement to achieve a balanced and accurate scene representation.
To further improve rendering quality, we propose a mixed representation that combines
Spherical Gaussians with low-degree spherical harmonics, capturing both high-
and low-frequency color information effectively. SG-Splatting also has plug-and-play
capability, allowing it to be easily integrated into existing systems. This approach
improves computational efficiency and overall visual fidelity, making it a practical
solution for real-time applications.
'
project_page: null
paper: https://arxiv.org/pdf/2501.00342.pdf
code: null
video: null
tags:
- Acceleration
thumbnail: assets/thumbnails/wang2024sgsplatting.jpg
publication_date: '2024-12-31T08:31:52+00:00'
- id: cha2024perse
title: 'PERSE: Personalized 3D Generative Avatars from A Single Portrait'
authors: Hyunsoo Cha, Inhee Lee, Hanbyul Joo
year: '2024'
abstract: We present PERSE, a method for building an animatable personalized generative
avatar from a reference portrait. Our avatar model enables facial attribute editing
in a continuous and disentangled latent space to control each facial attribute,
while preserving the individual's identity. To achieve this, our method begins
by synthesizing large-scale synthetic 2D video datasets, where each video contains
consistent changes in the facial expression and viewpoint, combined with a variation
in a specific facial attribute from the original input. We propose a novel pipeline
to produce high-quality, photorealistic 2D videos with facial attribute editing.
Leveraging this synthetic attribute dataset, we present a personalized avatar
creation method based on the 3D Gaussian Splatting, learning a continuous and
disentangled latent space for intuitive facial attribute manipulation. To enforce
smooth transitions in this latent space, we introduce a latent space regularization
technique by using interpolated 2D faces as supervision. Compared to previous
approaches, we demonstrate that PERSE generates high-quality avatars with interpolated
attributes while preserving the identity of the reference person.
project_page: https://hyunsoocha.github.io/perse/
paper: https://arxiv.org/pdf/2412.21206v1.pdf
code: null
video: https://youtu.be/zX881Zx03o4
tags:
- Avatar
- GAN
- Project
- Video
thumbnail: assets/thumbnails/cha2024perse.jpg
publication_date: '2024-12-30T18:59:58+00:00'
- id: yang20244d
title: '4D Gaussian Splatting: Modeling Dynamic Scenes with Native 4D Primitives'
authors: Zeyu Yang, Zijie Pan, Xiatian Zhu, Li Zhang, Yu-Gang Jiang, Philip H. S.
Torr
year: '2024'
abstract: Dynamic 3D scene representation and novel view synthesis from captured
videos are crucial for enabling immersive experiences required by AR/VR and metaverse
applications. However, this task is challenging due to the complexity of unconstrained
real-world scenes and their temporal dynamics. In this paper, we frame dynamic
scenes as a spatio-temporal 4D volume learning problem, offering a native explicit
reformulation with minimal assumptions about motion, which serves as a versatile
dynamic scene learning framework. Specifically, we represent a target dynamic
scene using a collection of 4D Gaussian primitives with explicit geometry and
appearance features, dubbed as 4D Gaussian splatting (4DGS). This approach can
capture relevant information in space and time by fitting the underlying spatio-temporal
volume. Modeling the spacetime as a whole with 4D Gaussians parameterized by anisotropic
ellipses that can rotate arbitrarily in space and time, our model can naturally
learn view-dependent and time-evolved appearance with 4D spherindrical harmonics.
Notably, our 4DGS model is the first solution that supports real-time rendering
of high-resolution, photorealistic novel views for complex dynamic scenes. To
enhance efficiency, we derive several compact variants that effectively reduce
memory footprint and mitigate the risk of overfitting. Extensive experiments validate
the superiority of 4DGS in terms of visual quality and efficiency across a range
of dynamic scene-related tasks (e.g., novel view synthesis, 4D generation, scene
understanding) and scenarios (e.g., single object, indoor scenes, driving environments,
synthetic and real data).
project_page: null
paper: https://arxiv.org/pdf/2412.20720v1.pdf
code: null
video: null
tags:
- Compression
- Dynamic
- Large-Scale
thumbnail: assets/thumbnails/yang20244d.jpg
publication_date: '2024-12-30T05:30:26+00:00'
- id: liu2024maskgaussian
title: 'MaskGaussian: Adaptive 3D Gaussian Representation from Probabilistic Masks'
authors: Yifei Liu, Zhihang Zhong, Yifan Zhan, Sheng Xu, Xiao Sun
year: '2024'
abstract: 'While 3D Gaussian Splatting (3DGS) has demonstrated remarkable performance
in novel view synthesis and real-time rendering, the high memory consumption due
to the use of millions of Gaussians limits its practicality. To mitigate this
issue, improvements have been made by pruning unnecessary Gaussians, either through
a hand-crafted criterion or by using learned masks. However, these methods deterministically
remove Gaussians based on a snapshot of the pruning moment, leading to sub-optimized
reconstruction performance from a long-term perspective. To address this issue,
we introduce MaskGaussian, which models Gaussians as probabilistic entities rather
than permanently removing them, and utilize them according to their probability
of existence. To achieve this, we propose a masked-rasterization technique that
enables unused yet probabilistically existing Gaussians to receive gradients,
allowing for dynamic assessment of their contribution to the evolving scene and
adjustment of their probability of existence. Hence, the importance of Gaussians
iteratively changes and the pruned Gaussians are selected diversely. Extensive
experiments demonstrate the superiority of the proposed method in achieving better
rendering quality with fewer Gaussians than previous pruning methods, pruning
over 60% of Gaussians on average with only a 0.02 PSNR decline. Our code can be
found at: https://github.com/kaikai23/MaskGaussian
'
project_page: null
paper: https://arxiv.org/pdf/2412.20522.pdf
code: https://github.com/kaikai23/MaskGaussian
video: null
tags:
- Code
- Compression
- Densification
thumbnail: assets/thumbnails/liu2024maskgaussian.jpg
publication_date: '2024-12-29T17:12:16+00:00'
date_source: arxiv
- id: xu2024das3r
title: 'DAS3R: Dynamics-Aware Gaussian Splatting for Static Scene Reconstruction'
authors: Kai Xu, Tze Ho Elden Tse, Jizong Peng, Angela Yao
year: '2024'
abstract: 'We propose a novel framework for scene decomposition and static background
reconstruction from everyday videos. By integrating the trained motion masks and
modeling the static scene as Gaussian splats with dynamics-aware optimization,
our method achieves more accurate background reconstruction results than previous
works. Our proposed method is termed DAS3R, an abbreviation for Dynamics-Aware
Gaussian Splatting for Static Scene Reconstruction. Compared to existing methods,
DAS3R is more robust in complex motion scenarios, capable of handling videos where
dynamic objects occupy a significant portion of the scene, and does not require
camera pose inputs or point cloud data from SLAM-based methods. We compared DAS3R
against recent distractor-free approaches on the DAVIS and Sintel datasets; DAS3R
demonstrates enhanced performance and robustness with a margin of more than 2
dB in PSNR. The project''s webpage can be accessed via https://kai422.github.io/DAS3R/
'
project_page: https://kai422.github.io/DAS3R/
paper: https://arxiv.org/pdf/2412.19584.pdf
code: https://github.com/kai422/das3r
video: https://kai422.github.io/DAS3R/assets/davis.gif
tags:
- Code
- Project
- Video
thumbnail: assets/thumbnails/xu2024das3r.jpg
publication_date: '2024-12-27T10:59:46+00:00'
date_source: arxiv
- id: cai2024dust
title: 'Dust to Tower: Coarse-to-Fine Photo-Realistic Scene Reconstruction from
Sparse Uncalibrated Images'
authors: Xudong Cai, Yongcai Wang, Zhaoxin Fan, Deng Haoran, Shuo Wang, Wanting
Li, Deying Li, Lun Luo, Minhang Wang, Jintao Xu
year: '2024'
abstract: Photo-realistic scene reconstruction from sparse-view, uncalibrated images
is highly required in practice. Although some successes have been made, existing
methods are either Sparse-View but require accurate camera parameters (i.e., intrinsic
and extrinsic), or SfM-free but need densely captured images. To combine the advantages
of both methods while addressing their respective weaknesses, we propose Dust
to Tower (D2T), an accurate and efficient coarse-to-fine framework to optimize
3DGS and image poses simultaneously from sparse and uncalibrated images. Our key
idea is to first construct a coarse model efficiently and subsequently refine
it using warped and inpainted images at novel viewpoints. To do this, we first
introduce a Coarse Construction Module (CCM) which exploits a fast Multi-View
Stereo model to initialize a 3D Gaussian Splatting (3DGS) and recover initial
camera poses. To refine the 3D model at novel viewpoints, we propose a Confidence
Aware Depth Alignment (CADA) module to refine the coarse depth maps by aligning
their confident parts with estimated depths by a Mono-depth model. Then, a Warped
Image-Guided Inpainting (WIGI) module is proposed to warp the training images
to novel viewpoints by the refined depth maps, and inpainting is applied to fill
the "holes" in the warped images caused by view-direction changes, providing
high-quality supervision to further optimize the 3D model and the camera poses.
Extensive experiments and ablation studies demonstrate the validity of D2T and
its design choices, achieving state-of-the-art performance in both tasks of novel
view synthesis and pose estimation while keeping high efficiency. Codes will be
publicly available.
project_page: null
paper: https://arxiv.org/pdf/2412.19518.pdf
code: null
video: null
tags:
- Inpainting
- Poses
- Sparse
thumbnail: assets/thumbnails/cai2024dust.jpg
publication_date: '2024-12-27T08:19:34+00:00'
- id: yao2024reflective
title: Reflective Gaussian Splatting
authors: Yuxuan Yao, Zixuan Zeng, Chun Gu, Xiatian Zhu, Li Zhang
year: '2024'
abstract: 'Novel view synthesis has experienced significant advancements owing to
increasingly capable NeRF- and 3DGS-based methods. However, reflective object
reconstruction remains challenging, lacking a proper solution to achieve real-time,
high-quality rendering while accommodating inter-reflection. To fill this gap,
we introduce a Reflective Gaussian splatting (Ref-Gaussian) framework
characterized by two components: (I) Physically based deferred rendering
that empowers the rendering equation with pixel-level material properties via
formulating the split-sum approximation; (II) Gaussian-grounded inter-reflection
that realizes the desired inter-reflection function within a Gaussian splatting
paradigm for the first time. To enhance geometry modeling, we further introduce
material-aware normal propagation and an initial per-Gaussian shading stage, along
with 2D Gaussian primitives. Extensive experiments on standard datasets demonstrate
that Ref-Gaussian surpasses existing approaches in terms of quantitative metrics,
visual quality, and compute efficiency. Further, we show that our method serves
as a unified solution for both reflective and non-reflective scenes, going beyond
the previous alternatives focusing on only reflective scenes. Also, we illustrate
that Ref-Gaussian supports more applications such as relighting and editing.
'
project_page: https://fudan-zvg.github.io/ref-gaussian/
paper: https://arxiv.org/pdf/2412.19282.pdf
code: null
video: null
tags:
- Meshing
- Project
- Ray Tracing
- Relight
thumbnail: assets/thumbnails/yao2024reflective.jpg
publication_date: '2024-12-26T16:58:35+00:00'
- id: qian2024weathergs
title: 'WeatherGS: 3D Scene Reconstruction in Adverse Weather Conditions via Gaussian
Splatting'
authors: Chenghao Qian, Yuhu Guo, Wenjing Li, Gustav Markkula
year: '2024'
abstract: 3D Gaussian Splatting (3DGS) has gained significant attention for 3D scene
reconstruction, but still suffers from complex outdoor environments, especially
under adverse weather. This is because 3DGS treats the artifacts caused by adverse
weather as part of the scene and will directly reconstruct them, largely reducing
the clarity of the reconstructed scene. To address this challenge, we propose
WeatherGS, a 3DGS-based framework for reconstructing clear scenes from multi-view
images under different weather conditions. Specifically, we explicitly categorize
the multi-weather artifacts into dense particles and lens occlusions, which
have very different characteristics: the former are caused by snowflakes and
raindrops in the air, while the latter arise from precipitation on the camera
lens. In light of this, we propose a dense-to-sparse preprocess strategy, which
sequentially removes the dense particles by an Atmospheric Effect Filter (AEF)
and then extracts the relatively sparse occlusion masks with a Lens Effect Detector
(LED). Finally, we train a set of 3D Gaussians by the processed images and generated
masks for excluding occluded areas, and accurately recover the underlying clear
scene by Gaussian splatting. We conduct a diverse and challenging benchmark to
facilitate the evaluation of 3D reconstruction under complex weather scenarios.
Extensive experiments on this benchmark demonstrate that our WeatherGS consistently
produces high-quality, clean scenes across various weather scenarios, outperforming
existing state-of-the-art methods.
project_page: null
paper: https://arxiv.org/pdf/2412.18862.pdf
code: https://github.com/Jumponthemoon/WeatherGS
video: null
tags:
- Code
- In the Wild
thumbnail: assets/thumbnails/qian2024weathergs.jpg
publication_date: '2024-12-25T10:16:57+00:00'
- id: lyu2024facelift
title: 'FaceLift: Single Image to 3D Head with View Generation and GS-LRM'
authors: Weijie Lyu, Yi Zhou, Ming-Hsuan Yang, Zhixin Shu
year: '2024'
abstract: We present FaceLift, a feed-forward approach for rapid, high-quality,
360-degree head reconstruction from a single image. Our pipeline begins by employing
a multi-view latent diffusion model that generates consistent side and back views
of the head from a single facial input. These generated views then serve as input
to a GS-LRM reconstructor, which produces a comprehensive 3D representation using
Gaussian splats. To train our system, we develop a dataset of multi-view renderings
using synthetic 3D human head assets. The diffusion-based multi-view generator
is trained exclusively on synthetic head images, while the GS-LRM reconstructor
undergoes initial training on Objaverse followed by fine-tuning on synthetic head
data. FaceLift excels at preserving identity and maintaining view consistency
across views. Despite being trained solely on synthetic data, FaceLift demonstrates
remarkable generalization to real-world images. Through extensive qualitative
and quantitative evaluations, we show that FaceLift outperforms state-of-the-art
methods in 3D head reconstruction, highlighting its practical applicability and
robust performance on real-world images. In addition to single image reconstruction,
FaceLift supports video inputs for 4D novel view synthesis and seamlessly integrates
with 2D reanimation techniques to enable 3D facial animation.
project_page: https://www.wlyu.me/FaceLift/
paper: https://arxiv.org/pdf/2412.17812.pdf
code: null
video: https://huggingface.co/wlyu/FaceLift/resolve/main/videos/website_video.mp4
tags:
- Avatar
- Feed-Forward
- Project
- Video
thumbnail: assets/thumbnails/lyu2024facelift.jpg
publication_date: '2024-12-23T18:59:49+00:00'
- id: shao2024gausim
title: 'GauSim: Registering Elastic Objects into Digital World by Gaussian Simulator'
authors: Yidi Shao, Mu Huang, Chen Change Loy, Bo Dai
year: '2024'
abstract: In this work, we introduce GauSim, a novel neural network-based simulator
designed to capture the dynamic behaviors of real-world elastic objects represented
through Gaussian kernels. Unlike traditional methods that treat kernels as particles
within particle-based simulations, we leverage continuum mechanics, modeling each
kernel as a continuous piece of matter to account for realistic deformations without
idealized assumptions. To improve computational efficiency and fidelity, we employ
a hierarchical structure that organizes kernels into Center of Mass Systems (CMS)
with explicit formulations, enabling a coarse-to-fine simulation approach. This
structure significantly reduces computational overhead while preserving detailed
dynamics. In addition, GauSim incorporates explicit physics constraints, such
as mass and momentum conservation, ensuring interpretable results and robust,
physically plausible simulations. To validate our approach, we present a new dataset,
READY, containing multi-view videos of real-world elastic deformations. Experimental
results demonstrate that GauSim achieves superior performance compared to existing
physics-driven baselines, offering a practical and accurate solution for simulating
complex dynamic behaviors. Code and model will be released.
project_page: https://www.mmlab-ntu.com/project/gausim/index.html
paper: https://arxiv.org/pdf/2412.17804.pdf
code: null
video: null
tags:
- Dynamic
- Physics
- Project
thumbnail: assets/thumbnails/shao2024gausim.jpg
publication_date: '2024-12-23T18:58:17+00:00'
- id: jin2024activegs
title: 'ActiveGS: Active Scene Reconstruction using Gaussian Splatting'
authors: Liren Jin, Xingguang Zhong, Yue Pan, Jens Behley, Cyrill Stachniss, Marija
Popović
year: '2024'
abstract: 'Robotics applications often rely on scene reconstructions to enable downstream