Awesome Text-to-Video Generation

This repository contains a curated list of text-to-video generation papers and BibTeX entries (until Dec. 2023).

Paper summary
Zero-shot leaderboard
Dataset summary
Evaluation protocol

Paper summary

Name	Date	Affiliation	Train set	Test set	Other experiments
GODIVA	21.04	Microsoft	HowTo100M	MSR-VTT	user study
NUWA website	ECCV22	Microsoft	241k VATEX	Kinetics, MSR-VTT	sketch2video, edit
Video Diffusion website	NIPS22	Google	10M	-	unconditional, longer
Imagen Video website	22.10	Google	14M	-	-
MagicVideo website	22.11	ByteDance	WebVid-10M (+ 10M from HD-VILA-100M + 7M)	UCF-101, MSR-VTT	user study
LVDM website code	22.11	HKUST	2M from WebVid-10M	UCF-101, Sky Time-lapse, Taichi	unconditional, long
Make-A-Video website	ICLR23	Meta	WebVid-10M + 10M from HD-VILA-100M	UCF-101, MSR-VTT	user study
Phenaki website	ICLR23	Google	~15M	Kinetics-400	img conditioned
CogVideo demo website code	ICLR23	THU	5.4M	UCF-101, Kinetics-600	user study
Video LDM website	CVPR23 23.04	NVIDIA	WebVid-10M (+ 683k driving)	UCF-101, MSR-VTT	personalized
Gen1 demo website	ICCV23	Runway	6.4M	-	user study, edit, iv2v, customization
PYoCo website	ICCV23 23.05	NVIDIA	22.5M	UCF-101, MSR-VTT	unconditional
VideoComposer website code	NIPS23	Alibaba	WebVid-10M	MSR-VTT	compositional i2v, sketch, motion control
GLOBER website code	NIPS23	CASIA	WebVid-10M or less	UCF-101, Sky Time-lapse, Taichi, WebVid-10M	unconditional
VideoFusion	23.03	CASIA	WebVid-10M or less	UCF-101, Sky Time-lapse, Taichi, WebVid-10M	unconditional, long
Latent-Shift website	23.04	Meta	WebVid-10M	UCF-101, MSR-VTT	user study
VideoFactory	23.05	PKU	HD-VG-130M + WebVid-10M	UCF-101, MSR-VTT, WebVid-10M	user study, personalized
Make-Your-Video website code	23.06	CUHK	WebVid-10M	UCF-101	depth, re-rendering, user study
Animate-A-Story website	23.07	HKUST	WebVid-10M	UCF-101	storytelling, personalized
InternVid	ICLR24 23.07	Shanghai	WebVid-10M + InternVid-18M	UCF-101, MSR-VTT	dialogue
ModelScopeT2V demo website	23.08	Alibaba	WebVid-10M	MSR-VTT	-
Dysen-VDM website	23.08	NUS	WebVid-10M	UCF-101, MSR-VTT	user study
VidRD website code	23.09	Huawei	WebVid-2M, TGIF, VATEX, Pexels (5.3M)	UCF-101	-
LaVie demo demo2 website code	23.09	Shanghai	WebVid-10M + Vimeo25M	UCF-101, MSR-VTT	user study, long, personalized
Show-1 demo demo2 website code	23.09	NUS	WebVid-10M	UCF-101, MSR-VTT	user study
VideoCrafter demo demo2 website code	23.10	Tencent	WebVid-10M + 10M	-	user study, img conditioned, i2v
Emu Video website	23.11	Meta	34M	UCF-101	user study, longer
SVD demo website1 website2 code	23.11	Stability	LVD (580M) / LVD-F (152M)	UCF-101	i2v, user study, camera motion, multi-view
PixelDance website	23.11	ByteDance	WebVid-10M + 500k watermark-free	UCF-101, MSR-VTT	long, sketch instruction, edit
W.A.L.T website	23.12	Google	89M	UCF-101, Kinetics-600	class-conditional, i2v
VideoPoet website	23.12	Google	~270M (100M paired)	UCF-101, MSR-VTT	user study, stylization, edit, i2v, long, camera motion

Test sets in bold were evaluated zero-shot.

Models without a technical report, such as Gen-2, Pika 1.0, and Zeroscope, are not included.

Other experiments in bold include quantitative evaluation.

Frame settings: VideoComposer (NeurIPS23) and PixelDance use 16 frames at 4 fps; VideoPoet uses 17 frames at 8 fps; Emu Video takes 8 input frames at 4 or 8 fps and outputs 37 frames at 16 fps.
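The frame settings above fix how much time each generated clip spans (frames ÷ fps), and comparing against real videos requires real clips at a matching frame rate. The Python sketch below shows one simple way to do both, assuming real clips are subsampled from the source video by flooring timestamps. `clip_duration` and `sample_frame_indices` are hypothetical helpers for illustration, not code from any of the listed papers, which may resample differently.

```python
from __future__ import annotations


def clip_duration(num_frames: int, fps: float) -> float:
    """Seconds of video covered by num_frames frames played at fps."""
    return num_frames / fps


def sample_frame_indices(
    src_fps: float,
    target_fps: float,
    num_frames: int,
    num_source_frames: int,
    start: int = 0,
) -> list[int]:
    """Hypothetical helper: source-frame indices approximating target_fps.

    Output frame k is taken from the last source frame at or before
    time k / target_fps (measured from `start`).
    """
    if num_frames < 1:
        raise ValueError("num_frames must be positive")
    step = src_fps / target_fps  # source frames per output frame
    indices = [start + int(k * step) for k in range(num_frames)]
    if indices[-1] >= num_source_frames:
        raise ValueError("source video is too short for this clip")
    return indices


# Durations implied by the frame settings (assumed to be playback rates)
print(clip_duration(16, 4))   # VideoComposer / PixelDance: 4.0
print(clip_duration(17, 8))   # VideoPoet: 2.125
print(clip_duration(37, 16))  # Emu Video output: 2.3125
# 16 frames at 4 fps drawn from a 150-frame, 25 fps UCF-101 video
print(sample_frame_indices(25, 4, 16, num_source_frames=150))
```

Flooring keeps every index an integer without ever looking past the requested timestamp; nearest-frame rounding is an equally reasonable choice.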

Zero-shot leaderboard

Name	Date	Train data	MSR-VTT CLIPSIM (↑)	MSR-VTT FID (↓)	MSR-VTT FVD (↓)	UCF-101 FID (↓)	UCF-101 FVD (↓)	UCF-101 IS (↑)
CogVideo	ICLR23	~~5.4M~~	0.2631	23.59	1294	179.00	701.59	25.27
MagicVideo	22.11	10M			998	145.00	655.00
LVDM	22.11	2M	0.2381		742		641.80
VideoFusion	23.03	10M	0.2795			75.77	639.90	17.49
Latent-Shift	23.04	10M	0.2773	15.23
VideoCrafter	23.10	~~20M~~	0.2875			66.95	910.87	18.26
Video LDM	CVPR23 23.04	10M	0.2929				550.61	33.45
VideoComposer	NIPS23	10M	0.2932		580
InternVid	ICLR24 23.07	~~28M~~	0.2951			60.25	616.51	21.04
Animate-A-Story	23.07	10M					516.15
ModelScopeT2V	23.08	10M	0.2930	11.09	550
LaVie	23.09	~~35M~~	0.2949				526.30
Emu Video	23.11	~~34M~~					606.20	42.70
Make-A-Video	ICLR23	20M	0.3049	13.17			367.23	33.00
VideoFactory	23.05	~~140M~~	0.3005				410.00
Show-1	23.09	10M	0.3072	13.08	538		394.46	35.42
VidRD	23.09	5.3M					363.19	39.37
Dysen-VDM	23.11	10M	0.3204	12.64			325.42	35.57
W.A.L.T	23.12	~~89M~~					258.10	35.10
VideoPoet	23.12	~~270M~~	0.3049 / 0.3123		213		355.00	38.44
PYoCo	ICCV23 23.05	~~22.5M~~		9.73 / 22.14			355.19	47.76
Make-Your-Video	23.06	10M					330.49
PixelDance	23.11	~~10.5M~~	0.3125		381	49.36	242.82	42.10
SVD	23.11	~~152M~~					242.02

Bold indicates open-source code or demo release.

~~Strikethrough~~ indicates that private data was used for training.

Dataset summary

Name	Size	Annotation	Date	Affiliation
UCF-101	13k	class	2013	UCF
MSR-VTT	10k	text	CVPR16	Microsoft
Kinetics	650k	class	CVPR17	DeepMind
HowTo100M	136M	text	ICCV19	ENS
WebVid-10M	10M	text	ICCV21	Oxford
HD-VILA-100M	103M	text	CVPR22	Microsoft
~~HD-VG-130M~~	130M	text	23.05	Microsoft
InternVid	234M (10M)	text	23.07	Shanghai AI Lab
~~Vimeo25M~~	25M	text	23.09	Shanghai AI Lab

~~Strikethrough~~ indicates not yet released.

UCF-101: 320x240 25fps

MSR-VTT: resize to 320x240 30fps

Evaluation protocol

Metrics: CLIPSIM, FID, and FVD on MSR-VTT; FVD and IS on UCF-101.

CLIP similarity (CLIPSIM): clipscore (also TorchMetrics); model: CLIP ViT-B/32 or ViT-B/16
Fréchet inception distance (FID): pytorch-fid (also TorchMetrics); model: CLIP ViT-B/32
Fréchet video distance (FVD): TATS (also LVDM, stylegan-v); model: I3D trained on Kinetics-400
Inception score (IS): tgan2 (also TorchMetrics, stylegan-v); model: C3D trained on UCF-101
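FID and FVD share one formula and differ only in the feature extractor (CLIP ViT-B/32 for FID and I3D for FVD in the list above): fit a Gaussian to real and to generated features and compute d² = ‖μ_r − μ_g‖² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^(1/2)). IS uses a classifier's label distribution (C3D trained on UCF-101 above): IS = exp(E_x KL(p(y|x) ‖ p(y))). The NumPy sketch below assumes the features and class probabilities have already been extracted; it does not load I3D or C3D and is not a drop-in replacement for the listed implementations, whose preprocessing matters for reproducing reported numbers.

```python
import numpy as np


def _sqrtm_psd(mat: np.ndarray) -> np.ndarray:
    """Square root of a symmetric positive semi-definite matrix via eigh."""
    mat = (mat + mat.T) / 2.0  # remove round-off asymmetry
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # clamp tiny negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.T


def frechet_distance(feats_real: np.ndarray, feats_gen: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to (N, D) feature arrays.

    Serves as FID with image features and as FVD with video (e.g. I3D)
    features; only the feature extractor differs.
    """
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.atleast_2d(np.cov(feats_real, rowvar=False))
    cov_g = np.atleast_2d(np.cov(feats_gen, rowvar=False))
    # Tr((S_r S_g)^(1/2)) == Tr((S_r^(1/2) S_g S_r^(1/2))^(1/2)); the inner
    # matrix is symmetric PSD, so eigh can replace a general sqrtm.
    sqrt_r = _sqrtm_psd(cov_r)
    cross = _sqrtm_psd(sqrt_r @ cov_g @ sqrt_r)
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_g) - 2.0 * np.trace(cross))


def inception_score(probs: np.ndarray, eps: float = 1e-12) -> float:
    """IS = exp(mean over samples of KL(p(y|x) || p(y))).

    probs: (N, C) class probabilities from the classifier (e.g. C3D on UCF-101).
    Single split; some implementations average the score over several splits.
    """
    p_y = probs.mean(axis=0, keepdims=True)  # marginal label distribution
    kl = (probs * (np.log(probs + eps) - np.log(p_y + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))
```

Reference implementations such as pytorch-fid compute the cross term with `scipy.linalg.sqrtm(cov_r @ cov_g)`; the eigendecomposition route above gives the same trace while depending only on NumPy.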

Number of evaluation samples and backbone per paper (CLIP32 and CLIP16 denote CLIP ViT-B/32 and ViT-B/16)

Name	MSR-VTT CLIPSIM	MSR-VTT FVD	MSR-VTT FID	UCF-101 IS	UCF-101 FVD
CogVideo	-	-	-	10k	2048
Video LDM	2990 CLIP32	-	-		10k
VideoComposer			-	-	-
InternVid	2990 CLIP32	-	-	2020	2020
Make-A-Video	59794	-	59794	10k	10k
VideoPoet	59794 CLIP16/CLIP32	40960	-	10k	10k training
PYoCo	-	-	59794 CLIP32/Inception	2020	2048
SVD	-	-	-	-	13320 script 240x320
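As commonly defined, CLIPSIM is the cosine similarity between the CLIP text embedding of a caption and the CLIP image embedding of each generated frame, averaged over frames and then over caption–video pairs. The table above shows that papers average over different numbers of pairs (2990 vs 59794 on MSR-VTT) and use different CLIP backbones, which is one reason reported values are not strictly comparable. The sketch below assumes embeddings were already extracted with a single CLIP model and shows only the aggregation; it omits any rescaling (TorchMetrics' CLIPScore, for example, reports 100 × max(cos, 0)).

```python
import numpy as np


def clipsim(frame_embs: np.ndarray, text_embs: np.ndarray) -> float:
    """Mean CLIP similarity over caption-video pairs.

    frame_embs: (P, F, D) image embeddings, F frames for each of P videos.
    text_embs:  (P, D) text embedding of each video's caption.
    Assumes both come from the same CLIP backbone (e.g. ViT-B/32).
    """
    frame_embs = frame_embs / np.linalg.norm(frame_embs, axis=-1, keepdims=True)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=-1, keepdims=True)
    # Cosine similarity of every frame with its own caption: shape (P, F)
    sims = np.einsum("pfd,pd->pf", frame_embs, text_embs)
    per_video = sims.mean(axis=1)  # average over frames
    return float(per_video.mean())  # average over caption-video pairs
```

Because the final score is a plain mean, evaluating one caption per video or every caption per video estimates different quantities whenever some captions are easier to match than others.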

jy0205/Awesome-Text-to-Video-Generation
