better view here -> https://egocentricvision.github.io/EgocentricVision/
-
Deep-learning Based Egocentric Action Anticipation: A Survey - Richard Wardle, Sareh Rowlands, Machine Vision and Applications 2023.
-
An Outlook into the Future of Egocentric Vision - Chiara Plizzari, Gabriele Goletto, Antonino Furnari, Siddhant Bansal, Francesco Ragusa, Giovanni Maria Farinella, Dima Damen, Tatiana Tommasi, 2023.
-
A Survey on Deep Learning Techniques for Action Anticipation - Zeyun Zhong, Manuel Martin, Michael Voit, Juergen Gall, Jürgen Beyerer, 2023.
-
Egocentric Vision-based Action Recognition: A survey - Adrián Núñez-Marcos, Gorka Azkune, Ignacio Arganda-Carreras, Neurocomputing 2021.
-
Predicting the future from first person (egocentric) vision: A survey - Ivan Rodin, Antonino Furnari, Dimitrios Mavroeidis, Giovanni Maria Farinella, CVIU 2021.
-
Analysis of the hands in egocentric vision: A survey - Andrea Bandini, José Zariffa, TPAMI 2020.
-
A survey of activity recognition in egocentric lifelogging datasets - El Asnaoui Khalid, Aksasse Hamid, Aksasse Brahim, Ouanan Mohammed, WITS 2017.
-
Summarization of Egocentric Videos: A Comprehensive Survey - Ana Garcia del Molino, Cheston Tan, Joo-Hwee Lim, Ah-Hwee Tan, THMS 2017.
-
Recognition of Activities of Daily Living with Egocentric Vision: A Review - Thi-Hoa-Cuc Nguyen, Jean-Christophe Nebel, Francisco Florez-Revuelta, Sensors 2016.
-
The Evolution of First Person Vision Methods: A Survey - Alejandro Betancourt, Pietro Morerio, Carlo S. Regazzoni, Matthias Rauterberg, TCSVT 2015.
-
Context in Human Action Through Motion Complementarity - Eadom Dessalene, Michael Maynord, Cornelia Fermüller, Yiannis Aloimonos, WACV 2024.
-
Exploring Missing Modality in Multimodal Egocentric Datasets - Merey Ramazanova, Alejandro Pardo, Humam Alwassel, Bernard Ghanem, 2024.
-
Semantic-Disentangled Transformer With Noun-Verb Embedding for Compositional Action Recognition - Peng Huang, Rui Yan, Xiangbo Shu, Zhewei Tu, Guangzhao Dai, Jinhui Tang, TIP 2023.
-
Slowfast Diversity-aware Prototype Learning for Egocentric Action Recognition - Guangzhao Dai, Xiangbo Shu, Rui Yan, Peng Huang, Jinhui Tang, MM 2023.
-
Learning from Semantic Alignment between Unpaired Multiviews for Egocentric Video Recognition - Qitong Wang, Long Zhao, Liangzhe Yuan, Ting Liu, Xi Peng, ICCV 2023. [code]
-
Use Your Head: Improving Long-Tail Video Recognition - Toby Perrett, Saptarshi Sinha, Tilo Burghardt, Majid Mirmehdi, Dima Damen, CVPR 2023. [project page]
-
How Can Objects Help Action Recognition? - Xingyi Zhou, Anurag Arnab, Chen Sun, Cordelia Schmid, CVPR 2023. [code]
-
Free-Form Composition Networks for Egocentric Action Recognition - Haoran Wang, Qinghua Cheng, Baosheng Yu, Yibing Zhan, Dapeng Tao, Liang Ding, Haibin Ling, 2023.
-
Integrating Human Gaze Into Attention for Egocentric Activity Recognition - Kyle Min, Jason J. Corso, WACV 2021.
-
Learning to Recognize Actions on Objects in Egocentric Video with Attention Dictionaries - Swathikiran Sudhakaran, Sergio Escalera, Oswald Lanz, T-PAMI 2021.
-
Interactive Prototype Learning for Egocentric Action Recognition - Xiaohan Wang, Linchao Zhu, Heng Wang, Yi Yang, ICCV 2021.
-
Slow-Fast Auditory Streams For Audio Recognition - Evangelos Kazakos, Arsha Nagrani, Andrew Zisserman, Dima Damen, ICASSP 2021.
-
ACTION-Net: Multipath Excitation for Action Recognition - Zhengwei Wang, Qi She, Aljosa Smolic, CVPR 2021. [code]
-
Stacked Temporal Attention: Improving First-person Action Recognition by Emphasizing Discriminative Clips - Lijin Yang, Yifei Huang, Yusuke Sugano, Yoichi Sato, BMVC 2021.
-
With a Little Help from my Temporal Context: Multimodal Egocentric Action Recognition - Evangelos Kazakos, Jaesung Huh, Arsha Nagrani, Andrew Zisserman, Dima Damen, BMVC 2021.
-
Trear: Transformer-based RGB-D Egocentric Action Recognition - Xiangyu Li, Yonghong Hou, Pichao Wang, Zhimin Gao, Mingliang Xu, Wanqing Li, TCDS 2020.
-
Self-Supervised Joint Encoding of Motion and Appearance for First Person Action Recognition - Mirco Planamente, Andrea Bottino, Barbara Caputo, ICPR 2020.
-
Gate-Shift Networks for Video Action Recognition - Swathikiran Sudhakaran, Sergio Escalera, Oswald Lanz, CVPR 2020. [code]
-
Learning Spatiotemporal Attention for Egocentric Action Recognition - Minlong Lu, Danping Liao, Ze-Nian Li, WICCV 2019.
-
Multitask Learning to Improve Egocentric Action Recognition - Georgios Kapidis, Ronald Poppe, Elsbeth van Dam, Lucas Noldus, Remco Veltkamp, WICCV 2019.
-
Seeing and Hearing Egocentric Actions: How Much Can We Learn? - Alejandro Cartas, Jordi Luque, Petia Radeva, Carlos Segura, Mariella Dimiccoli, WICCV 2019.
-
Deep Attention Network for Egocentric Action Recognition - Minlong Lu, Ze-Nian Li, Yueming Wang, Gang Pan, TIP 2019.
-
EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition - Kazakos, Evangelos and Nagrani, Arsha and Zisserman, Andrew and Damen, Dima, ICCV 2019. [code]
-
LSTA: Long Short-Term Attention for Egocentric Action Recognition - Sudhakaran, Swathikiran and Escalera, Sergio and Lanz, Oswald, CVPR 2019. [code]
-
Long-Term Feature Banks for Detailed Video Understanding - Chao-Yuan Wu, Christoph Feichtenhofer, Haoqi Fan, Kaiming He, Philipp Krähenbühl, Ross Girshick, CVPR 2019.
-
In the eye of beholder: Joint learning of gaze and actions in first person video - Li, Y., Liu, M., & Rehg, J. M., ECCV 2018.
-
Egocentric Activity Recognition on a Budget - Possas, Rafael and Caceres, Sheila Pinto and Ramos, Fabio, CVPR 2018.
-
Attention is All We Need: Nailing Down Object-centric Attention for Egocentric Activity Recognition - Swathikiran Sudhakaran, Oswald Lanz, BMVC 2018.
-
Trajectory Aligned Features For First Person Action Recognition - S. Singh, C. Arora, and C.V. Jawahar, Pattern Recognition 2017.
-
Action recognition in RGB-D egocentric videos - Yansong Tang, Yi Tian, Jiwen Lu, Jianjiang Feng, Jie Zhou, ICIP 2017.
-
Egocentric Gesture Recognition Using Recurrent 3D Convolutional Neural Networks with Spatiotemporal Transformer Modules - Cao, Congqi and Zhang, Yifan and Wu, Yi and Lu, Hanqing and Cheng, Jian, ICCV 2017.
-
Modeling Sub-Event Dynamics in First-Person Action Recognition - Hasan F. M. Zaki, Faisal Shafait, Ajmal Mian, CVPR 2017.
-
First Person Action Recognition Using Deep Learned Descriptors - S. Singh, C. Arora, and C.V. Jawahar, CVPR 2016. [code]
-
Generating Notifications for Missing Actions: Don't forget to turn the lights off! - Soran, Bilge, Ali Farhadi, and Linda Shapiro, ICCV 2015.
-
Delving into egocentric actions - Li, Y., Ye, Z., & Rehg, J. M., CVPR 2015.
-
Pooled Motion Features for First-Person Videos - Michael S. Ryoo, Brandon Rothrock and Larry H. Matthies, CVPR 2015.
-
First-Person Activity Recognition: What Are They Doing to Me? - M. S. Ryoo and L. Matthies, CVPR 2013.
-
Learning to recognize daily actions using gaze - Fathi, A., Li, Y., & Rehg, J. M, ECCV 2012.
-
Detecting activities of daily living in first-person camera views - Pirsiavash, H., & Ramanan, D., CVPR 2012.
-
Egocentric Action Recognition by Capturing Hand-Object Contact and Object State - Tsukasa Shiota, Motohiro Takagi, Kaori Kumagai, Hitoshi Seshimo, Yushi Aono, WACV 2024.
-
Sparse multi-view hand-object reconstruction for unseen environments - Yik Lung Pang, Changjae Oh, Andrea Cavallaro, CVPRW 2024.
-
Single-to-Dual-View Adaptation for Egocentric 3D Hand Pose Estimation - Ruicong Liu, Takehiko Ohkawa, Mingfang Zhang, Yoichi Sato, CVPR 2024. [code]
-
HOISDF: Constraining 3D Hand-Object Pose Estimation with Global Signed Distance Fields - Haozhe Qi, Chen Zhao, Mathieu Salzmann, Alexander Mathis, CVPR 2024. [code]
-
Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects - Zicong Fan, Takehiko Ohkawa, Linlin Yang, Nie Lin, Zhishan Zhou, Shihao Zhou, Jiajun Liang, Zhong Gao, Xuanyang Zhang, Xue Zhang, Fei Li, Liu Zheng, Feng Lu, Karim Abou Zeid, Bastian Leibe, Jeongwan On, Seungryul Baek, Aditya Prakash, Saurabh Gupta, Kun He, Yoichi Sato, Otmar Hilliges, Hyung Jin Chang, Angela Yao, 2024.
-
Text-driven Affordance Learning from Egocentric Vision - Tomoya Yoshida, Shuhei Kurita, Taichi Nishimura, Shinsuke Mori, 2024.
-
Fine-grained Affordance Annotation for Egocentric Hand-Object Interaction Videos - Zecheng Yu, Yifei Huang, Ryosuke Furuta, Takuma Yagi, Yusuke Goutsu, Yoichi Sato, WACV 2023. [project page]
-
InterTracker: Discovering and Tracking General Objects Interacting with Hands in the Wild - Yanyan Shao, Qi Ye, Wenhan Luo, Kaihao Zhang, Jiming Chen, IROS 2023.
-
Hands, Objects, Action! Egocentric 2D Hand-Based Action Recognition - Wiktor Mucha, Martin Kampel, ICVS 2023.
-
Improved Deep Learning-Based Efficientpose Algorithm for Egocentric Marker-Less Tool and Hand Pose Estimation in Manual Assembly - Zihan Niu, Yi Xia, Jun Zhang, Bing Wang, Peng Chen, ICIC 2023.
-
Helping Hands: An Object-Aware Ego-Centric Video Recognition Model - Chuhan Zhang, Ankush Gupta, Andrew Zisserman, ICCV 2023.
-
Spectral Graphormer: Spectral Graph-based Transformer for Egocentric Two-Hand Reconstruction using Multi-View Color Images - Tze Ho Elden Tse, Franziska Mueller, Zhengyang Shen, Danhang Tang, Thabo Beeler, Mingsong Dou, Yinda Zhang, Sasa Petrovic, Hyung Jin Chang, Jonathan Taylor, Bardia Doosti, ICCV 2023.
-
EgoPCA: A New Framework for Egocentric Hand-Object Interaction Understanding - Yue Xu, Yong-Lu Li, Zhemin Huang, Michael Xu Liu, Cewu Lu, Yu-Wing Tai, Chi-Keung Tang, ICCV 2023. [project page]
-
HiFiHR: Enhancing 3D Hand Reconstruction from a Single Image via High-Fidelity Texture - Jiayin Zhu, Zhuoran Zhao, Linlin Yang, Angela Yao, DAGM 2023. [code]
-
Functional Hand Type Prior for 3D Hand Pose Estimation and Action Recognition from Egocentric View Monocular Videos - Wonseok Roh, Seung Hyun Lee, Won Jeong Ryoo, Jakyung Lee, Gyeongrok Oh, Sooyeon Hwang, Hyung-gun Chi, Sangpil Kim, BMVC 2023.
-
CaSAR: Contact-aware Skeletal Action Recognition - Junan Lin, Zhichao Sun, Enjie Cao, Taein Kwon, Mahdi Rad, Marc Pollefeys, 2023.
-
MANUS: Markerless Hand-Object Grasp Capture using Articulated 3D Gaussians - Chandradeep Pokhariya, Ishaan N Shah, Angela Xing, Zekun Li, Kefan Chen, Avinash Sharma, Srinath Sridhar, 2023. [project page]
-
3D Hand Pose Estimation in Egocentric Images in the Wild - Aditya Prakash, Ruisen Tu, Matthew Chang, Saurabh Gupta, 2023. [project page]
-
Get a Grip: Reconstructing Hand-Object Stable Grasps in Egocentric Videos - Zhifan Zhu, Dima Damen, 2023. [project page]
-
MACS: Mass Conditioned 3D Hand and Object Motion Synthesis - Soshi Shimada, Franziska Mueller, Jan Bednarik, Bardia Doosti, Bernd Bickel, Danhang Tang, Vladislav Golyanik, Jonathan Taylor, Christian Theobalt, Thabo Beeler, 2023. [project page]
-
Egocentric Human-Object Interaction Detection Exploiting Synthetic Data - Rosario Leonardi, Francesco Ragusa, Antonino Furnari, Giovanni Maria Farinella, ICIAP 2022.
-
SOS! Self-supervised Learning Over Sets Of Handled Objects In Egocentric Action Recognition - Victor Escorcia, Ricardo Guerrero, Xiatian Zhu, Brais Martinez, ECCV 2022.
-
Is First Person Vision Challenging for Object Tracking? - Matteo Dunnhofer, Antonino Furnari, Giovanni Maria Farinella, Christian Micheloni, WICCV 2021.
-
Real Time Egocentric Object Segmentation: THU-READ Labeling and Benchmarking Results - E. Gonzalez-Sosa, G. Robledo, D. Gonzalez-Morin, P. Perez-Garcia, A. Villegas, WCVPR 2021.
-
The MECCANO Dataset: Understanding Human-Object Interactions from Egocentric Videos in an Industrial-like Domain - Francesco Ragusa, Antonino Furnari, Salvatore Livatino, Giovanni Maria Farinella, WACV 2021.
-
Learning Visual Affordance Grounding from Demonstration Videos - Hongchen Luo, Wei Zhai, Jing Zhang, Yang Cao, Dacheng Tao, 2021.
-
Domain and View-point Agnostic Hand Action Recognition - Alberto Sabater, Iñigo Alonso, Luis Montesano, Ana C. Murillo, 2021.
-
Understanding Egocentric Hand-Object Interactions from Hand Estimation - Yao Lu, Walterio W. Mayol-Cuevas, 2021.
-
Egocentric Hand-object Interaction Detection and Application - Yao Lu, Walterio W. Mayol-Cuevas, 2021.
-
Hand-Priming in Object Localization for Assistive Egocentric Vision - Lee, Kyungjun and Shrivastava, Abhinav and Kacorri, Hernisa, WACV 2020.
-
Forecasting Human-Object Interaction: Joint Prediction of Motor Attention and Actions in First Person Video - Miao Liu, Siyu Tang, Yin Li, James M. Rehg, ECCV 2020.
-
Understanding Human Hands in Contact at Internet Scale - Dandan Shan, Jiaqi Geng, Michelle Shu, David F. Fouhey, CVPR 2020.
-
Generalizing Hand Segmentation in Egocentric Videos with Uncertainty-Guided Model Adaptation - Minjie Cai and Feng Lu and Yoichi Sato, CVPR 2020. [code]
-
Weakly-Supervised Mesh-Convolutional Hand Reconstruction in the Wild - Dominik Kulon, Riza Alp Güler, Iasonas Kokkinos, Michael Bronstein, Stefanos Zafeiriou, CVPR 2020.
-
Learning joint reconstruction of hands and manipulated objects - Yana Hasson, Gül Varol, Dimitrios Tzionas, Igor Kalevatykh, Michael J. Black, Ivan Laptev, Cordelia Schmid, CVPR 2019.
-
H+O: Unified Egocentric Recognition of 3D Hand-Object Poses and Interactions - Tekin, Bugra and Bogo, Federica and Pollefeys, Marc, CVPR 2019. [video]
-
Understanding Hand-Object Manipulation with Grasp Types and Object Attributes - Minjie Cai and Kris M. Kitani and Yoichi Sato, Robotics: Science and Systems 2018.
-
From Lifestyle VLOGs to Everyday Interaction - David F. Fouhey and Weicheng Kuo and Alexei A. Efros and Jitendra Malik, CVPR 2018.
-
Analysis of Hand Segmentation in the Wild - Aisha Urooj, Ali Borji, CVPR 2018.
-
First-Person Hand Action Benchmark with RGB-D Videos and 3D Hand Pose Annotations - Garcia-Hernando, Guillermo and Yuan, Shanxin and Baek, Seungryul and Kim, Tae-Kyun, CVPR 2018. [code]
-
Jointly Recognizing Object Fluents and Tasks in Egocentric Videos - Liu, Yang and Wei, Ping and Zhu, Song-Chun, ICCV 2017.
-
Egocentric Gesture Recognition Using Recurrent 3D Convolutional Neural Networks with Spatiotemporal Transformer Modules - Cao, Congqi and Zhang, Yifan and Wu, Yi and Lu, Hanqing and Cheng, Jian, ICCV 2017.
-
First Person Action-Object Detection with EgoNet - Gedas Bertasius, Hyun Soo Park, Stella X. Yu, Jianbo Shi, 2017.
-
Lending a hand: Detecting hands and recognizing activities in complex egocentric interactions - Bambach, S., Lee, S., Crandall, D. J., & Yu, C., ICCV 2015.
-
Understanding Everyday Hands in Action From RGB-D Images - Gregory Rogez, James S. Supancic III, Deva Ramanan, ICCV 2015.
-
3D Hand Pose Detection in Egocentric RGB-D Images - Grégory Rogez, Maryam Khademi, J. S. Supančič III, J. M. M. Montiel, Deva Ramanan, WECCV 2014.
-
Detecting Snap Points in Egocentric Video with a Web Photo Prior - Bo Xiong and Kristen Grauman, ECCV 2014. [code]
-
You-Do, I-Learn: Discovering Task Relevant Objects and their Modes of Interaction from Multi-User Egocentric Video - Dima Damen, Teesid Leelasawassuk, Osian Haines, Andrew Calway, and Walterio Mayol-Cuevas, BMVC 2014.
-
Pixel-level hand detection in ego-centric videos - Li, Cheng, Kris M. Kitani, CVPR 2013. [code] [video]
-
Learning to recognize objects in egocentric activities - Fathi, A., Ren, X., & Rehg, J. M., CVPR 2011.
-
Context-based vision system for place and object recognition - Torralba, A., Murphy, K. P., Freeman, W. T., & Rubin, M. A., ICCV 2003.
-
Relative Norm Alignment for Tackling Domain Shift in Deep Multi-modal Classification - Mirco Planamente, Chiara Plizzari, Simone Alberto Peirone, Barbara Caputo, Andrea Bottino, IJCV 2024.
-
Unsupervised Video Domain Adaptation for Action Recognition: A Disentanglement Perspective - Pengfei Wei, Lingdong Kong, Xinghua Qu, Yi Ren, Zhiqiang Xu, Jing Jiang, Xiang Yin, NeurIPS 2023. [code]
-
Object-based (yet Class-agnostic) Video Domain Adaptation - Dantong Niu, Amir Bar, Roei Herzig, Trevor Darrell, Anna Rohrbach, 2023.
-
Domain Generalization through Audio-Visual Relative Norm Alignment in First Person Action Recognition - Mirco Planamente, Chiara Plizzari, Emanuele Alberti, Barbara Caputo, WACV 2022.
-
Interact before Align: Leveraging Cross-Modal Knowledge for Domain Adaptive Action Recognition - Lijin Yang, Yifei Huang, Yusuke Sugano, Yoichi Sato, CVPR 2022.
-
Contrast and Mix: Temporal Contrastive Video Domain Adaptation with Background Mixing - Aadarsh Sahoo, Rutav Shah, Rameswar Panda, Kate Saenko, Abir Das, NeurIPS 2021.
-
Differentiated Learning for Multi-Modal Domain Adaptation - Jianming Lv, Kaijie Liu, Shengfeng He, MM 2021.
-
Learning Cross-modal Contrastive Features for Video Domain Adaptation - Donghyun Kim, Yi-Hsuan Tsai, Bingbing Zhuang, Xiang Yu, Stan Sclaroff, Kate Saenko, Manmohan Chandraker, ICCV 2021.
-
Spatio-temporal Contrastive Domain Adaptation for Action Recognition - Xiaolin Song, Sicheng Zhao, Jingyu Yang, Huanjing Yue, Pengfei Xu, Runbo Hu, Hua Chai, CVPR 2021.
-
Domain Adaptation in Multi-View Embedding for Cross-Modal Video Retrieval - Jonathan Munro, Michael Wray, Diane Larlus, Gabriela Csurka, Dima Damen, 2021.
-
Multi-Modal Domain Adaptation for Fine-Grained Action Recognition - Jonathan Munro, Dima Damen, CVPR 2020.
-
X-MIC: Cross-Modal Instance Conditioning for Egocentric Action Generalization - Anna Kukleva, Fadime Sener, Edoardo Remelli, Bugra Tekin, Eric Sauser, Bernt Schiele, Shugao Ma, CVPR 2024. [code]
-
MMG-Ego4D: Multimodal Generalization in Egocentric Action Recognition - Xinyu Gong, Sreyas Mohan, Naina Dhingra, Jean-Charles Bazin, Yilei Li, Zhangyang Wang, Rakesh Ranjan, CVPR 2023.
-
Domain Generalization through Audio-Visual Relative Norm Alignment in First Person Action Recognition - Mirco Planamente, Chiara Plizzari, Emanuele Alberti, Barbara Caputo, WACV 2022.
-
Combating Missing Modalities in Egocentric Videos at Test Time - Merey Ramazanova, Alejandro Pardo, Bernard Ghanem, Motasem Alfarra, 2024.
-
Test-time adaptation for egocentric action recognition - Mirco Planamente, Chiara Plizzari, Barbara Caputo, ICIAP 2022. [code]
-
GPT4Ego: Unleashing the Potential of Pre-trained Models for Zero-Shot Egocentric Action Recognition - Guangzhao Dai, Xiangbo Shu, Wenhao Wu, 2024.
-
Opening the Vocabulary of Egocentric Actions - Dibyadip Chatterjee, Fadime Sener, Shugao Ma, Angela Yao, 2023. [project page]
-
Leveraging Next-Active Objects for Context-Aware Anticipation in Egocentric Videos - Sanket Thakur, Cigdem Beyan, Pietro Morerio, Vittorio Murino, Alessio Del Bue, WACV 2024.
-
Summarize the Past to Predict the Future: Natural Language Descriptions of Context Boost Multimodal Object Interaction Anticipation - Razvan-George Pasca, Alexey Gavryushin, Muhammad Hamza, Yen-Ling Kuo, Kaichun Mo, Luc Van Gool, Otmar Hilliges, Xi Wang, CVPR 2024.
-
Can't Make an Omelette Without Breaking Some Eggs: Plausible Action Anticipation Using Large Video-Language Models - Himangi Mittal, Nakul Agarwal, Shao-Yuan Lo, Kwonjoon Lee, CVPR 2024.
-
On the Efficacy of Text-Based Input Modalities for Action Anticipation - Apoorva Beedu, Karan Samel, Irfan Essa, 2024.
-
Towards Egocentric Compositional Action Anticipation with Adaptive Semantic Debiasing - Tianyu Zhang, Weiqing Min, Tao Liu, Shuqiang Jiang, Yong Rui, TOMM 2023.
-
Dynamic Context Removal: A General Training Strategy for Robust Models on Video Action Predictive Tasks - Xinyu Xu, Yong-Lu Li, Cewu Lu, IJCV 2023.
-
Enhancing Next Active Object-based Egocentric Action Anticipation with Guided Attention - Sanket Thakur, Cigdem Beyan, Pietro Morerio, Vittorio Murino, Alessio Del Bue, ICIP 2023. [project page]
-
Guided Attention for Next Active Object @ EGO4D STA Challenge - Sanket Thakur, Cigdem Beyan, Pietro Morerio, Vittorio Murino, Alessio Del Bue, CVPRW 2023.
-
The Wisdom of Crowds: Temporal Progressive Attention for Early Action Prediction - Alexandros Stergiou, Dima Damen, CVPR 2023. [project page]
-
Streaming egocentric action anticipation: An evaluation scheme and approach - Antonino Furnari, Giovanni Maria Farinella, CVIU 2023.
-
VS-TransGRU: A Novel Transformer-GRU-based Framework Enhanced by Visual-Semantic Fusion for Egocentric Action Anticipation - Congqi Cao, Ze Sun, Qinyi Lv, Lingtong Min, Yanning Zhang, 2023.
-
Action Scene Graphs for Long-Form Understanding of Egocentric Videos - Ivan Rodin, Antonino Furnari, Kyle Min, Subarna Tripathi, Giovanni Maria Farinella, 2023. [code]
-
DiffAnt: Diffusion Models for Action Anticipation - Zeyun Zhong, Chengzhi Wu, Manuel Martin, Michael Voit, Juergen Gall, Jürgen Beyerer, 2023.
-
Early Action Recognition with Action Prototypes - Guglielmo Camporese, Alessandro Bergamo, Xunyu Lin, Joseph Tighe, Davide Modolo, 2023.
-
A Hybrid Egocentric Activity Anticipation Framework via Memory-Augmented Recurrent and One-Shot Representation Forecasting - Tianshan Liu, Kin-Man Lam, CVPR 2022.
-
Towards Streaming Egocentric Action Anticipation - Antonino Furnari, Giovanni Maria Farinella, arXiv 2021.
-
Action Anticipation Using Pairwise Human-Object Interactions and Transformers - Debaditya Roy, Basura Fernando, TIP 2021.
-
Self-Regulated Learning for Egocentric Video Activity Anticipation - Zhaobo Qi, Shuhui Wang, Chi Su, Li Su, Qingming Huang, Qi Tian, T-PAMI 2021.
-
What If We Could Not See? Counterfactual Analysis for Egocentric Action Anticipation - Tianyu Zhang, Weiqing Min, Jiahao Yang, Tao Liu, Shuqiang Jiang, Yong Rui, IJCAI 2021.
-
Knowledge Distillation for Human Action Anticipation - Vinh Tran, Yang Wang, Zekun Zhang, Minh Hoai, ICIP 2021.
-
Anticipative Video Transformer - Rohit Girdhar, Kristen Grauman, ICCV 2021.
-
Multi-Modal Temporal Convolutional Network for Anticipating Actions in Egocentric Videos - Olga Zatsarynna, Yazan Abu Farha, Juergen Gall, CVPRW 2021.
-
Anticipating Human Actions by Correlating Past With the Future With Jaccard Similarity Measures - Basura Fernando, Samitha Herath, CVPR 2021.
-
Higher Order Recurrent Space-Time Transformer for Video Action Prediction - Tsung-Ming Tai, Giuseppe Fiameni, Cheng-Kuang Lee, Oswald Lanz, arXiv 2021.
-
Multimodal Global Relation Knowledge Distillation for Egocentric Action Anticipation - Yi Huang, Xiaoshan Yang, Changsheng Xu, MM 2021.
-
Rolling-Unrolling LSTMs for Action Anticipation from First-Person Video - Antonino Furnari, Giovanni Maria Farinella, T-PAMI 2020.
-
Knowledge Distillation for Action Anticipation via Label Smoothing - Guglielmo Camporese, Pasquale Coscia, Antonino Furnari, Giovanni Maria Farinella, Lamberto Ballan, ICPR 2020.
-
Forecasting Human-Object Interaction: Joint Prediction of Motor Attention and Actions in First Person Video - Miao Liu, Siyu Tang, Yin Li, James M. Rehg, ECCV 2020.
-
An Egocentric Action Anticipation Framework via Fusing Intuition and Analysis - Tianyu Zhang, Weiqing Min, Ying Zhu, Yong Rui, Shuqiang Jiang, MM 2020.
-
What Would You Expect? Anticipating Egocentric Actions with Rolling-Unrolling LSTMs and Modality Attention - Antonino Furnari, Giovanni Maria Farinella, ICCV 2019. [code]
-
Zero-Shot Anticipation for Instructional Activities - Fadime Sener, Angela Yao, ICCV 2019.
-
Leveraging the Present to Anticipate the Future in Videos - Antoine Miech, Ivan Laptev, Josef Sivic, Heng Wang, Lorenzo Torresani, Du Tran, CVPRW 2019.
-
Object-centric Video Representation for Long-term Action Anticipation - Ce Zhang, Changcheng Fu, Shijie Wang, Nakul Agarwal, Kwonjoon Lee, Chiho Choi, Chen Sun, WACV 2024. [code]
-
Intention-Conditioned Long-Term Human Egocentric Action Anticipation - Esteve Valls Mascaró, Hyemin Ahn, Dongheui Lee, WACV 2023.
-
Multiscale Video Pretraining for Long-Term Activity Forecasting - Reuben Tan, Matthias De Lange, Michael Iuzzolino, Bryan A. Plummer, Kate Saenko, Karl Ridgeway, Lorenzo Torresani, 2023.
-
AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos? - Qi Zhao, Ce Zhang, Shijie Wang, Changcheng Fu, Nakul Agarwal, Kwonjoon Lee, Chen Sun, 2023. [project page]
-
Rethinking Learning Approaches for Long-Term Action Anticipation - Megha Nawhal, Akash Abdu Jyothi, Greg Mori, ECCV 2022. [project page]
-
Learning to Anticipate Egocentric Actions by Imagination - Yu Wu, Linchao Zhu, Xiaohan Wang, Yi Yang, Fei Wu, TIP 2021.
-
On Diverse Asynchronous Activity Anticipation - He Zhao and Richard P. Wildes, ECCV 2020.
-
Time-Conditioned Action Anticipation in One Shot - Qiuhong Ke, Mario Fritz, Bernt Schiele, CVPR 2019.
-
When Will You Do What? - Anticipating Temporal Occurrences of Activities - Yazan Abu Farha, Alexander Richard, Juergen Gall, CVPR 2018.
-
Joint Prediction of Activity Labels and Starting Times in Untrimmed Videos - Tahmida Mahmud, Mahmudul Hasan, Amit K. Roy-Chowdhury, ICCV 2017.
-
First-Person Activity Forecasting with Online Inverse Reinforcement Learning - Nicholas Rhinehart, Kris M. Kitani, ICCV 2017. [video]
-
Unsupervised gaze prediction in egocentric videos by energy-based surprise modeling - Aakur, S.N., Bagavathi, A., arXiv 2020.
-
Digging Deeper into Egocentric Gaze Prediction - Hamed R. Tavakoli and Esa Rahtu and Juho Kannala and Ali Borji, WACV 2019.
-
Predicting Gaze in Egocentric Video by Learning Task-dependent Attention Transition - Huang, Y., Cai, M., Li, Z., & Sato, Y., ECCV 2018. [code]
-
Deep future gaze: Gaze anticipation on egocentric videos using adversarial networks - Zhang, M., Teck Ma, K., Hwee Lim, J., Zhao, Q., & Feng, J., CVPR 2017. [code]
-
Learning to predict gaze in egocentric video - Li, Yin, Alireza Fathi, and James M. Rehg, ICCV 2013.
-
Diff-IP2D: Diffusion-Based Hand-Object Interaction Prediction on Egocentric Videos - Junyi Ma, Jingyi Xu, Xieyuanli Chen, Hesheng Wang, 2024.
-
EMAG: Ego-motion Aware and Generalizable 2D Hand Forecasting from Egocentric Videos - Masashi Hatano, Ryo Hachiuma, Hideo Saito, 2024.
-
Uncertainty-aware State Space Transformer for Egocentric 3D Hand Trajectory Forecasting - Wentao Bao, Lele Chen, Libing Zeng, Zhong Li, Yi Xu, Junsong Yuan, Yu Kong, ICCV 2023.
-
Forecasting Action through Contact Representations from First Person Video - Eadom Dessalene, Chinmaya Devaraj, Michael Maynord, Cornelia Fermüller, Yiannis Aloimonos, T-PAMI 2021.
-
Forecasting Human-Object Interaction: Joint Prediction of Motor Attention and Actions in First Person Video - Miao Liu, Siyu Tang, Yin Li, James M. Rehg, ECCV 2020.
-
How Can I See My Future? FvTraj: Using First-person View for Pedestrian Trajectory Prediction - Huikun Bi, Ruisi Zhang, Tianlu Mao, Zhigang Deng, Zhaoqi Wang, ECCV 2020. [video]
-
Multimodal Future Localization and Emergence Prediction for Objects in Egocentric View With a Reachability Prior - Makansi, Osama and Cicek, Ozgun and Buchicchio, Kevin and Brox, Thomas, CVPR 2020. [code]
-
Understanding Human Hands in Contact at Internet Scale - Dandan Shan, Jiaqi Geng, Michelle Shu, David F. Fouhey, CVPR 2020.
-
Future Person Localization in First-Person Videos - Takuma Yagi, Karttikeya Mangalam, Ryo Yonetani, Yoichi Sato, CVPR 2018.
-
Egocentric Future Localization - Park, Hyun Soo and Hwang, Jyh-Jing and Niu, Yedong and Shi, Jianbo, CVPR 2016.
-
Going deeper into first-person activity recognition - Ma, M., Fan, H., & Kitani, K. M., CVPR 2016.
-
Interaction Region Visual Transformer for Egocentric Action Anticipation - Debaditya Roy, Ramanathan Rajendiran, Basura Fernando, WACV 2024. [code]
-
Joint Hand Motion and Interaction Hotspots Prediction From Egocentric Videos - Shaowei Liu, Subarna Tripathi, Somdeb Majumdar, Xiaolong Wang, CVPR 2022. [code] [project page]
-
EGO-TOPO: Environment Affordances from Egocentric Video - Nagarajan, Tushar and Li, Yanghao and Feichtenhofer, Christoph and Grauman, Kristen, CVPR 2020.
-
Forecasting human object interaction: Joint prediction of motor attention and egocentric activity - Liu, M., Tang, S., Li, Y., Rehg, J., arXiv 2019.
-
Forecasting Hands and Objects in Future Frames - Chenyou Fan, Jangwon Lee, Michael S. Ryoo, ECCVW 2018.
-
Next-active-object prediction from egocentric videos - Antonino Furnari, Sebastiano Battiato, Kristen Grauman, Giovanni Maria Farinella, JVCIR 2017.
-
Unsupervised Learning of Important Objects From First-Person Videos - Gedas Bertasius, Hyun Soo Park, Stella X. Yu, Jianbo Shi, ICCV 2017.
-
First Person Action-Object Detection with EgoNet - G Bertasius, HS Park, SX Yu, J Shi, arXiv 2016.
-
TIM: A Time Interval Machine for Audio-Visual Action Recognition - Jacob Chalk, Jaesung Huh, Evangelos Kazakos, Andrew Zisserman, Dima Damen, CVPR 2024. [project page]
-
SoundingActions: Learning How Actions Sound from Narrated Egocentric Videos - Changan Chen, Kumar Ashutosh, Rohit Girdhar, David Harwath, Kristen Grauman, CVPR 2024. [project page]
-
The Audio-Visual Conversational Graph: From an Egocentric-Exocentric Perspective - Wenqi Jia, Miao Liu, Hao Jiang, Ishwarya Ananthabhotla, James M. Rehg, Vamsi Krishna Ithapu, Ruohan Gao, CVPR 2024. [project page]
-
Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos - Sagnik Majumder, Ziad Al-Halah, Kristen Grauman, CVPR 2024. [project page]
-
Centre Stage: Centricity-based Audio-Visual Temporal Action Detection - Hanyuan Wang, Majid Mirmehdi, Dima Damen, Toby Perrett, WBMVC 2023. [code]
-
Multimodal Distillation for Egocentric Action Recognition - Gorjan Radevski, Dusan Grujicic, Marie-Francine Moens, Matthew Blaschko, Tinne Tuytelaars, ICCV 2023. [code]
-
Egocentric Audio-Visual Object Localization - Chao Huang, Yapeng Tian, Anurag Kumar, Chenliang Xu, CVPR 2023. [code]
-
Audio Visual Speaker Localization from EgoCentric Views - Jinzheng Zhao, Yong Xu, Xinyuan Qian, Wenwu Wang, 2023. [code]
-
Domain Generalization through Audio-Visual Relative Norm Alignment in First Person Action Recognition - Mirco Planamente, Chiara Plizzari, Emanuele Alberti, Barbara Caputo, WACV 2022.
-
Attention Bottlenecks for Multimodal Fusion - Arsha Nagrani, Shan Yang, Anurag Arnab, Aren Jansen, Cordelia Schmid, Chen Sun, NeurIPS 2021.
-
Slow-Fast Auditory Streams For Audio Recognition - Evangelos Kazakos, Arsha Nagrani, Andrew Zisserman, Dima Damen, ICASSP 2021.
-
With a Little Help from my Temporal Context: Multimodal Egocentric Action Recognition - Evangelos Kazakos, Jaesung Huh, Arsha Nagrani, Andrew Zisserman, Dima Damen, BMVC 2021.
-
Multi-modal Egocentric Activity Recognition using Audio-Visual Features - Mehmet Ali Arabacı, Fatih Özkan, Elif Surer, Peter Jančovič, Alptekin Temizel, MTA 2020.
-
Seeing and Hearing Egocentric Actions: How Much Can We Learn? - Alejandro Cartas, Jordi Luque, Petia Radeva, Carlos Segura, Mariella Dimiccoli, WICCV 2019.
-
EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition - Kazakos, Evangelos and Nagrani, Arsha and Zisserman, Andrew and Damen, Dima, ICCV 2019.
-
Multimodal Score Fusion with Sparse Low Rank Bilinear Pooling for Egocentric Hand Action Recognition - Kankana Roy, TOMM 2024.
-
Egocentric RGB+Depth Action Recognition in Industry-Like Settings - Jyoti Kini, Sarah Fleischer, Ishan Dave, Mubarak Shah, 2023.
-
Egocentric Scene Understanding via Multimodal Spatial Rectifier - Tien Do, Khiem Vuong, Hyun Soo Park, CVPR 2022. [code]
-
Moving SLAM: Fully Unsupervised Deep Learning in Non-Rigid Scenes - Dan Xu, Andrea Vedaldi, João F. Henriques, IROS 2021.
-
Trear: Transformer-based RGB-D Egocentric Action Recognition - Xiangyu Li, Yonghong Hou, Pichao Wang, Zhimin Gao, Mingliang Xu, Wanqing Li, TCDS 2020.
-
Multi-stream Deep Neural Networks for RGB-D Egocentric Action Recognition - Yansong Tang, Zian Wang, Jiwen Lu, Jianjiang Feng, Jie Zhou, TCSVT 2018.
-
First-Person Hand Action Benchmark with RGB-D Videos and 3D Hand Pose Annotations - Garcia-Hernando, Guillermo and Yuan, Shanxin and Baek, Seungryul and Kim, Tae-Kyun, CVPR 2018. [code]
-
Action recognition in RGB-D egocentric videos - Yansong Tang, Yi Tian, Jiwen Lu, Jianjiang Feng, Jie Zhou, ICIP 2017.
-
Scene Semantic Reconstruction from Egocentric RGB-D-Thermal Videos - Rachel Luo, Ozan Sener, Silvio Savarese, 3DV 2017.
-
3D Hand Pose Detection in Egocentric RGB-D Images - Grégory Rogez, Maryam Khademi, J. S. Supančič III, J. M. M. Montiel, Deva Ramanan, WECCV 2014.
-
-
EventEgo3D: 3D Human Motion Capture from Egocentric Event Streams - Christen Millerdurai, Hiroyasu Akada, Jian Wang, Diogo Luvizon, Christian Theobalt, Vladislav Golyanik, CVPR 2024. [project page]
-
EventTransAct: A video transformer-based framework for Event-camera based action recognition - Tristan de Blegiers, Ishan Rajendrakumar Dave, Adeel Yousaf, Mubarak Shah, IROS 2023. [project page]
-
E(GO)^2MOTION: Motion Augmented Event Stream for Egocentric Action Recognition - Chiara Plizzari, Mirco Planamente, Gabriele Goletto, Marco Cannici, Emanuele Gusso, Matteo Matteucci, Barbara Caputo, CVPR 2022.
-
Continual Egocentric Activity Recognition with Foreseeable-Generalized Visual-IMU Representations - Chiyuan He, Shaoxu Cheng, Zihuan Qiu, Linfeng Xu, Fanman Meng, Qingbo Wu, Hongliang Li, IEEE Sensors Journal 2024.
-
How You Move Your Head Tells What You Do: Self-supervised Video Representation Learning with Egocentric Cameras and IMU Sensors - Satoshi Tsutsui, Ruta Desai, Karl Ridgeway, WICCV 2021.
-
ActionMixer: Temporal action detection with Optimal Action Segment Assignment and mixers - Jianhua Yang, Ke Wang, Lijun Zhao, Zhiqiang Jiang, Ruifeng Li, Expert Systems with Applications 2024.
-
FACT: Frame-Action Cross-Attention Temporal Modeling - Zijia Lu, Ehsan Elhamifar, CVPR 2024. [code]
-
Progress-Aware Online Action Segmentation for Egocentric Procedural Task Videos - Yuhan Shen, Ehsan Elhamifar, CVPR 2024. [code]
-
Refining Action Boundaries for One-stage Detection - Hanyuan Wang, Majid Mirmehdi, Dima Damen, Toby Perrett, AVSS 2024.
-
Quasi-Online Detection of Take and Release Actions from Egocentric Videos - Rosario Scavo, Francesco Ragusa, Giovanni Maria Farinella, Antonino Furnari, ICIAP 2023. [code]
-
Memory-and-Anticipation Transformer for Online Action Understanding - Jiahao Wang, Guo Chen, Yifei Huang, Limin Wang, Tong Lu, ICCV 2023. [code]
-
Ego-Only: Egocentric Action Detection without Exocentric Transferring - Huiyu Wang, Mitesh Kumar Singh, Lorenzo Torresani, ICCV 2023.
-
Synchronization is All You Need: Exocentric-to-Egocentric Transfer for Temporal Action Segmentation with Unlabeled Synchronized Video Pairs - Camillo Quattrocchi, Antonino Furnari, Daniele Di Mauro, Mario Valerio Giuffrida, Giovanni Maria Farinella, 2023. [code]
-
My View is the Best View: Procedure Learning from Egocentric Videos - Siddhant Bansal, Chetan Arora, C.V. Jawahar, ECCV 2022.
-
Bridge-Prompt: Towards Ordinal Action Understanding in Instructional Videos - Muheng Li, Lei Chen, Yueqi Duan, Zhilan Hu, Jianjiang Feng, Jie Zhou, Jiwen Lu, CVPR 2022. [code]
-
Temporal Action Segmentation from Timestamp Supervision - Zhe Li, Yazan Abu Farha, Juergen Gall, CVPR 2021.
-
UnweaveNet: Unweaving Activity Stories - Will Price, Carl Vondrick, Dima Damen, 2021.
-
Personal-Location-Based Temporal Segmentation of Egocentric Video for Lifelogging Applications - A. Furnari, G. M. Farinella, S. Battiato, Journal of Visual Communication and Image Representation 2017.
-
Temporal segmentation and activity classification from first-person sensing - Spriggs, Ekaterina H., Fernando De La Torre, and Martial Hebert, WCVPR 2009.
-
Retrieval-Augmented Egocentric Video Captioning - Jilan Xu, Yifei Huang, Junlin Hou, Guo Chen, Yuejie Zhang, Rui Feng, Weidi Xie, CVPR 2024.
-
Where is my Wallet? Modeling Object Proposal Sets for Egocentric Visual Query Localization - Mengmeng Xu, Yanghao Li, Cheng-Yang Fu, Bernard Ghanem, Tao Xiang, Juan-Manuel Pérez-Rúa, CVPR 2023. [code]
-
Learning Temporal Sentence Grounding From Narrated EgoVideos - Kevin Flanagan, Dima Damen, Michael Wray, BMVC 2023. [code]
-
Single-Stage Visual Query Localization in Egocentric Videos - Hanwen Jiang, Santhosh Kumar Ramakrishnan, Kristen Grauman, 2023. [project page]
-
On Semantic Similarity in Video Retrieval - Michael Wray, Hazel Doughty, Dima Damen, CVPR 2021.
-
Domain Adaptation in Multi-View Embedding for Cross-Modal Video Retrieval - Jonathan Munro, Michael Wray, Diane Larlus, Gabriela Csurka, Dima Damen, 2021.
-
Fine-Grained Action Retrieval Through Multiple Parts-of-Speech Embeddings - Michael Wray, Diane Larlus, Gabriela Csurka, Dima Damen, ICCV 2019.
-
Learning to Segment Referred Objects from Narrated Egocentric Videos - Yuhan Shen, Huiyu Wang, Xitong Yang, Matt Feiszli, Ehsan Elhamifar, Lorenzo Torresani, Effrosyni Mavroudi, CVPR 2024.
-
Are Synthetic Data Useful for Egocentric Hand-Object Interaction Detection? An Investigation and the HOI-Synth Domain Adaptation Benchmark - Rosario Leonardi, Antonino Furnari, Francesco Ragusa, Giovanni Maria Farinella, 2023. [project page]
-
Generative Adversarial Network for Future Hand Segmentation from Egocentric Video - Wenqi Jia, Miao Liu, James M. Rehg, ECCV 2022. [project page]
-
SLVP: Self-Supervised Language-Video Pre-Training for Referring Video Object Segmentation - Jie Mei, AJ Piergiovanni, Jenq-Neng Hwang, Wei Li, WACVW 2024.
-
A SOUND APPROACH: Using Large Language Models to generate audio descriptions for egocentric text-audio retrieval - Andreea-Maria Oncescu, João F. Henriques, Andrew Zisserman, Samuel Albanie, A. Sophia Koepke, ICASSP 2024.
-
Detours for Navigating Instructional Videos - Kumar Ashutosh, Zihui Xue, Tushar Nagarajan, Kristen Grauman, CVPR 2024. [project page]
-
Grounded Question-Answering in Long Egocentric Videos - Shangzhe Di, Weidi Xie, CVPR 2024. [code]
-
Video ReCap: Recursive Captioning of Hour-Long Videos - Md Mohaiminul Islam, Ngan Ho, Xitong Yang, Tushar Nagarajan, Lorenzo Torresani, Gedas Bertasius, CVPR 2024. [project page]
-
Learning Object States from Actions via Large Language Models - Masatoshi Tateno, Takuma Yagi, Ryosuke Furuta, Yoichi Sato, 2024.
-
Step Differences in Instructional Video - Tushar Nagarajan, Lorenzo Torresani, 2024.
-
EgoNCE++: Do Egocentric Video-Language Models Really Understand Hand-Object Interactions? - Boshen Xu, Ziheng Wang, Yang Du, Sipeng Zheng, Zhinan Song, Qin Jin, 2024. [code]
-
HERO: A Multi-modal Approach on Mobile Devices for Visual-Aware Conversational Assistance in Industrial Domains - Claudia Bonanno, Francesco Ragusa, Antonino Furnari, Giovanni Maria Farinella, ICIAP 2023.
-
EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone - Shraman Pramanick, Yale Song, Sayan Nag, Kevin Qinghong Lin, Hardik Shah, Mike Zheng Shou, Rama Chellappa, Pengchuan Zhang, ICCV 2023. [project page]
-
Localizing Active Objects from Egocentric Vision with Symbolic World Knowledge - Te-Lin Wu, Yu Zhou, Nanyun Peng, EMNLP 2023.
-
NaQ: Leveraging Narrations As Queries To Supervise Episodic Memory - Santhosh Kumar Ramakrishnan, Ziad Al-Halah, Kristen Grauman, CVPR 2023. [project page]
-
HierVL: Learning Hierarchical Video-Language Embeddings - Kumar Ashutosh, Rohit Girdhar, Lorenzo Torresani, Kristen Grauman, CVPR 2023. [project page]
-
Learning Video Representations from Large Language Models - Yue Zhao, Ishan Misra, Philipp Krähenbühl, Rohit Girdhar, CVPR 2023. [code] [project page]
-
LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning - Bolin Lai, Xiaoliang Dai, Lawrence Chen, Guan Pang, James M. Rehg, Miao Liu, 2023.
-
GenHowTo: Learning to Generate Actions and State Transformations from Instructional Videos - Tomáš Souček, Dima Damen, Michael Wray, Ivan Laptev, Josef Sivic, 2023. [project page]
-
EgoTaskQA: Understanding Human Tasks in Egocentric Videos - Baoxiong Jia, Ting Lei, Song-Chun Zhu, Siyuan Huang, NeurIPS 2022. [project page]
-
Episodic Memory Question Answering - Samyak Datta, Sameer Dharur, Vincent Cartillier, Ruta Desai, Mukul Khanna, Dhruv Batra, CVPR 2022.
-
Egocentric Video-Language Pretraining - Kevin Qinghong Lin, Alex Jinpeng Wang, Mattia Soldan, Michael Wray, Rui Yan, Eric Zhongcong Xu, Difei Gao, Rongcheng Tu, Wenzhe Zhao, Weijie Kong, Chengfei Cai, Hongfa Wang, Dima Damen, Bernard Ghanem, Wei Liu, 2022.
-
Unifying Few- and Zero-Shot Egocentric Action Recognition - Tyler R. Scott, Michael Shvartsman, Karl Ridgeway, CVPRW 2021.
-
In the Eye of Transformer: Global–Local Correlation for Egocentric Gaze Estimation and Beyond - Bolin Lai, Miao Liu, Fiona Ryan, James M. Rehg, IJCV 2023. [project page]
-
1000 Pupil Segmentations in a Second Using Haar Like Features and Statistical Learning - Wolfgang Fuhl, Johannes Schneider, Enkelejda Kasneci, WICCV 2021.
-
The Audio-Visual Conversational Graph: From an Egocentric-Exocentric Perspective - Wenqi Jia, Miao Liu, Hao Jiang, Ishwarya Ananthabhotla, James M. Rehg, Vamsi Krishna Ithapu, Ruohan Gao, CVPR 2024. [project page]
-
Put Myself in Your Shoes: Lifting the Egocentric Perspective from Exocentric Videos - Mi Luo, Zihui Xue, Alex Dimakis, Kristen Grauman, 2024.
-
Learning Fine-grained View-Invariant Representations from Unpaired Ego-Exo Videos via Temporal Alignment - Zihui Xue, Kristen Grauman, 2023. [project page]
-
POV: Prompt-Oriented View-Agnostic Learning for Egocentric Hand-Object Interaction in the Multi-view World - Boshen Xu, Sipeng Zheng, Qin Jin, MM 2023.
-
Ego-Exo: Transferring Visual Representations From Third-Person to First-Person Videos - Yanghao Li, Tushar Nagarajan, Bo Xiong, Kristen Grauman, CVPR 2021.
-
Making Third Person Techniques Recognize First-Person Actions in Egocentric Videos - Sagar Verma, Pravin Nagar, Divam Gupta, Chetan Arora, ICIP 2018.
-
Actor and Observer: Joint Modeling of First and Third-Person Videos - Gunnar A. Sigurdsson and Abhinav Gupta and Cordelia Schmid and Ali Farhadi and Karteek Alahari, CVPR 2018. [code]
-
Balanced Spherical Grid for Egocentric View Synthesis - Changwoon Choi, Sang Min Kim, Young Min Kim, CVPR 2023. [project page]
-
Egocentric Whole-Body Motion Capture with FisheyeViT and Diffusion-Based Motion Refinement - Jian Wang, Zhe Cao, Diogo Luvizon, Lingjie Liu, Kripasindhu Sarkar, Danhang Tang, Thabo Beeler, Christian Theobalt, CVPR 2024. [project page]
-
Attention-Propagation Network for Egocentric Heatmap to 3D Pose Lifting - Taeho Kang, Youngki Lee, CVPR 2024. [code]
-
EventEgo3D: 3D Human Motion Capture from Egocentric Event Streams - Christen Millerdurai, Hiroyasu Akada, Jian Wang, Diogo Luvizon, Christian Theobalt, Vladislav Golyanik, CVPR 2024. [project page]
-
SimpleEgo: Predicting Probabilistic Body Pose from Egocentric Cameras - Hanz Cuevas-Velasquez, Charlie Hewitt, Sadegh Aliakbarian, Tadas Baltrušaitis, 3DV 2024. [project page]
-
Ego3DPose: Capturing 3D Cues from Binocular Egocentric Views - Taeho Kang, Kyungjin Lee, Jinrui Zhang, Youngki Lee, SIGGRAPH Asia 2023.
-
Domain-Guided Spatio-Temporal Self-Attention for Egocentric 3D Pose Estimation - Jinman Park, Kimathi Kaai, Saad Hossain, Norikatsu Sumi, Sirisha Rambhatla, Paul Fieguth, KDD 2023. [code]
-
Scene-aware Egocentric 3D Human Pose Estimation - Jian Wang, Diogo Luvizon, Weipeng Xu, Lingjie Liu, Kripasindhu Sarkar, Christian Theobalt, CVPR 2023. [project page]
-
Hierarchical Temporal Transformer for 3D Hand Pose Estimation and Action Recognition From Egocentric RGB Videos - Yilin Wen, Hao Pan, Lei Yang, Jia Pan, Taku Komura, Wenping Wang, CVPR 2023. [code]
-
3D Human Pose Perception from Egocentric Stereo Videos - Hiroyasu Akada, Jian Wang, Vladislav Golyanik, Christian Theobalt, 2023. [project page]
-
Robust Egocentric Photo-realistic Facial Expression Transfer for Virtual Reality - Amin Jourabloo, Fernando De la Torre, Jason Saragih, Shih-En Wei, Stephen Lombardi, Te-Li Wang, Danielle Belko, Autumn Trimble, Hernan Badino, CVPR 2022.
-
Whose Hand Is This? Person Identification From Egocentric Hand Gestures - Satoshi Tsutsui, Yanwei Fu, David J. Crandall, WACV 2021.
-
Dynamics-regulated kinematic policy for egocentric pose estimation - Zhengyi Luo, Ryo Hachiuma, Ye Yuan, Kris Kitani, NeurIPS 2021.
-
Estimating Egocentric 3D Human Pose in Global Space - Jian Wang, Lingjie Liu, Weipeng Xu, Kripasindhu Sarkar, Christian Theobalt, ICCV 2021.
-
Egocentric Pose Estimation From Human Vision Span - Hao Jiang, Vamsi Krishna Ithapu, ICCV 2021.
-
EgoRenderer: Rendering Human Avatars From Egocentric Camera Images - Tao Hu, Kripasindhu Sarkar, Lingjie Liu, Matthias Zwicker, Christian Theobalt, ICCV 2021.
-
Recognizing Camera Wearer from Hand Gestures in Egocentric Videos - Daksh Thapar, Aditya Nigam, Chetan Arora, MM 2020.
-
You2Me: Inferring Body Pose in Egocentric Video via First and Second Person Interactions - Ng, Evonne and Xiang, Donglai and Joo, Hanbyul and Grauman, Kristen, CVPR 2020. [code]
-
Ego-Pose Estimation and Forecasting as Real-Time PD Control - Ye Yuan and Kris Kitani, ICCV 2019. [code]
-
xR-EgoPose: Egocentric 3D Human Pose From an HMD Camera - Tome, Denis and Peluse, Patrick and Agapito, Lourdes and Badino, Hernan, ICCV 2019.
-
3D Ego-Pose Estimation via Imitation Learning - Ye Yuan, Kris Kitani, ECCV 2018.
-
Egocentric Activity Recognition and Localization on a 3D Map - Chang Chen, Jiaming Zhang, Kailun Yang, Kunyu Peng, Rainer Stiefelhagen, WACV 2023. [code]
-
Object Goal Navigation with Recursive Implicit Maps - Shizhe Chen, Thomas Chabal, Ivan Laptev, Cordelia Schmid, IROS 2023. [project page]
-
An Optimized Pipeline for Image-Based Localization in Museums from Egocentric Images - Nicola Messina, Fabrizio Falchi, Antonino Furnari, Claudio Gennaro, Giovanni Maria Farinella, ICIAP 2023.
-
EgoLoc: Revisiting 3D Object Localization from Egocentric Videos with Visual Queries - Jinjie Mai, Abdullah Hamdi, Silvio Giancola, Chen Zhao, Bernard Ghanem, ICCV 2023. [code]
-
Chat2Map: Efficient Scene Mapping from Multi-Ego Conversations - Sagnik Majumder, Hao Jiang, Pierre Moulon, Ethan Henderson, Paul Calamia, Kristen Grauman, Vamsi Krishna Ithapu, CVPR 2023. [project page]
-
InCrowdFormer: On-Ground Pedestrian World Model From Egocentric Views - Mai Nishimura, Shohei Nobuhara, Ko Nishino, 2023.
-
Egocentric Indoor Localization From Room Layouts and Image Outer Corners - Xiaowei Chen, Guoliang Fan, WICCV 2021.
-
Egocentric Activity Recognition and Localization on a 3D Map - Miao Liu, Lingni Ma, Kiran Somasundaram, Yin Li, Kristen Grauman, James M. Rehg, Chao Li, 2021.
-
Egocentric Shopping Cart Localization - E. Spera, A. Furnari, S. Battiato, G. M. Farinella, ICPR 2018.
-
Recognizing personal locations from egocentric videos - Furnari, A., Farinella, G. M., & Battiato, S., IEEE Transactions on Human-Machine Systems 2017.
-
Context-based vision system for place and object recognition - Torralba, A., Murphy, K. P., Freeman, W. T., & Rubin, M. A., ICCV 2003.
-
Anonymizing Egocentric Videos - Daksh Thapar, Aditya Nigam, Chetan Arora, ICCV 2021.
-
Mitigating Bystander Privacy Concerns in Egocentric Activity Recognition with Deep Learning and Intentional Image Degradation - Dimiccoli, M., Marín, J., & Thomaz, E., Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 2018.
-
Privacy-Preserving Human Activity Recognition from Extreme Low Resolution - Ryoo, M. S., Rothrock, B., Fleming, C., & Yang, H. J., AAAI 2017.
-
Instance Tracking in 3D Scenes from Egocentric Videos - Yunhan Zhao, Haoyu Ma, Shu Kong, Charless Fowlkes, CVPR 2024. [code]
-
Spatial Cognition from Egocentric Video: Out of Sight, Not Out of Mind - Chiara Plizzari, Shubham Goel, Toby Perrett, Jacob Chalk, Angjoo Kanazawa, Dima Damen, 2024. [project page]
-
Tracking Multiple Deformable Objects in Egocentric Videos - Mingzhen Huang, Xiaoxing Li, Jun Hu, Honghong Peng, Siwei Lyu, CVPR 2023. [project page]
-
LoCoNet: Long-Short Context Network for Active Speaker Detection - Xizi Wang, Feng Cheng, Gedas Bertasius, CVPR 2024. [code]
-
Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos - Sagnik Majumder, Ziad Al-Halah, Kristen Grauman, CVPR 2024. [project page]
-
Egocentric Deep Multi-Channel Audio-Visual Active Speaker Localization - Hao Jiang, Calvin Murdock, Vamsi Krishna Ithapu, CVPR 2022.
-
Egocentric Auditory Attention Localization in Conversations - Fiona Ryan, Hao Jiang, Abhinav Shukla, James M. Rehg, Vamsi Krishna Ithapu, CVPR 2023. [project page]
-
EgoCom: A Multi-person Multi-modal Egocentric Communications Dataset - Curtis G. Northcutt and Shengxin Zha and Steven Lovegrove and Richard Newcombe, PAMI 2020.
-
Deep Dual Relation Modeling for Egocentric Interaction Recognition - Li, Haoxin and Cai, Yijun and Zheng, Wei-Shi, CVPR 2019.
-
Recognizing Micro-Actions and Reactions from Paired Egocentric Videos - Yonetani, Ryo and Kitani, Kris M. and Sato, Yoichi, CVPR 2016.
-
Social interactions: A first-person perspective - Fathi, A., Hodgins, J. K., & Rehg, J. M., CVPR 2012.
-
A Backpack Full of Skills: Egocentric Video Understanding with Diverse Task Perspectives - Simone Alberto Peirone, Francesca Pistilli, Antonio Alliegro, Giuseppe Averta, CVPR 2024. [project page]
-
EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Language Models - Sijie Cheng, Zhicheng Guo, Jingwen Wu, Kechen Fang, Peng Li, Huaping Liu, Yang Liu, CVPR 2024. [code] [project page]
-
Multi-Task Learning of Object States and State-Modifying Actions from Web Videos - Tomáš Souček, Jean-Baptiste Alayrac, Antoine Miech, Ivan Laptev, Josef Sivic, TPAMI 2023. [code]
-
Ego4D: Around the World in 3,000 Hours of Egocentric Video - Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, Morrie Doulaty, Akshay Erapalli, Christoph Feichtenhofer, Adriano Fragomeni, Qichen Fu, Christian Fuegen, Abrham Gebreselasie, Cristina Gonzalez, James Hillis, Xuhua Huang, Yifei Huang, Wenqi Jia, Weslie Khoo, Jachym Kolar, Satwik Kottur, Anurag Kumar, Federico Landini, Chao Li, Yanghao Li, Zhenqiang Li, Karttikeya Mangalam, Raghava Modhugu, Jonathan Munro, Tullie Murrell, Takumi Nishiyasu, Will Price, Paola Ruiz Puentes, Merey Ramazanova, Leda Sari, Kiran Somasundaram, Audrey Southerland, Yusuke Sugano, Ruijie Tao, Minh Vo, Yuchen Wang, Xindi Wu, Takuma Yagi, Yunyi Zhu, Pablo Arbelaez, David Crandall, Dima Damen, Giovanni Maria Farinella, Bernard Ghanem, Vamsi Krishna Ithapu, C. V. Jawahar, Hanbyul Joo, Kris Kitani, Haizhou Li, Richard Newcombe, Aude Oliva, Hyun Soo Park, James M. Rehg, Yoichi Sato, Jianbo Shi, Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik, CVPR 2023. [video]
-
Egocentric Video Task Translation - Zihui Xue, Yale Song, Kristen Grauman, Lorenzo Torresani, CVPR 2023. [project page]
-
EgoChoir: Capturing 3D Human-Object Interaction Regions from Egocentric Views - Yuhang Yang, Wei Zhai, Chengfeng Wang, Chengjun Yu, Yang Cao, Zheng-Jun Zha, 2024.
-
Multi-label affordance mapping from egocentric vision - Lorenzo Mur-Labadia, Jose J. Guerrero, Ruben Martinez-Cantin, ICCV 2023.
-
Shaping embodied agent behavior with activity-context priors from egocentric video - Tushar Nagarajan, Kristen Grauman, NeurIPS 2021.
-
Learning Visual Affordance Grounding from Demonstration Videos - Hongchen Luo, Wei Zhai, Jing Zhang, Yang Cao, Dacheng Tao, 2021.
-
EGO-TOPO: Environment Affordances from Egocentric Video - Tushar Nagarajan, Yanghao Li, Christoph Feichtenhofer, Kristen Grauman, CVPR 2020.
-
HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data - Mengqi Zhang, Yang Fu, Zheng Ding, Sifei Liu, Zhuowen Tu, Xiaolong Wang, 2024. [project page]
-
SEMA: Semantic Attention for Capturing Long-Range Dependencies in Egocentric Lifelogs - Pravin Nagar, K.N Ajay Shastry, Jayesh Chaudhari, Chetan Arora, WACV 2024. [code]
-
Behavioural pattern discovery from collections of egocentric photo-streams - Martin Menchon, Estefania Talavera, Jose M Massa, Petia Radeva, Pervasive and Mobile Computing 2023.
-
Multi-stream dynamic video Summarization - Mohamed Elfeki, Liqiang Wang, Ali Borji, WACV 2022.
-
Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language - Andy Zeng, Adrian Wong, Stefan Welker, Krzysztof Choromanski, Federico Tombari, Aveek Purohit, Michael Ryoo, Vikas Sindhwani, Johnny Lee, Vincent Vanhoucke, Pete Florence, 2022.
-
Egocentric video summarisation via purpose-oriented frame scoring and selection - V. Javier Traver and Dima Damen, 2022.
-
Together Recognizing, Localizing and Summarizing Actions in Egocentric Videos - Abhimanyu Sahu, Ananda S. Chowdhury, TIP 2021.
-
First person video summarization using different graph representations - Abhimanyu Sahu, Ananda S. Chowdhury, Pattern Recognition Letters 2021.
-
Summarizing egocentric videos using deep features and optimal clustering - Abhimanyu Sahu, Ananda S. Chowdhury, Neurocomputing 2020.
-
Text Synopsis Generation for Egocentric Videos - Aidean Sharghi, Niels da Vitoria Lobo, Mubarak Shah, ICPR 2020.
-
Shot Level Egocentric Video Co-summarization - Abhimanyu Sahu, Ananda S. Chowdhury, ICPR 2018.
-
Personalized Egocentric Video Summarization of Cultural Tour on User Preferences Input - Patrizia Varini, Giuseppe Serra, Rita Cucchiara, IEEE Transactions on Multimedia 2017.
-
Discovering Picturesque Highlights from Egocentric Vacation Videos - Vinay Bettadapura, Daniel Castro, Irfan Essa, arXiv 2016.
-
Spatial and temporal scoring for egocentric video summarization - Zhao Guo, Lianli Gao, Xiantong Zhen, Fuhao Zou, Fumin Shen, Kai Zheng, Neurocomputing 2016.
-
Video Summarization with Long Short-term Memory - Ke Zhang, Wei-Lun Chao, Fei Sha, Kristen Grauman, ECCV 2016.
-
Highlight Detection with Pairwise Deep Ranking for First-Person Video Summarization - Ting Yao, Tao Mei, Yong Rui, CVPR 2016.
-
Predicting Important Objects for Egocentric Video Summarization - Yong Jae Lee & Kristen Grauman, IJCV 2015.
-
Storyline Representation of Egocentric Videos with an Applications to Story-Based Search - Bo Xiong, Gunhee Kim, Leonid Sigal, ICCV 2015.
-
Gaze-Enabled Egocentric Video Summarization via Constrained Submodular Maximization - Jia Xu, Lopamudra Mukherjee, Yin Li, Jamieson Warner, James M. Rehg, Vikas Singh, CVPR 2015.
-
Video Summarization by Learning Submodular Mixtures of Objectives - Michael Gygli, Helmut Grabner, Luc Van Gool, CVPR 2015.
-
Detecting Snap Points in Egocentric Video with a Web Photo Prior - Bo Xiong and Kristen Grauman, ECCV 2014.
-
Creating Summaries from User Videos - Michael Gygli, Helmut Grabner, Hayko Riemenschneider, and Luc Van Gool, ECCV 2014.
-
Quasi Real-Time Summarization for Consumer Videos - Bin Zhao, Eric P. Xing, CVPR 2014.
-
Story-Driven Summarization for Egocentric Video - Zheng Lu and Kristen Grauman, CVPR 2013.
-
Discovering Important People and Objects for Egocentric Video Summarization - Yong Jae Lee, Joydeep Ghosh, and Kristen Grauman, CVPR 2012.
-
Wearable hand activity recognition for event summarization - Mayol, W. W., & Murray, D. W., IEEE International Symposium on Wearable Computers 2005.
-
United We Stand, Divided We Fall: UnityGraph for Unsupervised Procedure Learning from Videos - Siddhant Bansal, Chetan Arora, C.V. Jawahar, WACV 2024.
-
PREGO: online mistake detection in PRocedural EGOcentric videos - Alessandro Flaborea, Guido Maria D'Amely di Melendugno, Leonardo Plini, Luca Scofano, Edoardo De Matteis, Antonino Furnari, Giovanni Maria Farinella, Fabio Galasso, CVPR 2024. [code]
-
Mocap Everyone Everywhere: Lightweight Motion Capture With Smartwatches and a Head-Mounted Camera - Jiye Lee, Hanbyul Joo, CVPR 2024. [code]
-
VideoLLM-online: Online Video Large Language Model for Streaming Video - Joya Chen, Zhaoyang Lv, Shiwei Wu, Kevin Qinghong Lin, Chenan Song, Difei Gao, Jia-Wei Liu, Ziteng Gao, Dongxing Mao, Mike Zheng Shou, CVPR 2024. [video]
-
Error Detection in Egocentric Procedural Task Videos - Shih-Po Lee, Zijia Lu, Zekun Zhang, Minh Hoai, Ehsan Elhamifar, CVPR 2024. [code]
-
Are you Struggling? Dataset and Baselines for Struggle Determination in Assembly Videos - Shijia Feng, Michael Wray, Brian Sullivan, Casimir Ludwig, Iain Gilchrist, Walterio Mayol-Cuevas, 2024.
-
Bringing Online Egocentric Action Recognition into the wild - Gabriele Goletto, Mirco Planamente, Barbara Caputo, Giuseppe Averta, RA-L 2023. [code]
-
EgoAdapt: A multi-stream evaluation study of adaptation to real-world egocentric user video - Matthias De Lange, Hamid Eghbalzadeh, Reuben Tan, Michael Iuzzolino, Franziska Meier, Karl Ridgeway, 2023.
-
Training a Large Video Model on a Single Machine in a Day - Yue Zhao, Philipp Krähenbühl, 2023. [code]
-
Learning from One Continuous Video Stream - João Carreira, Michael King, Viorica Pătrăucean, Dilara Gokay, Cătălin Ionescu, Yi Yang, Daniel Zoran, Joseph Heyward, Carl Doersch, Yusuf Aytar, Dima Damen, Andrew Zisserman, 2023.
-
Wearable System for Personalized and Privacy-preserving Egocentric Visual Context Detection using On-device Deep Learning - Mina Khan, Glenn Fernandes, Akash Vaish, Mayank Manuja, Pattie Maes, UMAP 2021.
-
NeuralDiff: Segmenting 3D objects that move in egocentric videos - Vadim Tschernezki, Diane Larlus, Andrea Vedaldi, 3DV 2021.
-
Learning Robot Activities From First-Person Human Videos Using Convolutional Future Regression - Jangwon Lee, Michael S. Ryoo, CVPR 2017.
-
Integrating Egocentric and Robotic Vision for Object Identification Using Siamese Networks and Superquadric Estimations in Partial Occlusion Scenarios - Elisabeth Menendez, Santiago Martínez, Fernando Díaz-de-María, Carlos Balaguer, Intelligent Human-Robot Interaction 2024.
-
Rank2Reward: Learning Shaped Reward Functions from Passive Video - Daniel Yang, Davin Tjia, Jacob Berg, Dima Damen, Pulkit Agrawal, Abhishek Gupta, ICRA 2024. [project page]
-
Real-time 3D Semantic Scene Perception for Egocentric Robots with Binocular Vision - K. Nguyen, T. Dang, M. Huber, 2024. [code]
-
Learning Interaction Regions and Motion Trajectories Simultaneously From Egocentric Demonstration Videos - Jianjia Xin, Lichun Wang, Kai Xu, Chao Yang, Baocai Yin, RA-L 2023.
-
Affordances from Human Videos as a Versatile Representation for Robotics - Shikhar Bahl, Russell Mendonca, Lili Chen, Unnat Jain, Deepak Pathak, CVPR 2023. [project page]
-
R3M: A Universal Visual Representation for Robot Manipulation - Suraj Nair, Aravind Rajeswaran, Vikash Kumar, Chelsea Finn, Abhinav Gupta, 2022.
-
ManipulaTHOR: A Framework for Visual Object Manipulation - Kiana Ehsani, Winson Han, Alvaro Herrasti, Eli VanderBilt, Luca Weihs, Eric Kolve, Aniruddha Kembhavi, Roozbeh Mottaghi, CVPR 2021. [project page]
-
One-Shot Imitation from Observing Humans via Domain-Adaptive Meta-Learning - Tianhe Yu, Chelsea Finn, Annie Xie, Sudeep Dasari, Tianhao Zhang, Pieter Abbeel, Sergey Levine, RSS 2018.
-
TEXT2TASTE: A Versatile Egocentric Vision System for Intelligent Reading Assistance Using Large Language Model - Wiktor Mucha, Florin Cuconasu, Naome A. Etori, Valia Kalokyri, Giovanni Trappolini, ICCHP 2024.
-
Preserved action recognition in children with autism spectrum disorders: Evidence from an EEG and eye-tracking study - Mohammad Saber Sotoodeh, Hamidreza Taheri-Torbati, Nouchine Hadjikhani, Amandine Lassalle, Psychophysiology 2020.
-
A Computational Model of Early Word Learning from the Infant's Point of View - Satoshi Tsutsui, Arjun Chandrasekaran, Md Alimoor Reza, David Crandall, Chen Yu, CogSci 2020.
-
GSM (Gate-Shift Module) - Swathikiran Sudhakaran, Sergio Escalera, Oswald Lanz, CVPR 2020. [code]
-
TSM (Temporal Shift Module) - Ji Lin, Chuang Gan, Song Han, ICCV 2019.
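The temporal shift at the core of TSM is only a few lines: a fraction of channels is shifted one step forward or backward along time, letting a 2D backbone exchange information across neighboring frames at zero extra FLOPs. A minimal sketch (the 1/8 fold fraction follows the paper's default; in the paper the shift is inserted inside residual branches):
```python
import torch

def temporal_shift(x, fold_div=8):
    """x: (N, T, C, H, W) clip features -> same shape, channels shifted."""
    n, t, c, h, w = x.shape
    fold = c // fold_div
    out = torch.zeros_like(x)
    out[:, 1:, :fold] = x[:, :-1, :fold]                   # shift forward in time
    out[:, :-1, fold:2 * fold] = x[:, 1:, fold:2 * fold]   # shift backward in time
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]              # remaining channels unchanged
    return out
```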
-
TBN (Temporal Binding Network) - Evangelos Kazakos, Arsha Nagrani, Andrew Zisserman, Dima Damen, ICCV 2019. [code]
-
TRN (Temporal Relation Network) - Bolei Zhou, Alex Andonian, Aude Oliva, Antonio Torralba, ECCV 2018.
-
R(2+1)D - Du Tran, Heng Wang, Lorenzo Torresani, Jamie Ray, Yann LeCun, Manohar Paluri, CVPR 2018.
-
TSN (Temporal Segment Networks) - Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, Luc Van Gool, ECCV 2016.
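TSN's sparse-sampling recipe is similarly compact: sample one snippet per temporal segment, score each with a shared 2D backbone, and fuse the segment scores into a video-level prediction. A minimal sketch with the paper's default average consensus; the backbone here stands in for any per-frame classifier:
```python
import torch
import torch.nn as nn

class TSNHead(nn.Module):
    def __init__(self, backbone, num_segments=3):
        super().__init__()
        self.backbone = backbone      # maps (N, C, H, W) -> (N, num_classes)
        self.num_segments = num_segments

    def forward(self, x):
        # x: (N, num_segments, C, H, W), one sampled snippet per segment
        n, s, c, h, w = x.shape
        scores = self.backbone(x.view(n * s, c, h, w))  # (N*S, num_classes)
        scores = scores.view(n, s, -1)
        return scores.mean(dim=1)     # average consensus over segments
```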
-
SlowFast - Christoph Feichtenhofer, Haoqi Fan, Jitendra Malik, Kaiming He, ICCV 2019.
-
I3D (Inflated 3D ConvNet) - Joao Carreira, Andrew Zisserman, CVPR 2017.
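I3D bootstraps its 3D filters from 2D ImageNet weights by "inflation": each 2D kernel is repeated along a new temporal axis and divided by its length, so the 3D network initially reproduces the 2D network's response on a repeated-frame video. A minimal sketch of inflating a single convolution:
```python
import torch.nn as nn

def inflate_conv2d(conv2d: nn.Conv2d, time_dim: int = 3) -> nn.Conv3d:
    """Build a 3D conv initialized from a pretrained 2D conv."""
    conv3d = nn.Conv3d(conv2d.in_channels, conv2d.out_channels,
                       kernel_size=(time_dim, *conv2d.kernel_size),
                       stride=(1, *conv2d.stride),
                       padding=(time_dim // 2, *conv2d.padding),
                       bias=conv2d.bias is not None)
    w2d = conv2d.weight.data                                  # (out, in, kH, kW)
    w3d = w2d.unsqueeze(2).repeat(1, 1, time_dim, 1, 1) / time_dim
    conv3d.weight.data.copy_(w3d)
    if conv2d.bias is not None:
        conv3d.bias.data.copy_(conv2d.bias.data)
    return conv3d
```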
-
RULSTM (Rolling-Unrolling LSTM) - Antonino Furnari, Giovanni Maria Farinella, ICCV 2019. [code]
-
LSTA (Long Short-Term Attention) - Swathikiran Sudhakaran, Sergio Escalera, Oswald Lanz, CVPR 2019. [code]
-
Ego-STAN - Jinman Park, Kimathi Kaai, Saad Hossain, Norikatsu Sumi, Sirisha Rambhatla, Paul Fieguth, WCVPR 2022.
-
XViT - Adrian Bulat, Juan-Manuel Perez-Rua, Swathikiran Sudhakaran, Brais Martinez, Georgios Tzimiropoulos, NeurIPS 2021.
-
TimeSformer - Gedas Bertasius, Heng Wang, Lorenzo Torresani, ICML 2021.
-
ViViT - Anurag Arnab, Mostafa Dehghani, Georg Heigold, Chen Sun, Mario Lučić, Cordelia Schmid, ICCV 2021.
-
EgoGen: An Egocentric Synthetic Data Generator - Gen Li, Kaifeng Zhao, Siwei Zhang, Xiaozhong Lyu, Mihai Dusmanu, Yan Zhang, Marc Pollefeys, Siyu Tang, CVPR 2024. [project page]
-
Active Object Detection with Knowledge Aggregation and Distillation from Large Models - Dejie Yang, Yang Liu, CVPR 2024. [code]
-
Self-Supervised Object Detection from Egocentric Videos - Peri Akiva, Jing Huang, Kevin J Liang, Rama Kovvuri, Xingyu Chen, Matt Feiszli, Kristin Dana, Tal Hassner, ICCV 2023.
-
COPILOT: Human-Environment Collision Prediction and Localization from Egocentric Videos - Boxiao Pan, Bokui Shen, Davis Rempe, Despoina Paschalidou, Kaichun Mo, Yanchao Yang, Leonidas J. Guibas, ICCV 2023. [project page]
-
Revisiting 3D Object Detection From an Egocentric Perspective - Boyang Deng, Charles R. Qi, Mahyar Najibi, Thomas Funkhouser, Yin Zhou, Dragomir Anguelov, NeurIPS 2021.
-
Learning by Watching - Jimuyang Zhang, Eshed Ohn-Bar, CVPR 2021.
-
[IndustReal] - The IndustReal dataset contains 84 videos, demonstrating how 27 participants perform maintenance and assembly procedures on a construction-toy assembly set. WACV 2024. [paper] [code]
-
IKEA Ego 3D Dataset - A novel dataset for ego-view 3D point cloud action recognition. The dataset consists of approximately 493k frames and 56 classes of intricate furniture assembly actions of four different furniture types. WACV 2024. [paper]
-
[EvIs-Kitchen] - The EvIs-Kitchen dataset is the first Video-Sensor-Sensor (V-S-S) interaction-focused dataset for ego-HAR tasks, capturing sequences of everyday kitchen activities. This dataset uses two inertial sensors on both wrists to better capture subject-object interactions. IEEE Sensors Journal 2024. [paper]
-
Ego-Exo4D - Ego-Exo4D, a vast multimodal multiview video dataset capturing skilled human activities in both egocentric and exocentric perspectives (e.g., sports, music, dance). With 800+ participants in 13 cities, it offers 1,422 hours of combined footage, featuring diverse activities in 131 natural scene contexts, ranging from 1 to 42 minutes per video. CVPR 2024. [paper]
-
[EgoExoLearn] - EgoExoLearn, a large-scale dataset that emulates the human demonstration following process, in which individuals record egocentric videos as they execute tasks guided by demonstration videos. EgoExoLearn contains egocentric and demonstration video data spanning 120 hours captured in daily life scenarios and specialized laboratories. CVPR 2024. [paper] [code]
-
OAKINK2 - A dataset of bimanual object manipulation tasks for complex daily activities. OAKINK2 introduces three levels of abstraction to organize the manipulation tasks: Affordance, Primitive Task, and Complex Task. The OAKINK2 dataset provides multi-view image streams and precise pose annotations for the human body, hands and various interacting objects. This extensive collection supports applications such as interaction reconstruction and motion synthesis. CVPR 2024. [paper]
-
UnrealEgo2-UnrealEgo-RW - UnrealEgo2 Dataset: An expanded dataset capturing over 15,200 motions of realistic 3D human models with a glasses-based device, offering 1.25 million stereo views and comprehensive joint annotations. UnrealEgo-RW Dataset: A real-world dataset utilizing a compact mobile device with fisheye cameras, designed for versatile egocentric image capture in various environments. CVPR 2024. [paper] [code]
-
[TF2023] - A novel dataset featuring synchronized first-person and third-person views, including masks of camera wearers linked to their respective views. It consists of 208,794 training and 87,449 testing image pairs, with no actor overlap between sets. Each scene averages 4.29 actors, focusing on complex interactions like puzzle games, enhancing its value for cross-view matching in egocentric vision. CVPR 2024. [paper] [code]
-
TACO - A large-scale dataset of real-world bimanual tool-object interactions, featuring 131 tool-action-object triplets across 2.5K motion sequences and 5.2M frames with egocentric and 3rd-person views. TACO enables benchmarks in action recognition, hand-object motion forecasting, and grasp synthesis, advancing generalization research in human-object interactions. CVPR 2024. [paper]
-
[BioVL-QR] - A biochemical vision-and-language dataset consisting of 24 egocentric experiment videos, corresponding protocols, and video-and-language alignments. The study uses Micro QR Codes to detect objects automatically, and a preliminary analysis shows that QR-code-only detection remains difficult because the researchers' manipulations frequently cause blur and occlusion. 2024. [paper]
-
HOI-Ref - Introduces the HOI-QA dataset of 3.9M question-answer pairs for training and evaluating VLMs. HOI-QA includes questions relating to locating hands, objects, and critically their interactions (e.g., referring to the object being manipulated by the hand). 2024. [paper]
-
HOT3D - HOT3D is a benchmark dataset for egocentric vision-based understanding of 3D hand-object interactions. The dataset offers over 833 minutes (more than 3.7M images) of multi-view RGB/monochrome image streams showing 19 subjects interacting with 33 diverse rigid objects, multi-modal signals such as eye gaze or scene point clouds, as well as comprehensive ground-truth annotations including 3D poses of objects, hands, and cameras, and 3D models of hands and objects. 2024. [paper] [code]
-
[ADL4D] - ADL4D dataset offers a novel perspective on human-object interactions, providing video sequences of everyday activities involving multiple people and objects interacting simultaneously. 2024. [paper]
-
ENIGMA-51 - ENIGMA-51 is a new egocentric dataset acquired in an industrial scenario by 19 subjects who followed instructions to complete the repair of electrical boards using industrial tools (e.g., electric screwdriver) and equipment (e.g., oscilloscope). The 51 egocentric video sequences are densely annotated with a rich set of labels that enable the systematic study of human behavior in the industrial domain. WACV 2023. [paper]
-
VidChapters-7M - VidChapters-7M, a dataset of 817K user-chaptered videos including 7M chapters in total. VidChapters-7M is automatically created from videos online in a scalable manner by scraping user-annotated chapters and hence without any additional manual annotation. NeurIPS 2023. [paper]
-
POV-Surgery - POV-Surgery, a large-scale, synthetic, egocentric dataset focusing on pose estimation for hands with different surgical gloves and three orthopedic surgical instruments, namely scalpel, friem, and diskplacer. Our dataset consists of 53 sequences and 88,329 frames, featuring high-resolution RGB-D video streams with activity annotations, accurate 3D and 2D annotations for hand-object pose, and 2D hand-object segmentation masks. MICCAI 2023. [paper]
-
CaptainCook4D - CaptainCook4D, comprising 384 recordings (94.5 hours) of people performing recipes in real kitchen environments. This dataset consists of two distinct types of activity: one in which participants adhere to the provided recipe instructions and another in which they deviate and induce errors. We provide 5.3K step annotations and 10K fine-grained action annotations and benchmark the dataset for the following tasks: supervised error recognition, multistep localization, and procedure learning. ICMLW 2023. [paper]
-
ARGO1M - Action Recognition Generalisation dataset (ARGO1M) from videos and narrations from Ego4D. ARGO1M is the first to test action generalisation across both scenario and location shifts, and is the largest domain generalisation dataset across images and video. ICCV 2023. [paper]
-
[EgoObjects] - EgoObjects, a large-scale egocentric dataset for fine-grained object understanding. It contains over 9K videos collected by 250 participants from 50+ countries using 4 wearable devices, and over 650K object annotations from 368 object categories. ICCV 2023. [paper] [code]
-
HoloAssist - HoloAssist: a large-scale egocentric human interaction dataset that spans 166 hours of data captured by 350 unique instructor-performer pairs, wearing mixed-reality headsets during collaborative tasks. ICCV 2023. [paper]
-
AssemblyHands - AssemblyHands, a large-scale benchmark dataset with accurate 3D hand pose annotations, to facilitate the study of egocentric activities with challenging hand-object interactions. CVPR 2023. [paper]
-
EpicSoundingObject - Epic Sounding Object dataset with sounding object annotations to benchmark the localization performance in egocentric videos. CVPR 2023. [paper] [code]
-
VOST - Video Object Segmentation under Transformations (VOST). It consists of more than 700 high-resolution videos, captured in diverse environments, which are 20 seconds long on average and densely labeled with instance masks. CVPR 2023. [paper]
-
ARCTIC - A dataset of 2.1 million video frames showing two hands skillfully manipulating objects. It includes precise 3D models of the hands and objects, as well as detailed, dynamic contact information. The dataset features two-handed actions with objects like scissors and laptops, capturing the changing hand positions and object states over time. CVPR 2023. [paper]
-
Aria Digital Twin - Aria Digital Twin (ADT) - an egocentric dataset captured using Aria glasses with extensive object, environment, and human-level ground truth. This ADT release contains 200 sequences of real-world activities conducted by Aria wearers. It supports very challenging research problems such as 3D object detection and tracking, scene reconstruction and understanding, sim-to-real learning, and human pose prediction, while also inspiring new machine perception tasks for augmented reality (AR) applications. 2023. [paper] [code]
-
WEAR - The dataset comprises data from 18 participants performing a total of 18 different workout activities with untrimmed inertial (acceleration) and camera (egocentric video) data recorded at 10 different outdoor locations. 2023. [paper]
-
EPIC Fields - EPIC Fields, an augmentation of EPIC-KITCHENS with 3D camera information. Like other datasets for neural rendering, EPIC Fields removes the complex and expensive step of reconstructing cameras using photogrammetry, and allows researchers to focus on modelling problems. 2023. [paper]
-
[EGOFALLS] - The dataset comprises 10,948 video samples from 14 subjects, focusing on falls among the elderly, and supports fall detection through multimodal descriptors extracted from egocentric camera videos. 2023. [paper]
-
[Exo2EgoDVC] - EgoYC2, a novel egocentric dataset, adapts procedural captions from YouCook2 to cooking videos re-recorded with head-mounted cameras. Unique in its weakly-paired approach, it aligns caption content with exocentric videos, distinguishing itself from other datasets focused on action labels. 2023. [paper]
-
EgoWholeBody - EgoWholeBody, a large synthetic dataset comprising 840,000 high-quality egocentric images captured across a diverse range of whole-body motion sequences, intended for whole-body motion estimation from a single egocentric camera. 2023. [paper]
-
[IT3DEgo] - IT3DEgo dataset: Addresses 3D instance tracking using egocentric sensors (AR/VR). Recorded in diverse indoor scenes with HoloLens2, it comprises 50 recordings (5+ minutes each). Evaluates tracking performance in 3D coordinates, leveraging camera pose and allocentric representation. 2023. [paper] [code]
-
Touch and Go - we present a dataset, called Touch and Go, in which human data collectors walk through a variety of environments, probing objects with tactile sensors and simultaneously recording their actions on video. NeurIPS 2022. [paper] [code]
-
EPIC-Visor - VISOR, a new dataset of pixel annotations and a benchmark suite for segmenting hands and active objects in egocentric video. NeurIPS 2022. [paper]
-
AssistQ - A new dataset comprising 529 question-answer samples derived from 100 newly filmed first-person videos. Each question must be completed with multi-step guidance by inferring from visual details (e.g., button positions) and textual details (e.g., actions like press/turn). ECCV 2022. [paper]
-
EgoProceL - EgoProceL dataset focuses on the key-steps required to perform a task instead of every action in the video. EgoProceL consists of 62 hours of videos captured by 130 subjects performing 16 tasks. ECCV 2022. [paper]
-
EgoHOS - EgoHOS, a labeled dataset consisting of 11,243 egocentric images with per-pixel segmentation labels of hands and objects being interacted with during a diverse array of daily activities. Our dataset is the first to label detailed hand-object contact boundaries. ECCV 2022. [paper] [code]
-
UnrealEgo - UnrealEgo, a new large-scale naturalistic dataset for egocentric 3D human pose estimation. It is the first dataset to provide in-the-wild stereo images with the largest variety of motions among existing egocentric datasets. ECCV 2022. [paper]
-
Assembly101 - Procedural activity dataset featuring 4321 videos of people assembling and disassembling 101 “take-apart” toy vehicles. CVPR 2022. [paper]
-
EgoPAT3D - A large multimodality dataset of more than 1 million frames of RGB-D and IMU streams, with evaluation metrics based on high-quality 2D and 3D labels from semi-automatic annotation. CVPR 2022. [paper]
-
AGD20K - Affordance dataset constructed by collecting and labeling over 20K images from 36 affordance categories. CVPR 2022. [paper]
-
HOI4D - A large-scale 4D egocentric dataset with rich annotations, to catalyze the research of category-level human-object interaction. HOI4D consists of 2.4M RGB-D egocentric video frames over 4000 sequences collected by 4 participants interacting with 800 different object instances from 16 categories over 610 different indoor rooms. Frame-wise annotations for panoptic segmentation, motion segmentation, 3D hand pose, category-level object pose and hand action have also been provided, together with reconstructed object meshes and scene point clouds. CVPR 2022. [paper]
-
EgoPW - A dataset captured by a head-mounted fisheye camera and an auxiliary external camera, which provides an additional observation of the human body from a third-person perspective. CVPR 2022. [paper]
-
Ego4D - 3,025 hours of daily-life activity video spanning hundreds of scenarios (household, outdoor, workplace, leisure, etc.) captured by 855 unique camera wearers from 74 worldwide locations and 9 different countries. CVPR 2022. [paper]
-
N-EPIC-Kitchens - N-EPIC-Kitchens, the first event-based camera extension of the large-scale EPIC-Kitchens dataset. CVPR 2022. [paper]
-
EasyCom-Clustering - The first large-scale egocentric video face clustering dataset. 2022. [paper]
-
First2Third-Pose - A new paired synchronized dataset of nearly 2,000 videos depicting human activities captured from both first- and third-view perspectives. 2022. [paper]
-
TREK-100 - Object tracking in first person vision. WICCV 2021. [paper]
-
[BioVL] - A novel biochemical video-and-language (BioVL) dataset, which consists of experimental videos, corresponding protocols, and annotations of alignment between events in the video and instructions in the protocol. 16 videos from four protocols with a total length of 1.6 hours. WICCV 2021. [paper]
-
MECCANO - 20 subjects assembling a toy motorbike. WACV 2021. [paper]
-
EPIC-Kitchens 2020 - Subjects performing unscripted actions in their native environments. IJCV 2021. [paper]
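For readers building on EPIC-Kitchens, the released annotations are plain CSVs of action segments. Below is a minimal loading sketch; the column names follow the publicly released EPIC-KITCHENS-100 annotation files, while the exact file path is an assumption for illustration.
```python
import csv

def load_action_segments(csv_path="EPIC_100_train.csv"):
    """Parse EPIC-Kitchens-style annotations into action-segment dicts."""
    segments = []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            segments.append({
                "video_id": row["video_id"],
                "start_frame": int(row["start_frame"]),
                "stop_frame": int(row["stop_frame"]),
                "verb_class": int(row["verb_class"]),
                "noun_class": int(row["noun_class"]),
            })
    return segments
```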
-
H2O - H2O (2 Hands and Objects), provides synchronized multi-view RGB-D images, interaction labels, object classes, ground-truth 3D poses for left & right hands, 6D object poses, ground-truth camera poses, object meshes and scene point clouds. ICCV 2021. [paper]
-
HOMAGE - Home Action Genome (HOMAGE): a multi-view action dataset with multiple modalities and view-points supplemented with hierarchical activity and atomic action labels together with dense scene composition labels. CVPR 2021. [paper]
-
EgoCom - A natural conversations dataset containing multi-modal human communication data captured simultaneously from the participants' egocentric perspectives. TPAMI 2020. [paper]
-
EGO-CH - 70 subjects visiting two cultural sites in Sicily, Italy. Pattern Recognition Letters 2020. [paper]
-
EPIC-Tent - 29 participants assembling a tent while wearing two head-mounted cameras. ICCV 2019. [paper]
-
EPIC-Kitchens 2018 - 32 subjects performing unscripted actions in their native environments. ECCV 2018. [paper]
-
Charades-Ego - Paired first-third person videos.
-
EGTEA Gaze+ - 32 subjects, 86 cooking sessions, 28 hours.
-
ADL - 20 subjects performing daily activities in their native environments.
-
CMU Kitchen - Multimodal, 18 subjects cooking 5 different recipes: brownies, eggs, pizza, salad, sandwich.
-
EgoSeg - Long term actions (walking, running, driving, etc.).
-
First-Person Social Interactions - 8 subjects at Disney World.
-
UEC Dataset - Two choreographed datasets with different ego-actions (walk, jump, climb, etc.) + 6 YouTube sports videos.
-
JPL - Interaction with a robot.
-
FPPA - Five subjects performing 5 daily actions.
-
UT Egocentric - 3-5 hours long videos capturing a person's day.
-
VINST / Visual Diaries - 31 videos capturing the visual experience of a subject walking from a metro station to work.
-
Bristol Egocentric Object Interaction (BEOID) - 8 subjects, six locations. Interaction with objects and environment.
-
Object Search Dataset - 57 sequences of 55 subjects on search and retrieval tasks.
-
UNICT-VEDI - Different subjects visiting a museum.
-
UNICT-VEDI-POI - Different subjects visiting a museum.
-
Simulated Egocentric Navigations - Simulated navigations of a virtual agent within a large building.
-
EgoCart - Egocentric images collected by a shopping cart in a retail store.
-
Unsupervised Segmentation of Daily Living Activities - Egocentric videos of daily activities.
-
Visual Market Basket Analysis - Egocentric images collected by a shopping cart in a retail store.
-
Location Based Segmentation of Egocentric Videos - Egocentric videos of daily activities.
-
Recognition of Personal Locations from Egocentric Videos - Egocentric video clips of daily activities.
-
EgoGesture - 2k videos from 50 subjects performing 83 gestures.
-
EgoHands - 48 videos of interactions between two people.
-
DoMSEV - 80 hours of videos covering different activities.
-
DR(eye)VE - 74 videos of people driving.
-
THU-READ - 8 subjects performing 40 actions with a head-mounted RGBD camera.
-
EgoDexter - 4 sequences with 4 actors (2 female), with varying interactions with various objects and cluttered backgrounds. [paper]
-
First-Person Hand Action (FPHA) - 3D hand-object interaction. Includes 1175 videos belonging to 45 different activity categories performed by 6 actors. [paper]
-
UTokyo Paired Ego-Video (PEV) - 1,226 pairs of first-person clips extracted from videos recorded synchronously during dyadic conversations.
-
UTokyo Ego-Surf - Contains 8 diverse groups of first-person videos recorded synchronously during face-to-face conversations.
-
TEgO: Teachable Egocentric Objects Dataset - Contains egocentric images of 19 distinct objects taken by two people for training a teachable object recognizer.
-
Multimodal Focused Interaction Dataset - Contains 377 minutes of continuous multimodal recording captured during 19 sessions, with 17 conversational partners in 18 different indoor/outdoor locations.
-
Ego4D - Episodic Memory, Hand-Object Interactions, AV Diarization, Social, Forecasting.
-
EPIC-Kitchens Challenge - Action Recognition, Action Detection, Action Anticipation, Unsupervised Domain Adaptation for Action Recognition, Multi-Instance Retrieval.
-
MECCANO - Multimodal Action Recognition (RGB-Depth-Gaze).
-
[Xiaomi Smart Glasses]
-
[Alpha Glass]
This is a work in progress...