VMAs: Video-to-Music Generation via Semantic Alignment in Web Music Videos
Yan-Bo Lin, Yu Tian, Linjie Yang, Gedas Bertasius, Heng Wang
WACV 2025
[arxiv] [project page] [code] [bibtex]
DAM: Dynamic Adapter Merging for Continual Video QA Learning
Feng Cheng, Ziyang Wang, Yi-Lin Sung, Yan-Bo Lin, Mohit Bansal, Gedas Bertasius
WACV 2025
[arxiv] [code] [bibtex]
A Simple LLM Framework for Long-Range Video Question-Answering
Ce Zhang*, Taixi Lu*, Md Mohaiminul Islam, Ziyang Wang, Shoubin Yu, Mohit Bansal, Gedas Bertasius
EMNLP 2024
[arxiv] [code] [bibtex]
Augmented Reality Demonstrations for Scalable Robot Imitation Learning
Yue Yang, Bryce Ikeda, Gedas Bertasius, Daniel Szafir
IROS 2024
[arxiv] [bibtex]
Xiyao Wang, Yuhang Zhou, Xiaoyu Liu, Hongjin Lu, Yuancheng Xu, Feihong He, Jaehong Yoon, Taixi Lu, Gedas Bertasius, Mohit Bansal, Huaxiu Yao, Furong Huang
ACL 2024
[arxiv] [dataset] [bibtex]
Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos
Md Mohaiminul Islam, Tushar Nagarajan, Huiyu Wang, Fu-Jen Chu, Kris Kitani, Gedas Bertasius, Xitong Yang
ECCV 2024 (Oral)
[arxiv] [project page] [bibtex]
Siamese Vision Transformers are Scalable Audio-visual Learners
Yan-Bo Lin, Gedas Bertasius
ECCV 2024
4Diff: 3D-Aware Diffusion Model for Third-to-First Viewpoint Translation
Feng Cheng*, Mi Luo*, Huiyu Wang, Alex Dimakis, Lorenzo Torresani, Gedas Bertasius, Kristen Grauman
ECCV 2024
[arxiv] [bibtex]
RGNet: A Unified Clip Retrieval and Grounding Network for Long Videos
Tanveer Hannan, Md Mohaiminul Islam, Thomas Seidl, Gedas Bertasius
ECCV 2024
Video ReCap: Recursive Captioning of Hour-Long Videos
Md Mohaiminul Islam, Ngan Ho, Xitong Yang, Tushar Nagarajan, Lorenzo Torresani, Gedas Bertasius
CVPR 2024
[arxiv] [project website] [code] [dataset] [bibtex]
Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives
Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, Gedas Bertasius, ... , Michael Wray
CVPR 2024 (Oral)
[arxiv] [project website] [blog] [video] [bibtex]
LoCoNet: Long-Short Context Network for Active Speaker Detection
Xizi Wang, Feng Cheng, Gedas Bertasius, David Crandall
CVPR 2024
Unified Coarse-to-Fine Alignment for Video-Text Retrieval
Ziyang Wang, Yi-Lin Sung, Feng Cheng, Gedas Bertasius, Mohit Bansal
ICCV 2023
SimpleClick: Interactive Image Segmentation with Simple Vision Transformers
Qin Liu, Zhenlin Xu, Gedas Bertasius, Marc Niethammer
ICCV 2023
VindLU: A Recipe for Effective Video-and-Language Pretraining
Feng Cheng, Xizi Wang, Jie Lei, David Crandall, Mohit Bansal, Gedas Bertasius
CVPR 2023
Vision Transformers are Parameter-Efficient Audio-Visual Learners
Yan-Bo Lin, Yi-Lin Sung, Jie Lei, Mohit Bansal, Gedas Bertasius
CVPR 2023
[arxiv] [code] [project page] [bibtex]
Efficient Movie Scene Detection using State-Space Transformers
Md Mohaiminul Islam, Mahmudul Hasan, Kishan Athrey, Tony Braskich, Gedas Bertasius
CVPR 2023
Improving Video Retrieval Using Multilingual Knowledge Transfer
Avinash Madasu, Estelle Aflalo, Gabriela Ben Melech Stan, Shao-Yen Tseng, Gedas Bertasius, Vasudev Lal
ECIR 2023 (Best Student Paper Award)
[arxiv]
Learning to Retrieve Videos by Asking Questions
Avinash Madasu, Junier Oliva, Gedas Bertasius
ACM Multimedia 2022
ECLIPSE: Efficient Long-range Video Retrieval using Sight and Sound
Yan-Bo Lin, Jie Lei, Mohit Bansal, Gedas Bertasius
ECCV 2022 (Oral)
[arxiv] [code] [project page] [bibtex]
TALLFormer: Temporal Action Localization with a Long-memory Transformer
Feng Cheng, Gedas Bertasius
ECCV 2022
Long Movie Clip Classification with State-Space Video Models
Md Mohaiminul Islam, Gedas Bertasius
ECCV 2022
Learning To Recognize Procedural Activities with Distant Supervision
Xudong Lin, Fabio Petroni, Gedas Bertasius, Marcus Rohrbach, Shih-Fu Chang, Lorenzo Torresani
CVPR 2022
[arxiv] [code] [project page] [bibtex]
Long-Short Temporal Contrastive Learning of Video Transformers
Jue Wang, Gedas Bertasius, Du Tran, Lorenzo Torresani
CVPR 2022
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius, Heng Wang, Lorenzo Torresani
ICML 2021 (Top-5 Most Impactful ICML 2021 Paper)
[arxiv] [code] [talk] [slides] [Facebook AI Blog] [VentureBeat] [SiliconAngle] [bibtex]
Vx2Text: End-to-End Learning of Video-Based Text Generation from Multimodal Inputs
Xudong Lin, Gedas Bertasius, Jue Wang, Shih-Fu Chang, Devi Parikh, Lorenzo Torresani
CVPR 2021
[arxiv] [VentureBeat] [bibtex]
Supervoxel Attention Graphs for Long-Range Video Modeling
Yang Wang, Gedas Bertasius, Tae-Hyun Oh, Abhinav Gupta, Minh Hoai, Lorenzo Torresani
WACV 2021
COBE: Contextualized Object Embeddings from Narrated Instructional Video
Gedas Bertasius, Lorenzo Torresani
NeurIPS 2020
[arxiv] [talk] [slides] [HowTo100M_BB pseudo annotations] [bibtex]
Attentive Action and Context Factorization
Yang Wang, Vinh Tran, Gedas Bertasius, Lorenzo Torresani, Minh Hoai
BMVC 2020
[arxiv]
Classifying, Segmenting, and Tracking Objects in Video with Mask Propagation
Gedas Bertasius, Lorenzo Torresani
CVPR 2020 (Best Paper Nominee)
Ranked 1st on YouTube-VIS Leaderboard and EPIC-Kitchens Detection Challenge.
[arxiv] [talk] [slides] [bibtex]
Learning Temporal Pose Estimation from Sparsely-Labeled Videos
Gedas Bertasius, Christoph Feichtenhofer, Du Tran, Jianbo Shi, Lorenzo Torresani
NeurIPS 2019
Ranked 1st on PoseTrack Leaderboard for multi-frame pose estimation.
[arxiv] [poster] [code] [bibtex]
Object Detection in Video with Spatiotemporal Sampling Networks
Gedas Bertasius, Lorenzo Torresani and Jianbo Shi
ECCV 2018
[arxiv] [results] [bibtex]
Egocentric Basketball Motion Planning from a Single First-Person Image
Gedas Bertasius, Aaron Chan and Jianbo Shi
CVPR 2018
[arxiv] [results] [MIT SSAC Poster] [bibtex]
Am I a Baller? Basketball Performance Assessment from First-Person Videos
Gedas Bertasius, Stella X. Yu, Hyun Soo Park and Jianbo Shi
ICCV 2017
[arxiv] [results] [bibtex]
Unsupervised Learning of Important Objects from First-Person Videos
Gedas Bertasius, Hyun Soo Park, Stella X. Yu and Jianbo Shi
ICCV 2017
[arxiv] [bibtex]
Convolutional Random Walk Networks for Semantic Image Segmentation
Gedas Bertasius, Lorenzo Torresani, Stella X. Yu and Jianbo Shi
CVPR 2017
[arxiv] [bibtex]
First-Person Action-Object Detection with EgoNet
Gedas Bertasius, Hyun Soo Park, Stella X. Yu, and Jianbo Shi
RSS 2017
[arxiv] [New Scientist Article] [Impact Article] [results] [bibtex]
Local Perturb-and-MAP for Structured Prediction
Gedas Bertasius, Qiang Liu, Lorenzo Torresani, and Jianbo Shi
AISTATS 2017
[arxiv] [bibtex]
Semantic Segmentation with Boundary Neural Fields
Gedas Bertasius, Jianbo Shi and Lorenzo Torresani
CVPR 2016
[arxiv] [code] [bibtex]
High-for-Low, Low-for-High: Efficient Boundary Detection from Deep Object Features and its Applications to High-Level Vision
Gedas Bertasius, Jianbo Shi, and Lorenzo Torresani
ICCV 2015
[arxiv] [code] [bibtex]
DeepEdge: A Multi-Scale Bifurcated Deep Network for Top-Down Contour Detection
Gedas Bertasius, Jianbo Shi, and Lorenzo Torresani
CVPR 2015
[arxiv] [bibtex]