

Gedas Bertasius
Assistant Professor
I am an Assistant Professor in the Computer Science department at the University of North Carolina, Chapel Hill. Before joining UNC, I was a postdoctoral researcher at Facebook AI Research (FAIR) working with Lorenzo Torresani. I finished my Ph.D. at the University of Pennsylvania, advised by Jianbo Shi, and my undergraduate degree at Dartmouth College.
Research
I lead the Multimodal Video Perception (MVP) group at UNC. We develop foundational models for multimodal video understanding, enabling machines to comprehend, reason about, and interact with complex video, audio, and language data. Moving beyond perception, we ask: what spatiotemporal abstractions are needed for AI to truly grasp complex human behaviors over long horizons? Representative projects include TimeSformer, Video ReCap, LLoVi, BIMBA, and VideoTree.
Video Recognition

Developing spatiotemporal models for automatic video analysis (e.g., TimeSformer, ViS4mer).
Multimodal AI
Building models that can learn from video, audio, and text (e.g., Video ReCap, LLoVi, BIMBA).

Perceptual AI Coaches
Sports & AI

Generative Video Modeling


Translating visual inputs into effective real-world actions (e.g., WatchAct, BOSS, ReBot, and ARCADE).
Selected Projects
SVI-Bench: A Dynamic Microworld for Strategic Video Intelligence
Yulu Pan, Han Yi, Seongsu Ha, Md Mohaiminul Islam, Benjamin Zhang, Lorenzo Torresani, Gedas Bertasius
ECCV 2026
[arxiv] [video] [project page] [extended paper] [code] [data] [bibtex]
WatchAct: A Benchmark for Behavior-Grounded Robot Manipulation
Baiqi Li, Ce Zhang, Yu Fang, Yue Yang, Shangzhe Li, Mingyu Ding, Gedas Bertasius
arXiv 2026
[arxiv] [project page] [code] [data] [bibtex]
SiLVR: A Simple Language-based Video Reasoning Framework
Ce Zhang, Yan-Bo Lin, Ziyang Wang, Mohit Bansal, Gedas Bertasius
TMLR 2026 (1st Place Winner at CVPR 2025 MMLU Challenge)
[arxiv] [project page] [code] [bibtex]
Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives
Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, Gedas Bertasius, ... , Michael Wray
CVPR 2024
Video ReCap: Recursive Captioning of Hour-Long Videos
Md Mohaiminul Islam, Ngan Ho, Xitong Yang, Tushar Nagarajan, Lorenzo Torresani, Gedas Bertasius
CVPR 2024 (Egocentric Vision Distinguished Paper Award)
[arxiv] [project website] [code] [dataset] [bibtex]
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius, Heng Wang, Lorenzo Torresani
ICML 2021 (Top-5 Most Cited ICML 2021 Paper)
[arxiv] [code] [talk] [slides] [blog] [VentureBeat] [SiliconAngle] [bibtex]
Sponsors
We are grateful to the following agencies for supporting our research.