HawkEye: Training Video-Text LLMs for Grounding Text in Videos
Yueqian Wang, Xiaojun Meng, Jianxin Liang, Yuxuan Wang, Qun Liu, Dongyan Zhao
Published in arXiv, 2025
HawkEye is one of the first video-text LLMs that can perform temporal video grounding in a fully text-to-text manner. The paper also introduces InternVid-G, a large-scale video-text dataset for training video grounding.