LSTP: Language-guided Spatial-Temporal Prompt Learning for Long-form Video-Text Understanding
Yuxuan Wang, Yueqian Wang, Pengfei Wu, Jianxin Liang, Dongyan Zhao, Zilong Zheng
Published in arXiv:2402.16050, 2024
Integrating optical flow for relevant content selection to improve video-text LLMs’ abilities on VideoQA.