Abstract: Video Moment Retrieval (MR) and Highlight Detection (HD) have attracted significant attention due to the growing demand for video analysis. Recent approaches treat MR and HD as similar video ...
Abstract: Diffusion models have made tremendous progress in text-driven image and video generation. Now text-to-image foundation models are widely applied to various down-stream image synthesis tasks, ...