Abstract: Generalist models, capable of handling multiple modalities and tasks simultaneously, are currently among the most active topics of research. However, due to interference between different tasks ...
Abstract: Video-based Human Activity Recognition (VHAR) is a core task in computer vision with a wide range of applications in healthcare, surveillance, and human–robot interaction. Traditional VHAR ...