dataset
Audio-Visual-Question-Answering-AVQA
zailongchen
Overview
Name
Audio-Visual-Question-Answering-AVQA
Source
zailongchen
Episodes
0
Robot count
0
Format
other
Description
This task is based on the MUSIC-AVQA dataset. We focus on optimizing accuracy on the AVQA task, which aims to answer questions about different visual objects, sounds, and their associations in videos. The problem requires comprehensive multimodal understanding and spatio-temporal reasoning over au
Robots used
null
Links
HuggingFace dataset
null