
Audio-Visual-Question-Answering-AVQA

zailongchen


Overview

Name
Audio-Visual-Question-Answering-AVQA
Source
zailongchen
Episodes
0
Robot count
0
Format
other
Description
This task is based on the MUSIC-AVQA dataset and focuses on optimizing accuracy on the Audio-Visual Question Answering (AVQA) task, which aims to answer questions about different visual objects, sounds, and their associations in videos. The problem requires comprehensive multimodal understanding and spatio-temporal reasoning over audio-visual scenes.
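The description frames the goal as maximizing AVQA accuracy. The following is a minimal sketch of how overall and per-question-type exact-match accuracy might be computed. It assumes that predictions and ground-truth answers are available as plain strings with a question-type label. The record layout, the field names, and the type labels below are hypothetical illustrations and are not part of this dataset card.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AVQAPrediction:
    # Hypothetical record layout; field names are illustrative, not from the dataset.
    question_id: str
    question_type: str  # e.g. "Audio", "Visual", "Audio-Visual" (assumed labels)
    predicted: str
    answer: str


def accuracy_by_type(preds):
    """Return exact-match accuracy overall and for each question type.

    Answers are compared case-insensitively after stripping whitespace.
    An empty input yields an empty dict.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for p in preds:
        hit = p.predicted.strip().lower() == p.answer.strip().lower()
        # Count every prediction toward the overall score and its own type.
        for key in ("overall", p.question_type):
            total[key] += 1
            correct[key] += int(hit)
    return {k: correct[k] / total[k] for k in total}


if __name__ == "__main__":
    sample = [
        AVQAPrediction("q1", "Audio", "two", "two"),
        AVQAPrediction("q2", "Visual", "yes", "no"),
        AVQAPrediction("q3", "Audio-Visual", "left", "Left"),
        AVQAPrediction("q4", "Audio-Visual", "piano", "violin"),
    ]
    print(accuracy_by_type(sample))
```

Reporting accuracy per question type as well as overall makes it easier to see whether a change helps audio-only, visual-only, or joint audio-visual reasoning, rather than only shifting the aggregate number.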
Robots used
None

Links

HuggingFace dataset
None