dataset

Audio-Visual-Question-Answering-AVQA

zailongchen

Overview

Name

Source

zailongchen

Episodes

Robot count

Format

other

Description

This task is based on the MUSIC-AVQA dataset. We focus on optimizing the accuracy of the AVQA task, which aims to answer questions about different visual objects, sounds, and their associations in videos. The problem requires comprehensive multimodal understanding and spatio-temporal reasoning over au

Robots used

null

Links

HuggingFace dataset

null