sim · medium · robotics · metric · varies

Ego-Grounding for Personalized Question-Answering in Egocentric Videos

Description

We present the first systematic analysis of multimodal large language models (MLLMs) on personalized question-answering that requires ego-grounding: the ability to understand the camera wearer in egocentric videos. To this end, we introduce MyEgo, the first egocentric VideoQA dataset designed to evaluate MLLMs' ability to understand, remember, and reason about the camera wearer. MyEgo comprises 541 long videos and 5K personalized questions asking about "my things", "my activities", and "my past". B

Source

http://arxiv.org/abs/2604.01966v1