sim · medium · vision-robot · metric · varies

GazeMoE: Perception of Gaze Target with Mixture-of-Experts

Description

Estimating human gaze targets from visible images is a critical task for robots to understand human attention, yet the development of generalizable neural architectures and training paradigms remains challenging. While recent advances in pre-trained vision foundation models offer promising avenues for locating gaze targets, the integration of multi-modal cues -- including eyes, head poses, gestures, and contextual features -- demands adaptive and efficient decoding mechanisms. Inspired by Mixture

Source

http://arxiv.org/abs/2603.06256v1