sim · medium · grasping · metric · varies

Zero-Shot Temporal Interaction Localization for Egocentric Videos

Description

Locating human-object interaction (HOI) actions within video serves as the foundation for multiple downstream tasks, such as human behavior analysis and human-robot skill transfer. Current temporal action localization methods typically rely on annotated action and object categories of interactions for optimization, which leads to domain bias and low deployment efficiency. Although some recent works have achieved zero-shot temporal action localization (ZS-TAL) with large vision-language models (V

Source

http://arxiv.org/abs/2506.03662v4