sim · medium · vision-robot · metric · varies

Controllable Egocentric Video Generation via Occlusion-Aware Sparse 3D Hand Joints

Description

Motion-controllable video generation is crucial for egocentric applications in virtual reality and embodied AI. However, existing methods often struggle to achieve 3D-consistent, fine-grained hand articulation. By relying on 2D trajectories or implicit poses, they collapse 3D geometry into spatially ambiguous signals or over-rely on human-centric priors. Under severe egocentric occlusions, this causes motion inconsistencies and hallucinated artifacts, and it prevents cross-embodiment gener

Source

http://arxiv.org/abs/2603.11755v1