Tags: sim · medium · humanoid — Metric: varies

PhysBrain: Human Egocentric Data as a Bridge from Vision Language Models to Physical Intelligence

Description

Robotic generalization relies on physical intelligence: the ability to reason about state changes, contact-rich interactions, and long-horizon planning under egocentric perception and action. Vision Language Models (VLMs) are essential to Vision-Language-Action (VLA) systems, but their reliance on third-person training data creates a viewpoint gap for humanoid robots. Collecting massive robot-centric data would be an ideal solution, but it is impractical due to cost and diversity constraints. Conversely, human

Source

http://arxiv.org/abs/2512.16793v2