sim · medium · robotics · metric · varies

Meanings and Measurements: Multi-Agent Probabilistic Grounding for Vision-Language Navigation

Description

Robots collaborating with humans must convert natural language goals into actionable, physically grounded decisions. For example, executing a command such as "go two meters to the right of the fridge" requires grounding semantic references, spatial relations, and metric constraints within a 3D scene. While recent vision-language models (VLMs) demonstrate strong semantic grounding capabilities, they are not explicitly designed to reason about metric constraints in physically defined spaces. In th
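To make the example command concrete, here is a minimal sketch of what resolving "two meters to the right of the fridge" could involve once the fridge has been located. It is not the method of the paper, which is probabilistic and multi-agent; it is a deterministic 2D illustration. The function name `goal_right_of`, the coordinate convention (x east, y north, angles counter-clockwise), and the choice to define "right" relative to an observer's heading are all assumptions made for this example.

```python
import math

def goal_right_of(obj_xy, heading_rad, distance_m):
    """Return the 2D point `distance_m` to the right of `obj_xy`.

    Hypothetical convention: "right" is taken relative to an observer
    facing `heading_rad` (x east, y north, counter-clockwise angles).
    """
    # The forward vector is (cos h, sin h); rotating it by -90 degrees
    # gives the observer's right-hand direction (sin h, -cos h).
    right_x = math.sin(heading_rad)
    right_y = -math.cos(heading_rad)
    return (obj_xy[0] + distance_m * right_x,
            obj_xy[1] + distance_m * right_y)

# Assumed scene: fridge at (3, 1), observer facing north (+y).
fridge = (3.0, 1.0)
goal = goal_right_of(fridge, math.pi / 2, 2.0)
print(goal)  # approximately (5.0, 1.0): two meters east of the fridge
```

Even this toy version shows the three ingredients named in the description: a semantic reference (which object is the fridge), a spatial relation (which direction counts as "right"), and a metric constraint (the two-meter offset).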

Source

http://arxiv.org/abs/2603.19166v1