sim · medium · robotics · metric · varies
Meanings and Measurements: Multi-Agent Probabilistic Grounding for Vision-Language Navigation
Description
Robots collaborating with humans must convert natural-language goals into actionable, physically grounded decisions. For example, executing a command such as "go two meters to the right of the fridge" requires grounding semantic references, spatial relations, and metric constraints within a 3D scene. While recent vision-language models (VLMs) demonstrate strong semantic grounding capabilities, they are not explicitly designed to reason about metric constraints in physically defined spaces. In th