← Back to Benchmarks
simmediumquadrupedmetric · varies
Maestro: Orchestrating Robotics Modules with Vision-Language Models for Zero-Shot Generalist Robots
Description
Today's best-explored routes towards generalist robots center on collecting ever larger "observations-in actions-out" robotics datasets to train large end-to-end models, copying a recipe that has worked for vision-language models (VLMs). We pursue a road less traveled: building generalist policies directly around VLMs by augmenting their general capabilities with specific robot capabilities encapsulated in a carefully curated set of perception, planning, and control modules. In Maestro, a VLM co