sim · medium · robotics · metric: varies
StarVLA: A Lego-like Codebase for Vision-Language-Action Model Developing
Description
Building generalist embodied agents requires integrating perception, language understanding, and action, which are core capabilities addressed by Vision-Language-Action (VLA) approaches based on multimodal foundation models, including recent advances in vision-language models and world models. Despite rapid progress, VLA methods remain fragmented across incompatible architectures, codebases, and evaluation protocols, hindering principled comparison and reproducibility. We present StarVLA, an ope