sim · medium · sim-to-real · metric · varies
Scaling Sim-to-Real Reinforcement Learning for Robot VLAs with Generative 3D Worlds
Description
The strong performance of large vision-language models (VLMs) trained with reinforcement learning (RL) has motivated similar approaches for fine-tuning vision-language-action (VLA) models in robotics. Many recent works fine-tune VLAs directly in the real world to avoid addressing the sim-to-real gap. While real-world RL circumvents sim-to-real issues, it inherently limits the generality of the resulting VLA, as scaling scene and object diversity in the physical world is prohibitively difficult.