sim · medium · vision-robot · metric · varies

GigaBrain-0.5M*: a VLA That Learns From World Model-Based Reinforcement Learning

Description

Vision-language-action (VLA) models that directly predict multi-step action chunks from current observations are inherently limited by constrained scene understanding and weak anticipation of future states. In contrast, video world models pre-trained on web-scale video corpora exhibit robust spatiotemporal reasoning and accurate future prediction, making them a natural foundation for enhancing VLA learning. Therefore, we propose GigaBrain-0.5M*, a VLA model trained via world model-based reinforcement learning.

Source

http://arxiv.org/abs/2602.12099v2