sim · medium · manipulation · metric: varies

Latent Reasoning VLA: Latent Thinking and Prediction for Vision-Language-Action Models

Description

Vision-Language-Action (VLA) models benefit from chain-of-thought (CoT) reasoning, but existing approaches incur high inference overhead and rely on discrete reasoning representations that are mismatched with continuous perception and control. We propose Latent Reasoning VLA (LaRA-VLA), a unified VLA framework that internalizes multi-modal CoT reasoning into continuous latent representations for embodied action. LaRA-VLA performs unified reasoning and prediction in latent space, eliminating explic…
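To make the core idea concrete, the sketch below illustrates reasoning in latent space with a toy numpy policy: the "thought" is a fixed number of continuous latent updates rather than an open-ended decoding of discrete CoT text tokens. This is a minimal illustration under stated assumptions, not the authors' implementation; all dimensions, weights, and the recurrent update rule are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_step(x, w, b):
    # One affine layer followed by tanh; stands in for a real network.
    return np.tanh(x @ w + b)

# Toy sizes (hypothetical; the paper does not specify architecture details).
obs_dim, latent_dim, act_dim, n_steps = 16, 32, 7, 4

W_enc = rng.normal(0, 0.1, (obs_dim, latent_dim)); b_enc = np.zeros(latent_dim)
W_think = rng.normal(0, 0.1, (latent_dim, latent_dim)); b_think = np.zeros(latent_dim)
W_act = rng.normal(0, 0.1, (latent_dim, act_dim)); b_act = np.zeros(act_dim)

def latent_reasoning_policy(obs):
    # 1) Encode the continuous observation into a latent "thought" vector.
    z = mlp_step(obs, W_enc, b_enc)
    # 2) Reason entirely in latent space: no discrete text tokens are
    #    decoded, so inference cost is a fixed n_steps latent updates
    #    instead of a variable-length CoT rollout.
    for _ in range(n_steps):
        z = mlp_step(z, W_think, b_think)
    # 3) Decode a continuous action (e.g. a 7-DoF end-effector command).
    return z @ W_act + b_act

action = latent_reasoning_policy(rng.normal(size=obs_dim))
print(action.shape)  # (7,)
```

The design point this mirrors is that both the reasoning state and the action head live in the same continuous space, avoiding the discrete-to-continuous mismatch the abstract describes.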

Source

http://arxiv.org/abs/2602.01166v1