sim · medium · manipulation · metric: varies

Reshaping Action Error Distributions for Reliable Vision-Language-Action Models

Description

In robotic manipulation, vision-language-action (VLA) models have emerged as a promising paradigm for learning generalizable and scalable robot policies. Most existing VLA frameworks rely on standard supervised objectives, typically cross-entropy for discrete actions and mean squared error (MSE) for continuous action regression, which impose strong pointwise constraints on individual predictions. In this work, we focus on continuous-action VLA models and move beyond conventional MSE-based regression…
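The following is a minimal sketch of the two conventional objectives the description refers to: cross-entropy for discrete actions and MSE for continuous actions. It is not the paper's proposed method, whose details are cut off above. All function and variable names are hypothetical, and the code uses plain Python so it stays self-contained. The usage at the bottom builds two predictions with identical MSE but different error distributions, illustrating what a purely pointwise loss cannot tell apart.

```python
import math
from typing import List, Sequence

Batch = Sequence[Sequence[float]]  # batch of continuous action vectors


def mse_loss(pred: Batch, target: Batch) -> float:
    """Mean squared error over a batch of continuous action vectors.

    Each action dimension of each sample contributes its own squared
    residual independently (a pointwise constraint); the loss sees only
    the mean of those squares, not the shape of their distribution.
    """
    if len(pred) != len(target):
        raise ValueError("batch sizes differ")
    total, count = 0.0, 0
    for p_vec, t_vec in zip(pred, target):
        if len(p_vec) != len(t_vec):
            raise ValueError("action dimensions differ")
        for p, t in zip(p_vec, t_vec):
            total += (p - t) ** 2
            count += 1
    return total / count


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Cross-entropy for one discrete action (e.g. a tokenized action bin).

    Uses the log-sum-exp trick for numerical stability.
    """
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[label]


def residuals(pred: Batch, target: Batch) -> List[float]:
    """Flattened per-dimension errors (prediction minus target)."""
    return [p - t for pv, tv in zip(pred, target) for p, t in zip(pv, tv)]


# Hypothetical example: two 2-sample, 2-DoF predictions against a zero target.
target = [[0.0, 0.0], [0.0, 0.0]]
pred_a = [[0.1, 0.1], [0.1, 0.1]]  # small error spread over every dimension
pred_b = [[0.2, 0.0], [0.0, 0.0]]  # one larger error concentrated in one dimension

print(mse_loss(pred_a, target), mse_loss(pred_b, target))  # both about 0.01
print(max(abs(e) for e in residuals(pred_a, target)))      # 0.1
print(max(abs(e) for e in residuals(pred_b, target)))      # 0.2
```

Both predictions receive the same MSE even though their worst-case errors differ by a factor of two. This is the kind of gap between a pointwise objective and the overall error distribution that the title alludes to; how the paper actually reshapes that distribution is not stated in this excerpt.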

Source

http://arxiv.org/abs/2602.04228v1