sim · medium · imitation · metric · varies

Reinforcement Fine-Tuning of Flow-Matching Policies for Vision-Language-Action Models

Description

Vision-Language-Action (VLA) models such as OpenVLA, Octo, and $π_0$ have shown strong generalization by leveraging large-scale demonstrations, yet their performance remains fundamentally constrained by the quality and coverage of the supervised data. Reinforcement learning (RL) offers a promising path for improving and fine-tuning VLAs through online interaction. However, conventional policy gradient methods are computationally infeasible for flow-matching-based models due to the i
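For context, a flow-matching policy produces an action by numerically integrating a velocity field from a noise sample, rather than by sampling from a distribution with a closed-form density. The sketch below is illustrative only and is not taken from the paper: `toy_velocity` is a hypothetical hand-written stand-in for a learned network, `target` is a made-up stand-in for whatever the observation conditions on, and plain Euler integration is an assumed solver choice.

```python
import random


def toy_velocity(a, t, target):
    """Hypothetical stand-in for a learned velocity field v(a, t | observation).

    Uses the straight-line field that moves the current sample toward `target`.
    """
    return [(g - x) / (1.0 - t) for x, g in zip(a, target)]


def sample_action(target, num_steps=10, seed=0):
    """Draw an action by Euler-integrating da/dt = v(a, t) from t=0 toward t=1."""
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in target]  # start from Gaussian noise
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays strictly below 1, so (1 - t) is never zero
        v = toy_velocity(a, t, target)
        a = [x + dt * vx for x, vx in zip(a, v)]
    return a


action = sample_action([0.5, -1.0, 2.0])
```

Because the action is the output of an iterative solver, the sketch never evaluates a probability for it; a policy-gradient method that needs likelihoods or likelihood ratios would have to obtain them by some additional computation.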

Source

http://arxiv.org/abs/2510.09976v1