sim · medium · manipulation-data · metric · varies

Self-Supervised Bootstrapping of Action-Predictive Embodied Reasoning

Description

Embodied Chain-of-Thought (CoT) reasoning has significantly enhanced Vision-Language-Action (VLA) models, yet current methods rely on rigid templates to specify reasoning primitives (e.g., objects in the scene, high-level plans, structural affordances). These templates can force policies to process irrelevant information that distracts from critical action-prediction signals. This creates a bottleneck: without successful policies, we cannot verify reasoning quality; without quality reasoning, we

Source

http://arxiv.org/abs/2602.08167v1