sim · medium · dexterous · metric · varies

Hume: Introducing System-2 Thinking in Visual-Language-Action Model

Description

Humans practice slow thinking before acting when handling complex tasks in the physical world. This thinking paradigm has recently achieved remarkable advances in enabling Large Language Models (LLMs) to solve complex tasks in digital domains. However, the potential of slow thinking remains largely unexplored for robotic foundation models interacting with the physical world. In this work, we propose Hume: a dual-system Vision-Language-Action (VLA) model with value-guided

Source

http://arxiv.org/abs/2505.21432v4