sim · medium · mobile-manipulation · metric: varies

Mind to Hand: Purposeful Robotic Control via Embodied Reasoning

Description

Humans act with context and intention, with reasoning playing a central role. While internet-scale data has enabled broad reasoning capabilities in AI systems, grounding these abilities in physical action remains a major challenge. We introduce Lumo-1, a generalist vision-language-action (VLA) model that unifies robot reasoning ("mind") with robot action ("hand"). Our approach builds upon the general multi-modal reasoning capabilities of pre-trained vision-language models (VLMs), progressively e

Source

http://arxiv.org/abs/2512.08580v2