policy
OpenVLA
Stanford / UC Berkeley / Google DeepMind / TRI · PyTorch
Overview
Name
OpenVLA
Author
Stanford / UC Berkeley / Google DeepMind / TRI
Framework
PyTorch
License
MIT
Skill type
manipulation
Evidence level
verified
Task description
A 7B-parameter vision-language-action (VLA) model trained on 970K real-world robot demonstrations from the Open X-Embodiment dataset. It pairs a fused SigLIP + DINOv2 visual encoder with a Llama 2 backbone, and outputs tokenized robot actions from an RGB image and a language instruction. It can be fine-tuned with LoRA on consumer GPUs.
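To illustrate what "tokenized robot actions" can mean in practice, here is a minimal sketch that discretizes each of the 7 action dimensions into uniform bins and maps bin indices back to continuous values. The bin count of 256 and the per-dimension bounds are assumptions for illustration, and `tokenize_action` / `detokenize_action` are hypothetical names, not functions from the OpenVLA codebase.

```python
from typing import List, Sequence

N_BINS = 256  # assumption: number of discrete bins per action dimension


def tokenize_action(action: Sequence[float], low: Sequence[float],
                    high: Sequence[float], n_bins: int = N_BINS) -> List[int]:
    """Map each continuous action value to a bin index in [0, n_bins - 1]."""
    tokens = []
    for a, lo, hi in zip(action, low, high):
        a = min(max(a, lo), hi)            # clip to the assumed bounds
        unit = (a - lo) / (hi - lo)        # rescale to [0, 1]
        tokens.append(min(int(unit * n_bins), n_bins - 1))
    return tokens


def detokenize_action(tokens: Sequence[int], low: Sequence[float],
                      high: Sequence[float], n_bins: int = N_BINS) -> List[float]:
    """Map bin indices back to continuous values at the bin centers."""
    return [lo + (t + 0.5) / n_bins * (hi - lo)
            for t, lo, hi in zip(tokens, low, high)]


# Hypothetical normalized bounds for a 7-dim end-effector action.
low = [-1.0] * 7
high = [1.0] * 7
action = [0.1, -0.2, 0.0, 0.5, -0.5, 0.25, 1.0]

tokens = tokenize_action(action, low, high)
recovered = detokenize_action(tokens, low, high)
```

The round trip loses at most half a bin width per dimension, which is the price of predicting actions as discrete tokens with a language-model backbone.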
Spaces
Action space
end-effector-pose · 7-dim · 5 Hz
Observation space
- type: multimodal
  - primary_rgb (224x224)
  - language_instruction
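The spaces listed above could be checked on the client side before calling the policy. The sketch below encodes them as a small spec object; `PolicySpec`, `check_observation`, and `check_action` are hypothetical names introduced for illustration and are not part of any OpenVLA or catalog API.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class PolicySpec:
    """Hypothetical container for the spaces listed on this page."""
    image_size: Tuple[int, int] = (224, 224)  # primary_rgb height x width
    action_dim: int = 7                       # end-effector pose action
    control_hz: float = 5.0                   # action rate

    @property
    def control_period_s(self) -> float:
        """Seconds between consecutive actions at the listed rate."""
        return 1.0 / self.control_hz


def check_observation(spec: PolicySpec, image_hw: Sequence[int],
                      instruction: str) -> None:
    """Raise ValueError if the observation does not match the spec."""
    if tuple(image_hw) != spec.image_size:
        raise ValueError(f"expected image {spec.image_size}, got {tuple(image_hw)}")
    if not instruction.strip():
        raise ValueError("language_instruction must be non-empty")


def check_action(spec: PolicySpec, action: Sequence[float]) -> None:
    """Raise ValueError if the action vector has the wrong length."""
    if len(action) != spec.action_dim:
        raise ValueError(f"expected {spec.action_dim} dims, got {len(action)}")


spec = PolicySpec()
check_observation(spec, (224, 224), "pick up the red block")
check_action(spec, [0.0] * 7)
```

At 5 Hz the control period works out to 0.2 s, so a caller would need one image and one instruction ready per 200 ms step.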
Links
HuggingFace repo
Paper (arXiv)
Compatible robots
0 (+2 mentioned but not in catalog yet)
No robots list OpenVLA as compatible yet.
Compatible environments
2
Datasets that reference this policy
0
No datasets reference OpenVLA yet.