policy

OpenVLA

Stanford / UC Berkeley / Google DeepMind / TRI · PyTorch


Overview

Name

OpenVLA

Author

Stanford / UC Berkeley / Google DeepMind / TRI

Framework

PyTorch

License

MIT

Skill type

manipulation

Evidence level

verified

Task description

7B-parameter vision-language-action model trained on 970K real-world robot demonstrations from the Open X-Embodiment dataset. Combines a fused SigLIP + DINOv2 visual encoder with a Llama 2 language-model backbone. Takes an RGB image and a language instruction as input and outputs tokenized robot actions. Can be fine-tuned with LoRA on consumer GPUs.
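To make "tokenized robot actions" concrete, here is a minimal sketch of per-dimension action discretization. It assumes 256 uniform bins per action dimension, as described in the OpenVLA paper; the function names and the [-1, 1] bounds used in the usage note are hypothetical placeholders, not the model's actual normalization statistics.

```python
N_BINS = 256  # assumption: 256 uniform bins per action dimension


def discretize(action, low, high, n_bins=N_BINS):
    """Map each continuous action value to an integer bin index in [0, n_bins - 1]."""
    tokens = []
    for a, lo, hi in zip(action, low, high):
        a = min(max(a, lo), hi)        # clip to the supported range
        frac = (a - lo) / (hi - lo)    # position within the range, 0..1
        tokens.append(min(int(frac * n_bins), n_bins - 1))
    return tokens


def undiscretize(tokens, low, high, n_bins=N_BINS):
    """Map bin indices back to continuous values at the bin centers."""
    return [lo + (t + 0.5) * (hi - lo) / n_bins
            for t, lo, hi in zip(tokens, low, high)]
```

For example, with placeholder bounds `low = [-1.0] * 7` and `high = [1.0] * 7`, a 7-dim action round-trips through `discretize` and `undiscretize` with at most half a bin width of error per dimension.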

Spaces

Action space

end-effector-pose · 7-dim · 5 Hz
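A minimal sketch of how a 7-dim end-effector action at 5 Hz might be represented on the consumer side. The field names and their ordering (translation delta, rotation delta, gripper) are assumptions for illustration; this card states only the dimensionality and control rate.

```python
from dataclasses import dataclass

CONTROL_HZ = 5                    # control rate from the card
STEP_PERIOD_S = 1.0 / CONTROL_HZ  # 0.2 s between consecutive actions


@dataclass
class EndEffectorAction:
    # Assumed layout: 3 translation deltas, 3 rotation deltas, 1 gripper command.
    dx: float
    dy: float
    dz: float
    droll: float
    dpitch: float
    dyaw: float
    gripper: float

    @classmethod
    def from_vector(cls, vec):
        """Build an action from a 7-element sequence, rejecting other lengths."""
        if len(vec) != 7:
            raise ValueError(f"expected 7 action dims, got {len(vec)}")
        return cls(*(float(v) for v in vec))


def step_times(n_steps, hz=CONTROL_HZ):
    """Timestamps (in seconds) at which successive actions would be issued."""
    return [i / hz for i in range(n_steps)]
```

At 5 Hz the policy must return each action within roughly 200 ms to keep up with the control loop.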

Observation space

type: multimodal
· primary_rgb (224x224)
· language_instruction
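A minimal sketch of assembling and checking an observation with the two fields listed above. The dictionary keys mirror the card's field names; the prompt template is an assumption based on the format shown in the OpenVLA model card and should be checked against the released code before use. The image is represented as a nested list only to keep the sketch dependency-free.

```python
IMAGE_SIZE = 224  # primary_rgb resolution from the card


def build_prompt(instruction):
    """Wrap a language instruction in the assumed OpenVLA prompt template."""
    return f"In: What action should the robot take to {instruction.strip().lower()}?\nOut:"


def validate_observation(obs):
    """Check that an observation has a 224x224x3 image and a non-empty instruction."""
    img = obs["primary_rgb"]
    if len(img) != IMAGE_SIZE or len(img[0]) != IMAGE_SIZE or len(img[0][0]) != 3:
        raise ValueError("primary_rgb must be 224x224 with 3 channels")
    if not obs["language_instruction"].strip():
        raise ValueError("language_instruction must be non-empty")
    return True


def make_observation(image, instruction):
    """Bundle an image and an instruction, validating before returning."""
    obs = {"primary_rgb": image, "language_instruction": instruction}
    validate_observation(obs)
    return obs
```

In practice the image would typically be a PIL image or array resized to 224x224 by the model's own preprocessing rather than validated by hand.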

Links

HuggingFace repo

openvla/openvla-7b

Paper (arXiv)

https://arxiv.org/abs/2406.09246

Compatible robots

0 in catalog · 2 mentioned but not in catalog yet

No robots list OpenVLA as compatible yet.

Compatible environments

tabletop-clean (not in seed) · tabletop-cluttered (not in seed)

Datasets that reference this policy

No datasets reference OpenVLA yet.