policy

VoxPoser

Stanford · PyTorch


Overview

Name

VoxPoser

Author

Stanford

Framework

PyTorch

License

MIT

Skill type

manipulation

Evidence level

verified

Task description

VoxPoser composes 3D voxel value maps for robot manipulation using LLM-generated code. The LLM writes Python code that calls open-vocabulary vision models to build 3D affordance and constraint maps. A model predictive control (MPC) planner then synthesizes closed-loop 6-DoF trajectories from these maps. The approach generalizes zero-shot to novel tasks and objects.
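The composition step described above can be illustrated with a small, self-contained sketch. This is not the authors' implementation: the grid size, the hard-coded goal and obstacle voxels (standing in for open-vocabulary detections), the penalty weights, and the greedy descent (standing in for the MPC planner) are all assumptions made for illustration.

```python
from itertools import product

GRID = 10  # hypothetical voxel resolution per axis


def distance_map(target, size=GRID):
    """Value map: Euclidean distance (in voxels) from every voxel to `target`."""
    return {
        v: sum((a - b) ** 2 for a, b in zip(v, target)) ** 0.5
        for v in product(range(size), repeat=3)
    }


def compose_cost(affordance, avoidance, avoid_radius=2.0, avoid_weight=10.0):
    """Cost = distance to the goal, plus a penalty inside the avoidance radius."""
    cost = {}
    for v, d_goal in affordance.items():
        penalty = max(0.0, avoid_radius - avoidance[v]) * avoid_weight
        cost[v] = d_goal + penalty
    return cost


def greedy_path(cost, start, max_steps=100):
    """Stand-in planner: repeatedly step to the lowest-cost neighbouring voxel."""
    path = [start]
    current = start
    for _ in range(max_steps):
        neighbours = [
            tuple(c + d for c, d in zip(current, delta))
            for delta in product((-1, 0, 1), repeat=3)
            if any(delta)  # skip the zero offset (the current voxel itself)
        ]
        neighbours = [n for n in neighbours if n in cost]  # stay inside the grid
        best = min(neighbours, key=cost.get)
        if cost[best] >= cost[current]:
            break  # no neighbour improves the cost: local minimum reached
        current = best
        path.append(current)
    return path


# Hypothetical detections: in a real system these would come from vision models.
goal = (8, 8, 2)      # e.g. voxel of the object to reach
obstacle = (5, 5, 2)  # e.g. voxel of an object to stay away from

cost = compose_cost(distance_map(goal), distance_map(obstacle))
path = greedy_path(cost, start=(1, 1, 2))
```

With these illustrative values the descent bends around the penalized region and ends at the goal voxel. A greedy descent can stall in local minima on harder maps, which is one reason a closed-loop planner that replans from fresh observations is used in practice.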

Spaces

Action space

end-effector-pose · 6-dim · 5 Hz
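As a rough illustration of what a 6-dimensional end-effector pose command issued at 5 Hz might look like in code, the sketch below defines a simple container. The field layout (position plus roll/pitch/yaw), the units, and all names are assumptions; the catalog entry specifies only the dimensionality and the rate.

```python
from dataclasses import dataclass

CONTROL_RATE_HZ = 5                        # rate listed in the action-space field
CONTROL_PERIOD_S = 1.0 / CONTROL_RATE_HZ   # time between consecutive commands


@dataclass(frozen=True)
class EndEffectorPose:
    # Assumed layout: position in metres, orientation as roll/pitch/yaw in radians.
    x: float
    y: float
    z: float
    roll: float
    pitch: float
    yaw: float

    def as_vector(self):
        """Return the pose as a flat 6-element list."""
        return [self.x, self.y, self.z, self.roll, self.pitch, self.yaw]


def command_times(n_commands):
    """Timestamps (seconds) at which n consecutive commands would be issued."""
    return [i * CONTROL_PERIOD_S for i in range(n_commands)]
```

For example, `EndEffectorPose(0.4, 0.0, 0.3, 0.0, 3.14, 0.0).as_vector()` yields a six-element action, and `command_times(3)` spaces three commands one control period apart.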

Observation space

type: multimodal
· rgbd_images
· point_cloud
· language_instruction
· open_vocab_detections

Links

HuggingFace repo

None listed

Paper (arXiv)

https://arxiv.org/abs/2307.05973

Compatible robots

0 in catalog · 1 mentioned but not in catalog yet

No robots list VoxPoser as compatible yet.

Compatible environments

· tabletop-clean (not in seed)
· tabletop-cluttered (not in seed)

Datasets that reference this policy

No datasets reference VoxPoser yet.