policy

RVT-2

NVIDIA · PyTorch


Overview

Name

RVT-2

Author

NVIDIA

Framework

PyTorch

License

other

Skill type

manipulation

Evidence level

reported

Task description

Multi-view transformer with coarse-to-fine rendering for precise 3D manipulation. It re-renders camera inputs from virtual views, uses cross-view attention to predict keyframe actions, then zooms in on the region of interest for millimeter-level gripper pose accuracy. Reports an 81.4% average success rate across 18 RLBench tasks and trains 36x faster than PerAct.
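The "zoom in" step can be pictured as a coarse-to-fine search: find the best point on a coarse grid over the whole workspace, then search a smaller cube around it with the same number of samples, so the spacing shrinks each stage (with the default `n=5`, it halves). This is a minimal toy sketch, not RVT-2's code: `score` is a hypothetical stand-in for the network's heatmap predictions, and a regular 3D grid replaces the re-rendered virtual views.

```python
import itertools
import math
from typing import Callable, Iterator, Tuple

Point = Tuple[float, float, float]


def grid(center: Point, half_width: float, n: int) -> Iterator[Point]:
    """Yield an n x n x n grid of points filling a cube around `center`."""
    step = 2 * half_width / (n - 1)
    axis = [-half_width + i * step for i in range(n)]
    for dx, dy, dz in itertools.product(axis, repeat=3):
        yield (center[0] + dx, center[1] + dy, center[2] + dz)


def coarse_to_fine(score: Callable[[Point], float], center: Point,
                   half_width: float, n: int = 5, stages: int = 2) -> Point:
    """Pick the best-scoring grid point, zoom in around it, and repeat.

    `score` is a hypothetical stand-in for a learned heatmap; higher is better.
    """
    best = center
    for _ in range(stages):
        best = max(grid(center, half_width, n), key=score)
        # Zoom: the next stage searches one grid step on each side of the
        # current best point, with the same number of samples per axis.
        step = 2 * half_width / (n - 1)
        center, half_width = best, step
    return best


if __name__ == "__main__":
    target = (0.123, -0.311, 0.05)  # hypothetical true gripper position (m)

    def score(p: Point) -> float:
        return -math.dist(p, target)

    for stages in (1, 2, 3):
        best = coarse_to_fine(score, (0.0, 0.0, 0.0), 0.5, stages=stages)
        print(stages, best, round(math.dist(best, target), 4))
```

Each extra stage costs the same number of evaluations (here 125) but refines the estimate, which is why zooming is cheaper than a single fine grid over the whole workspace.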

Spaces

Action space

end-effector-pose · 7-dim · 1Hz

Observation space

type: multimodal
· multi_view_rgbd (re-rendered virtual views)
· language_instruction
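The observation entry says camera inputs are re-rendered from virtual views. A toy way to picture this is to project a colored point cloud (built from the RGB-D cameras) onto a virtual top-down image, keeping the highest point per pixel as a simple z-buffer. This is a hedged sketch: the single orthographic top-down camera, the `bounds` layout, and the function name `render_top_down` are illustrative assumptions, not RVT-2's renderer.

```python
from typing import List, Optional, Sequence, Tuple

# (x, y, z, r, g, b): position in metres, color in [0, 1]
ColoredPoint = Tuple[float, float, float, float, float, float]
Pixel = Optional[Tuple[float, float, float, float]]  # (r, g, b, height)


def render_top_down(points: Sequence[ColoredPoint],
                    bounds: Tuple[float, float, float, float],
                    size: int) -> List[List[Pixel]]:
    """Orthographically project points onto a size x size top-down image.

    Each pixel keeps (r, g, b, height) of the highest point landing in it,
    i.e. what a virtual camera looking straight down would see first.
    Pixels that no point maps to stay None.
    """
    x_min, x_max, y_min, y_max = bounds
    image: List[List[Pixel]] = [[None] * size for _ in range(size)]
    for x, y, z, r, g, b in points:
        if not (x_min <= x < x_max and y_min <= y < y_max):
            continue  # outside the rendered workspace
        # Clamp guards against float rounding pushing an index to `size`.
        col = min(int((x - x_min) / (x_max - x_min) * size), size - 1)
        row = min(int((y - y_min) / (y_max - y_min) * size), size - 1)
        current = image[row][col]
        if current is None or z > current[3]:  # z-buffer: keep top-most point
            image[row][col] = (r, g, b, z)
    return image
```

Rendering several such views from different virtual camera poses gives a fixed set of images regardless of where the real cameras were mounted.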

Links

HuggingFace repo

none

Paper (arXiv)

https://arxiv.org/abs/2406.08545

Compatible robots

0 in catalog (1 mentioned but not in catalog yet)

No robots list RVT-2 as compatible yet.

Compatible environments

tabletop-clean (not in seed) · tabletop-cluttered (not in seed)

Datasets that reference this policy

No datasets reference RVT-2 yet.