policy
RVT-2
NVIDIA · PyTorch
Overview
Name
RVT-2
Author
NVIDIA
Framework
PyTorch
License
other
Skill type
manipulation
Evidence level
reported
Task description
Multi-view transformer with coarse-to-fine rendering for precise 3D manipulation. It re-renders the camera inputs from virtual views, uses cross-view attention to predict keyframe actions, then zooms in to reach millimeter-level gripper pose accuracy. Reported results: 81.4% success on 18 RLBench tasks and 36x faster training than PerAct.
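The coarse-to-fine idea in the description can be illustrated with a toy 2D search. This is only a sketch of the general technique, not RVT-2's implementation: the score function, grid size, and zoom factor are hypothetical, and a plain grid search stands in for the learned heatmap prediction.

```python
from typing import Callable, Tuple

Point = Tuple[float, float]


def grid_argmax(score: Callable[[Point], float], center: Point,
                half_width: float, n: int) -> Point:
    """Evaluate `score` on an n x n grid centered at `center`; return the best point."""
    cx, cy = center
    step = 2 * half_width / (n - 1)
    best_score, best_point = float("-inf"), center
    for i in range(n):
        for j in range(n):
            p = (cx - half_width + i * step, cy - half_width + j * step)
            s = score(p)
            if s > best_score:
                best_score, best_point = s, p
    return best_point


def coarse_to_fine(score: Callable[[Point], float], n: int = 11,
                   zoom: float = 4.0) -> Tuple[Point, Point]:
    """Coarse pass over the unit square, then a zoomed-in pass around the coarse pick."""
    coarse = grid_argmax(score, (0.5, 0.5), 0.5, n)
    # Same number of samples, but spread over a window `zoom` times smaller.
    fine = grid_argmax(score, coarse, 0.5 / zoom, n)
    return coarse, fine


# Hypothetical target location; the score peaks there (stand-in for a heatmap).
TARGET: Point = (0.537, 0.262)


def toy_score(p: Point) -> float:
    return -((p[0] - TARGET[0]) ** 2 + (p[1] - TARGET[1]) ** 2)


def error(p: Point) -> float:
    return ((p[0] - TARGET[0]) ** 2 + (p[1] - TARGET[1]) ** 2) ** 0.5


coarse, fine = coarse_to_fine(toy_score)
print(f"coarse error: {error(coarse):.4f}, fine error: {error(fine):.4f}")
```

The second pass spends the same sample budget on a smaller window, so its grid spacing shrinks by the zoom factor. That is the intuition behind zooming in for higher pose precision.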
Spaces
Action space
end-effector-pose · 7-dim · 1Hz
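As a rough illustration of a 7-dim end-effector-pose action, the sketch below packs and unpacks such a vector. The listing does not say how the seven dimensions are split, so the layout used here (3 position, 3 Euler rotation, 1 gripper) and the `EndEffectorAction` name are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence

ACTION_DIM = 7         # from the listing: 7-dim action
CONTROL_RATE_HZ = 1.0  # from the listing: 1Hz


@dataclass
class EndEffectorAction:
    # Assumed layout: position (3) + Euler rotation (3) + gripper (1).
    x: float
    y: float
    z: float
    roll: float
    pitch: float
    yaw: float
    gripper_open: float  # assumed convention: 1.0 = open, 0.0 = closed

    def to_vector(self) -> List[float]:
        return [self.x, self.y, self.z,
                self.roll, self.pitch, self.yaw, self.gripper_open]

    @classmethod
    def from_vector(cls, v: Sequence[float]) -> "EndEffectorAction":
        if len(v) != ACTION_DIM:
            raise ValueError(f"expected {ACTION_DIM} values, got {len(v)}")
        return cls(*(float(a) for a in v))


action = EndEffectorAction.from_vector([0.3, -0.1, 0.8, 0.0, 3.14, 0.0, 1.0])
seconds_between_actions = 1.0 / CONTROL_RATE_HZ
print(action.to_vector(), seconds_between_actions)
```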
Observation space
- type: multimodal
  - multi_view_rgbd (re-rendered virtual views)
  - language_instruction
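The observation fields listed above could be held in a small container like the one below. This is a minimal sketch: the class names, the nested-list image layout, and the view names are hypothetical and are not taken from the RVT-2 codebase.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class VirtualView:
    name: str                    # hypothetical label for a re-rendered view
    rgb: List[List[List[int]]]   # H x W x 3 color values
    depth: List[List[float]]     # H x W depth values

    def shape(self) -> Tuple[int, int]:
        return (len(self.depth), len(self.depth[0]))


@dataclass
class Observation:
    views: List[VirtualView]     # multi_view_rgbd
    language_instruction: str    # language_instruction

    def validate(self) -> None:
        """Check that both modalities are present and RGB matches depth in size."""
        if not self.views:
            raise ValueError("at least one view is required")
        if not self.language_instruction.strip():
            raise ValueError("language instruction must not be empty")
        for view in self.views:
            h, w = view.shape()
            if len(view.rgb) != h or any(len(row) != w for row in view.rgb):
                raise ValueError(f"rgb/depth size mismatch in view {view.name!r}")


def blank_view(name: str, h: int, w: int) -> VirtualView:
    """Build an all-zero view, useful as a placeholder in tests."""
    rgb = [[[0, 0, 0] for _ in range(w)] for _ in range(h)]
    depth = [[0.0] * w for _ in range(h)]
    return VirtualView(name, rgb, depth)


obs = Observation([blank_view("top", 4, 4), blank_view("front", 4, 4)],
                  "close the jar")
obs.validate()
print(len(obs.views), obs.views[0].shape())
```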
Links
HuggingFace repo
none listed
Paper (arXiv)
Compatible robots
0 (+1 mentioned but not in catalog yet)
No robots list RVT-2 as compatible yet.
Compatible environments
2
Datasets that reference this policy
0
No datasets reference RVT-2 yet.