policy

RVT-2

NVIDIA · PyTorch


Overview

Name
RVT-2
Author
NVIDIA
Framework
PyTorch
License
other
Skill type
manipulation
Evidence level
reported
Task description
Multi-view transformer with coarse-to-fine rendering for precise 3D manipulation. It re-renders the camera inputs from virtual views, applies cross-view attention to predict keyframe actions, then zooms in for millimeter-level gripper pose accuracy. Reported results: 81.4% success on 18 RLBench tasks and 36x faster training than PerAct.
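
The coarse-to-fine idea in the description can be illustrated with a small, self-contained sketch. This is not RVT-2's implementation: it swaps the learned multi-view heatmaps for a plain grid search over a hypothetical `score` function, and the names `grid_argmax`, `coarse_to_fine`, `steps`, and `zoom` are assumptions made for illustration. It only shows why a second, zoomed-in pass over a smaller region improves localization at the same grid resolution.

```python
from itertools import product
from typing import Callable, Tuple

Point = Tuple[float, float, float]


def grid_argmax(score: Callable[[Point], float], center: Point,
                half_width: float, steps: int) -> Point:
    """Score the cell centres of a steps^3 grid around `center`; return the best one."""
    cell = 2.0 * half_width / steps
    best, best_score = center, float("-inf")
    for idx in product(range(steps), repeat=3):
        p = tuple(c - half_width + (i + 0.5) * cell for c, i in zip(center, idx))
        s = score(p)
        if s > best_score:
            best, best_score = p, s
    return best


def coarse_to_fine(score: Callable[[Point], float],
                   center: Point = (0.0, 0.0, 0.0),
                   half_width: float = 0.5,
                   steps: int = 10,
                   zoom: float = 5.0) -> Tuple[Point, Point]:
    """Coarse pass over the whole workspace, then a fine pass zoomed in on the coarse peak."""
    coarse = grid_argmax(score, center, half_width, steps)
    fine = grid_argmax(score, coarse, half_width / zoom, steps)
    return coarse, fine


def distance(a: Point, b: Point) -> float:
    return sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5


# Hypothetical stand-in for a learned heatmap: highest at the true grasp point.
target: Point = (0.123, -0.271, 0.334)
coarse, fine = coarse_to_fine(lambda p: -distance(p, target))
print("coarse error:", distance(coarse, target))
print("fine error:  ", distance(fine, target))
```

With the same 10x10x10 grid in both passes, the second pass covers a region five times narrower per axis, so its cells are five times smaller and the estimate lands closer to the target.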

Spaces

Action space
end-effector-pose · 7-dim · 1Hz
Observation space
  • type: multimodal
  • multi_view_rgbd (re-rendered virtual views)
  • language_instruction
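
As a rough sketch of how these spaces might be represented in client code, the snippet below defines a 7-dimensional end-effector action and a multimodal observation. The field layout is an assumption: this page states only "7-dim", and the split into 3 position values, 3 rotation values, and 1 gripper value, along with the class and field names, is hypothetical and not taken from the RVT-2 code.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class EndEffectorAction:
    """One keyframe action. ASSUMED layout: xyz position, 3 rotation angles, gripper flag."""
    position: Sequence[float]   # 3 values
    rotation: Sequence[float]   # 3 values (assumed Euler angles)
    gripper_open: bool

    @classmethod
    def from_vector(cls, v: Sequence[float]) -> "EndEffectorAction":
        if len(v) != 7:
            raise ValueError(f"expected 7 values, got {len(v)}")
        return cls(tuple(v[0:3]), tuple(v[3:6]), v[6] >= 0.5)

    def to_vector(self) -> List[float]:
        return [*self.position, *self.rotation, 1.0 if self.gripper_open else 0.0]


@dataclass
class Observation:
    """Multimodal observation: named RGB-D views plus a language instruction."""
    multi_view_rgbd: Dict[str, object]  # view name -> image data (type left open)
    language_instruction: str


action = EndEffectorAction.from_vector([0.1, 0.2, 0.3, 0.0, 1.57, 0.0, 1.0])
obs = Observation(multi_view_rgbd={}, language_instruction="close the jar")
print(action.to_vector())
```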

Links

HuggingFace repo
none listed

Compatible robots

0 in catalog; 1 mentioned but not in catalog yet

No robots list RVT-2 as compatible yet.

Compatible environments

2

Datasets that reference this policy

0

No datasets reference RVT-2 yet.