sim · medium · imitation · metric · varies
ViTacFormer: Learning Cross-Modal Representation for Visuo-Tactile Dexterous Manipulation
Description
Dexterous manipulation is a cornerstone capability for robotic systems aiming to interact with the physical world in a human-like manner. Although vision-based methods have advanced rapidly, tactile sensing remains crucial for fine-grained control, particularly in unstructured or visually occluded settings. We present ViTacFormer, a representation-learning approach that couples a cross-attention encoder to fuse high-resolution vision and touch with an autoregressive tactile prediction head that