sim · medium · imitation · metric: varies

ViTacFormer: Learning Cross-Modal Representation for Visuo-Tactile Dexterous Manipulation

Description

Dexterous manipulation is a cornerstone capability for robotic systems aiming to interact with the physical world in a human-like manner. Although vision-based methods have advanced rapidly, tactile sensing remains crucial for fine-grained control, particularly in unstructured or visually occluded settings. We present ViTacFormer, a representation-learning approach that couples a cross-attention encoder to fuse high-resolution vision and touch with an autoregressive tactile prediction head that …
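The description pairs a cross-attention encoder that fuses vision and touch with an autoregressive tactile prediction head. The sketch below illustrates those two ideas in NumPy under loudly stated assumptions: single-head scaled dot-product cross-attention in both directions with residual connections, mean pooling of the fused tokens, and a one-layer tanh head that feeds each prediction back in as input. Every function name, token count, dimension, and weight is hypothetical and randomly initialized; this is not the authors' architecture or hyperparameters.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention: queries attend to context."""
    q = queries @ w_q                          # (n_q, d)
    k = context @ w_k                          # (n_c, d)
    v = context @ w_v                          # (n_c, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (n_q, n_c)
    return softmax(scores, axis=-1) @ v        # (n_q, d)


def fuse(vision_tokens, tactile_tokens, params):
    """Hypothetical bidirectional fusion: each modality attends to the other,
    with a residual connection, and the results are stacked into one sequence."""
    vis = vision_tokens + cross_attention(vision_tokens, tactile_tokens, *params["vis_from_tac"])
    tac = tactile_tokens + cross_attention(tactile_tokens, vision_tokens, *params["tac_from_vis"])
    return np.concatenate([vis, tac], axis=0)


def predict_tactile(fused, last_tactile, w_head, horizon):
    """Hypothetical autoregressive head: each step conditions on the pooled
    fused features and on the previous step's tactile prediction."""
    pooled = fused.mean(axis=0)
    prev = last_tactile
    preds = []
    for _ in range(horizon):
        prev = np.tanh(np.concatenate([pooled, prev]) @ w_head)
        preds.append(prev)
    return np.stack(preds)


rng = np.random.default_rng(0)
d = 16                                   # hypothetical shared token width
vision = rng.normal(size=(49, d))        # hypothetical: 49 image patch tokens
tactile = rng.normal(size=(10, d))       # hypothetical: 10 tactile tokens
params = {
    name: tuple(rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3))
    for name in ("vis_from_tac", "tac_from_vis")
}

fused = fuse(vision, tactile, params)
print(fused.shape)   # (59, 16)

w_head = rng.normal(scale=(2 * d) ** -0.5, size=(2 * d, d))
future = predict_tactile(fused, tactile[-1], w_head, horizon=5)
print(future.shape)  # (5, 16)
```

Feeding each prediction back as the next step's input is what makes the head autoregressive here; a real model would learn these weights and could use multi-head attention, deeper stacks, or a different pooling scheme.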

Source

http://arxiv.org/abs/2506.15953v1