sim · medium · grasping · metric · varies

ViTA-Seg: Vision Transformer for Amodal Segmentation in Robotics

Description

Occlusions in robotic bin picking compromise accurate and reliable grasp planning. We present ViTA-Seg, a class-agnostic Vision Transformer framework for real-time amodal segmentation that leverages global attention to recover complete object masks, including hidden regions. We propose two architectures: a) Single-Head for amodal mask prediction; b) Dual-Head for amodal and occluded mask prediction. We also introduce ViTA-SimData, a photo-realistic synthetic dataset tailored to industrial bin-p

Source

http://arxiv.org/abs/2512.09510v1