dataset
ZwZ-RL-VQA
inclusionAI
Overview
Name
ZwZ-RL-VQA
Source
inclusionAI
Episodes
0
Robot count
0
Format
parquet
Description
ZwZ-RL-VQA: Region-to-Image Distilled Training Data for Fine-Grained Perception
This dataset contains 74K high-quality VQA pairs generated via Region-to-Image Distillation (R2I) for training multimodal large language models (MLLMs) on fine-grained perception tasks without test-time tool use.
📖 Overview
The Zooming without Zooming (ZwZ) method transforms "zooming" from an inference-time tool into a training-time primitive:
Zoom-in Synthesis: Strong teacher models…
See the full description on the dataset page: https://huggingface.co/datasets/inclusionAI/ZwZ-RL-VQA.
Robots used
null
Links
HuggingFace dataset
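Loading example
Since the data is distributed as parquet on the Hugging Face Hub, one plausible way to inspect it is with the `datasets` library. The following is a minimal sketch, not an official loader: the split name `train` is an assumption (this page does not list the splits), and `dataset_url` and `peek` are hypothetical helper names introduced only for illustration. Record field names are not stated here either, so the sketch makes no assumption about them.

```python
from itertools import islice

# Repository id taken from the dataset page URL in the description.
REPO_ID = "inclusionAI/ZwZ-RL-VQA"


def dataset_url(repo_id: str = REPO_ID) -> str:
    """Build the Hugging Face dataset page URL for a repository id."""
    return f"https://huggingface.co/datasets/{repo_id}"


def peek(n: int = 3, split: str = "train"):
    """Return the first n records without downloading the whole dataset.

    The split name "train" is an assumption; check the dataset page for
    the splits that actually exist.
    """
    # Imported lazily so this module loads even if `datasets` is not installed.
    from datasets import load_dataset

    # streaming=True yields records one at a time instead of fetching
    # every parquet shard up front.
    stream = load_dataset(REPO_ID, split=split, streaming=True)
    return list(islice(stream, n))
```

Calling `peek()` requires network access and the `datasets` package. Printing `sorted(record.keys())` for each returned record is a quick way to discover the actual column names before writing any training code.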