dataset

ZwZ-RL-VQA

inclusionAI


Overview

Name
ZwZ-RL-VQA
Source
inclusionAI
Episodes
0
Robot count
0
Format
parquet
Description
ZwZ-RL-VQA: Region-to-Image Distilled Training Data for Fine-Grained Perception
This dataset contains 74K high-quality VQA pairs generated via Region-to-Image Distillation (R2I) for training multimodal large language models (MLLMs) on fine-grained perception tasks without test-time tool use.
📖 Overview: The Zooming without Zooming (ZwZ) method transforms "zooming" from an inference-time tool into a training-time primitive:
Zoom-in Synthesis: Strong teacher models…
See the full description on the dataset page: https://huggingface.co/datasets/inclusionAI/ZwZ-RL-VQA.
Robots used
null

Links

HuggingFace dataset
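
Since the dataset is published in Parquet format on Hugging Face, one way to inspect it is with the `datasets` library. The following is a minimal sketch, not an official loader: the split name `"train"` is an assumption, the helper names `dataset_page_url` and `peek` are hypothetical, and the column names are not listed in this overview, so the code does not rely on any.

```python
"""Sketch: peek at a few ZwZ-RL-VQA records via the Hugging Face `datasets` library."""

REPO_ID = "inclusionAI/ZwZ-RL-VQA"  # repository id, taken from the dataset page URL


def dataset_page_url(repo_id: str = REPO_ID) -> str:
    """Return the Hugging Face dataset page URL for a repository id."""
    return f"https://huggingface.co/datasets/{repo_id}"


def peek(n: int = 1, split: str = "train", repo_id: str = REPO_ID) -> list:
    """Stream the first n records without downloading the whole dataset.

    Assumption: a split named "train" exists; check the dataset page if not.
    """
    # Imported lazily so this module still loads when `datasets` is not installed.
    from datasets import load_dataset

    # streaming=True returns an IterableDataset that reads records on demand.
    stream = load_dataset(repo_id, split=split, streaming=True)
    return list(stream.take(n))
```

Calling `peek(1)` requires network access and the `datasets` package; printing `sorted(record.keys())` for the returned record is a quick way to discover the actual column names before writing any training code.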