dataset

ZwZ-RL-VQA

inclusionAI

Overview

Name

ZwZ-RL-VQA

Source

inclusionAI

Episodes

Robot count

Format

parquet

Description

ZwZ-RL-VQA: Region-to-Image Distilled Training Data for Fine-Grained Perception

This dataset contains 74K high-quality VQA pairs generated via Region-to-Image Distillation (R2I) for training multimodal large language models (MLLMs) on fine-grained perception tasks without test-time tool use.

📖 Overview

The Zooming without Zooming (ZwZ) method transforms "zooming" from an inference-time tool into a training-time primitive:

Zoom-in Synthesis: Strong teacher models…

See the full description on the dataset page: https://huggingface.co/datasets/inclusionAI/ZwZ-RL-VQA.

Robots used

null

Links

HuggingFace dataset

inclusionAI/ZwZ-RL-VQA
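
Since the card lists the format as parquet and links to a Hugging Face repository, one plausible way to inspect the data is the third-party `datasets` library. The sketch below is illustrative only: the helper names `dataset_page_url` and `peek_first_record` are hypothetical, and the split name `"train"` is an assumption that this card does not confirm. The column names are not documented here either, so the sketch returns the raw record rather than reading specific fields.

```python
REPO_ID = "inclusionAI/ZwZ-RL-VQA"  # repository ID taken from the Links field


def dataset_page_url(repo_id: str) -> str:
    """Build the Hugging Face dataset page URL for a repository ID."""
    return f"https://huggingface.co/datasets/{repo_id}"


def peek_first_record(repo_id: str = REPO_ID, split: str = "train") -> dict:
    """Stream a single record instead of downloading the whole dataset.

    Assumption: a split named "train" exists. Check the dataset page
    for the actual split names before relying on this default.
    """
    # Imported lazily so the URL helper works without the dependency.
    from datasets import load_dataset  # third-party: pip install datasets

    stream = load_dataset(repo_id, split=split, streaming=True)
    return next(iter(stream))
```

Calling `peek_first_record()` needs network access and the `datasets` package; printing `sorted(record.keys())` on its result is a quick way to discover the actual column names.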