dataset

Multimodal-Reinforce-CoT

Mr-Wonderfool


Overview

Name

Source

Mr-Wonderfool

Episodes

Robot count

Format

other

Description

Fine-tuning Qwen2.5-VL-3B-Instruct with reinforcement learning to produce high-quality chain-of-thought reasoning on the GQA dataset.

Robots used

null

HuggingFace dataset

null