dataset

Multimodal-Reinforce-CoT

Mr-Wonderfool

Overview

Name
Multimodal-Reinforce-CoT
Source
Mr-Wonderfool
Episodes
0
Robot count
0
Format
other
Description
Fine-tuning Qwen2.5-VL-3B-Instruct with reinforcement learning to produce high-quality chain-of-thought reasoning on the GQA dataset
Robots used
null

Links

HuggingFace dataset
null