dataset
Multimodal-Reinforce-CoT
Mr-Wonderfool
Overview
Name
Multimodal-Reinforce-CoT
Source
Mr-Wonderfool
Episodes
0
Robot count
0
Format
other
Description
Fine-tuning Qwen2.5-VL-3B-Instruct with reinforcement learning to produce high-quality chain-of-thought reasoning on the GQA dataset
Robots used
null
Links
HuggingFace dataset
null