dataset
VinciCoder-1.6M-SFT
DocTron-Hub
Overview
Name
VinciCoder-1.6M-SFT
Source
DocTron-Hub
Episodes
0
Robot count
0
Format
parquet
Description
VinciCoder: Unified Multimodal Code Generation Dataset
This repository contains the datasets used for "VinciCoder: Unifying Multimodal Code Generation via Coarse-to-fine Visual Reinforcement Learning", a project that introduces a unified multimodal code generation model. The framework is trained in two stages, one using a large-scale Supervised Finetuning (SFT) corpus and one using a Visual Reinforcement Learning (ViRL) dataset. These datasets are designed for tasks involving direct… See the full description on the dataset page: https://huggingface.co/datasets/DocTron-Hub/VinciCoder-1.6M-SFT.
Robots used
null
Links
HuggingFace dataset
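
Since the card lists the format as parquet and links to a Hugging Face dataset page, a few rows can probably be inspected without downloading the full corpus. The following is a minimal sketch, not the authors' documented loading procedure. It assumes the third-party `datasets` library is installed, that the repository exposes a split named `train`, and that no configuration name is required; none of these is stated on this card. The helper names `dataset_page_url` and `stream_first_rows` are hypothetical.

```python
# Repository id taken from the Source and Name fields of this card.
REPO_ID = "DocTron-Hub/VinciCoder-1.6M-SFT"


def dataset_page_url(repo_id: str) -> str:
    """Build the Hugging Face dataset page URL for a repository id."""
    return f"https://huggingface.co/datasets/{repo_id}"


def stream_first_rows(repo_id: str = REPO_ID, split: str = "train", n: int = 3):
    """Return the first n rows as dicts, streaming instead of downloading.

    Assumption: the split is called "train" and no config name is needed.
    Check the dataset page if this raises an error.
    """
    # Imported lazily so the URL helper works without the library installed.
    from datasets import load_dataset

    stream = load_dataset(repo_id, split=split, streaming=True)
    return list(stream.take(n))


if __name__ == "__main__":
    print(dataset_page_url(REPO_ID))
    # Requires network access; prints the column names of each sampled row.
    for row in stream_first_rows(n=2):
        print(sorted(row.keys()))
```

Streaming is used here because the card's name suggests roughly 1.6 million SFT samples, so reading a handful of rows is cheaper than materializing the whole dataset locally.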