dataset

SSP

Quark-LLM

Overview

Name
SSP
Source
Quark-LLM
Episodes
0
Robot count
0
Format
other
Description
Search Self-Play (SSP) Dataset. Search Self-Play (SSP) is a reinforcement learning framework for training adversarial self-play agents with integrated search capabilities, enabling both the proposer and the solver agent to perform multi-turn search engine calls and reasoning in a coordinated manner. Through RL training with rule-based outcome rewards, SSP lets the two roles co-evolve in adversarial competition: the proposer learns to generate increasingly… See the full description on the dataset page: https://huggingface.co/datasets/Quark-LLM/SSP.
Robots used
null

Links

Hugging Face dataset: https://huggingface.co/datasets/Quark-LLM/SSP