SSP
Quark-LLM
Overview
Name
SSP
Source
Quark-LLM
Episodes
0
Robot count
0
Format
other
Description
Search Self-Play (SSP) Dataset
Search Self-Play (SSP) is a reinforcement learning framework for training adversarial self-play agents with integrated search capabilities. Both the proposer and the solver agents can make multi-turn search engine calls and reason over the results in a coordinated manner.
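The multi-turn search behavior described above can be illustrated with a minimal Python sketch. It assumes a hypothetical tag protocol (`<search>query</search>` for a search call, `<answer>text</answer>` to finish, `<information>results</information>` for returned text) and toy `policy` and `search` callables standing in for the language model and the search engine. None of these names come from the SSP card, and the real framework may structure its turns differently.

```python
import re
from typing import Callable, Optional, Tuple

# Hypothetical action tags; the SSP card does not specify a turn format.
SEARCH_RE = re.compile(r"<search>(.*?)</search>", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)

def run_search_agent(
    question: str,
    policy: Callable[[str], str],
    search: Callable[[str], str],
    max_turns: int = 4,
) -> Tuple[Optional[str], str]:
    """Alternate model turns and search calls until an answer appears or turns run out.

    Returns (final answer or None, full transcript).
    """
    transcript = f"Question: {question}\n"
    for _ in range(max_turns):
        step = policy(transcript)  # model emits reasoning plus one action tag
        transcript += step + "\n"
        answer = ANSWER_RE.search(step)
        if answer:
            return answer.group(1).strip(), transcript
        query = SEARCH_RE.search(step)
        if query:
            # Append retrieved text so the next turn can reason over it.
            results = search(query.group(1).strip())
            transcript += f"<information>{results}</information>\n"
    return None, transcript

# Toy stand-ins for the language model and the search engine.
def toy_policy(transcript: str) -> str:
    if "<information>" not in transcript:
        return "I should look this up. <search>capital of France</search>"
    return "The results mention Paris. <answer>Paris</answer>"

def toy_search(query: str) -> str:
    return "Paris is the capital of France."

answer, log = run_search_agent("What is the capital of France?", toy_policy, toy_search)
print(answer)  # Paris
```

Because the loop depends only on the prompt and the policy, it could in principle drive either role: a proposer searching to build a question, or a solver searching to answer one.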
Through RL training with rule-based outcome rewards, SSP enables the two roles to co-evolve in an adversarial competition: the proposer learns to generate increasingly…
See the full description on the dataset page: https://huggingface.co/datasets/Quark-LLM/SSP.
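Rule-based outcome rewards in an adversarial pairing can be sketched as follows. The sketch makes two hypothetical assumptions: the solver is scored by a normalized exact-match check against a reference answer, and the proposer is scored as one minus the solver's mean success rate over several rollouts. This is a generic zero-sum illustration, not the reward defined by SSP; the card excerpt is truncated before it describes the proposer's objective.

```python
import re
import string
from statistics import mean
from typing import List

def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def solver_reward(prediction: str, ground_truth: str) -> float:
    """Rule-based outcome reward: 1.0 on a normalized exact match, else 0.0."""
    return 1.0 if normalize(prediction) == normalize(ground_truth) else 0.0

def proposer_reward(solver_rewards: List[float]) -> float:
    """Hypothetical adversarial reward: the proposer scores when solver rollouts fail."""
    if not solver_rewards:
        raise ValueError("need at least one solver rollout")
    return 1.0 - mean(solver_rewards)

# Example: four solver rollouts on one proposed question whose answer is "Paris".
rollouts = ["Paris", "paris.", "Lyon", " PARIS "]
rewards = [solver_reward(r, "Paris") for r in rollouts]
print(rewards)                   # [1.0, 1.0, 0.0, 1.0]
print(proposer_reward(rewards))  # 0.25
```

A purely zero-sum reward like this one would let the proposer score by posing unanswerable or ill-posed questions, so a practical system would likely need an additional validity check on proposed questions. How SSP handles this is not stated in the excerpt above.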
Robots used
Not specified
Links
Hugging Face dataset