dataset

SSP

Quark-LLM


Overview

Name

SSP

Source

Quark-LLM

Episodes

Robot count

Format

other

Description

Search Self-Play (SSP) Dataset

Search Self-Play (SSP) is a reinforcement learning framework for training adversarial self-play agents with integrated search capabilities. It enables both the proposer and the solver agents to conduct multi-turn search-engine calls and reasoning in a coordinated manner. Through RL training with rule-based outcome rewards, SSP lets the two roles co-evolve in adversarial competition: the proposer learns to generate increasingly… See the full description on the dataset page: https://huggingface.co/datasets/Quark-LLM/SSP.

Robots used

Not specified

Links

HuggingFace dataset

Quark-LLM/SSP