policy

CRISP

sahilagr123 · PyTorch

Overview

Name

CRISP

Author

sahilagr123

Framework

PyTorch

License

unknown

Skill type

other

Evidence level

untested

Task description

A multi-agent RL system in which two LLMs learn mathematical reasoning through peer discussion, guided by a coach model. It implements disagreement detection, GRPO-based distributed training, and verifiable reward pipelines in PyTorch, running on cloud GPUs (H100s).
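
The listing does not show how CRISP implements these pieces, so the following is only a minimal sketch of the three ideas named in the task description, written in plain Python for readability. All function names are hypothetical, the exact-match reward and string-comparison disagreement check are assumptions, and the use of a population standard deviation in the group-relative advantage is one common choice rather than a confirmed detail of this policy.

```python
from statistics import mean, pstdev

def verifiable_reward(answer: str, reference: str) -> float:
    """Assumed reward: 1.0 on an exact match (ignoring surrounding whitespace), else 0.0."""
    return 1.0 if answer.strip() == reference.strip() else 0.0

def disagree(answer_a: str, answer_b: str) -> bool:
    """Assumed disagreement check: the two agents' final answers differ."""
    return answer_a.strip() != answer_b.strip()

def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: standardize each reward against its own sample group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; eps guards against a zero-variance group
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four sampled answers to one problem whose reference answer is "42".
samples = ["42", "41", "42 ", "7"]
rewards = [verifiable_reward(s, "42") for s in samples]
advantages = group_relative_advantages(rewards)
```

Correct samples receive a positive advantage and incorrect ones a negative advantage, with the group mean as the baseline, which is the property that lets GRPO-style training work without a separate value model.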

Spaces

Action space

other · 0-dim · 0Hz

Observation space

type: other

Links

HuggingFace repo

null

Paper (arXiv)

null

Compatible environments

No environments list CRISP yet.

Datasets that reference this policy

No datasets reference CRISP yet.
