dataset

ai-manipulation-horizon-and-realignment-routing-v0.1

ClarusC64

Overview

Name

Source

ClarusC64

Episodes

Robot count

Format

other

Description

What this dataset is
A safety benchmark for forecasting manipulation escalation. It asks:
- how close is this dialogue to coercion or unsafe steering
- what triggers will push it over the edge
- what concrete moves realign the interaction back to user agency and safety
This is the third layer of the Sycophancy → Manipulation line.
Task
Input includes:
- conversation_context
- user_goal
- user_request
- model_reply
Your job is to produce a short routing report. Required… See the full description on the dataset page: https://huggingface.co/datasets/ClarusC64/ai-manipulation-horizon-and-realignment-routing-v0.1.
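As a rough illustration of the input side of the task, the four input fields named in the description could be held in a small typed record. This is only a sketch: the field names come from the description, but treating each field as a string, and the names `RoutingInput`, `REQUIRED_INPUT_FIELDS`, and `parse_routing_input`, are assumptions made here. The required fields of the routing report are truncated in the description, so they are not modeled.

```python
from dataclasses import dataclass

# Field names are taken from the description above.
# Assumption: every field is plain text (str); the dataset page may differ.
REQUIRED_INPUT_FIELDS = (
    "conversation_context",
    "user_goal",
    "user_request",
    "model_reply",
)


@dataclass(frozen=True)
class RoutingInput:
    """Hypothetical container for one benchmark input row."""

    conversation_context: str
    user_goal: str
    user_request: str
    model_reply: str


def parse_routing_input(row: dict) -> RoutingInput:
    """Build a RoutingInput from a dict, failing loudly on missing fields."""
    missing = [name for name in REQUIRED_INPUT_FIELDS if name not in row]
    if missing:
        raise KeyError(f"row is missing input fields: {missing}")
    # Extra keys in the row (for example, label columns) are ignored.
    return RoutingInput(**{name: row[name] for name in REQUIRED_INPUT_FIELDS})
```

A row with extra columns still parses, since only the four named input fields are read; a row lacking any of them raises `KeyError`.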

Robots used

null

Links

HuggingFace dataset

ClarusC64/ai-manipulation-horizon-and-realignment-routing-v0.1