Overview
Name
DDRBench_10K_trajectory
Source
thinkwee
Episodes
0
Robot count
0
Format
json
Description
10K Agent Trajectories Dataset
Overview
This dataset contains agent trajectories from the Deep Data Research (DDR) project's 10-K financial analysis task, as presented in the paper "Hunt Instead of Wait: Evaluating Deep Data Research on Large Language Models".
DDR-Bench is a large-scale benchmark designed to evaluate "investigatory intelligence" in LLM agents: the autonomy to set goals and explore raw data without explicit queries. This…
See the full description on the dataset page: https://huggingface.co/datasets/thinkwee/DDRBench_10K_trajectory.
Robots used
null
Links
HuggingFace dataset
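Inspecting the files
The card lists the format as json but does not describe the record layout. The sketch below is one way to look at a downloaded file without assuming a schema: it loads a .json or .jsonl file and counts the top-level keys it finds. The helper names load_records and key_counts are illustrative and not part of the dataset or its code, and the file name in the usage note is a placeholder.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Any


def load_records(path: str) -> list[Any]:
    """Load records from a .json file (array or single object) or a .jsonl file.

    Hypothetical helper: the actual file layout of the dataset is not
    described on this card, so both common JSON layouts are handled.
    """
    text = Path(path).read_text(encoding="utf-8")
    if path.endswith(".jsonl"):
        # JSON Lines: one JSON document per non-empty line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    data = json.loads(text)
    # Wrap a single top-level value so callers always get a list.
    return data if isinstance(data, list) else [data]


def key_counts(records: list[Any]) -> Counter:
    """Count how often each top-level key appears across dict records."""
    counts: Counter = Counter()
    for record in records:
        if isinstance(record, dict):
            counts.update(record.keys())
    return counts
```

For example, after downloading a file from the dataset page, key_counts(load_records("trajectory.json")) reports which fields are present and how consistently, which is a reasonable first step before writing any field-specific parsing.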