sim · medium · atari · metric · varies

Deep Reinforcement Learning at the Edge of the Statistical Precipice

Description

Deep reinforcement learning (RL) algorithms are predominantly evaluated by comparing their relative performance on a large suite of tasks. Most published results on deep RL benchmarks compare point estimates of aggregate performance such as mean and median scores across tasks, ignoring the statistical uncertainty implied by the use of a finite number of training runs. Beginning with the Arcade Learning Environment (ALE), the shift towards computationally-demanding benchmarks has led to the pract
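To illustrate the gap between a point estimate and its uncertainty, here is a minimal sketch that reports an aggregate score together with a percentile bootstrap confidence interval over training runs. It is an illustration under stated assumptions, not necessarily the procedure the paper recommends: the function name `aggregate_with_ci`, the `(n_runs, n_tasks)` score layout, and the choice to resample runs independently within each task are all hypothetical.

```python
import numpy as np


def aggregate_with_ci(scores, aggregate=np.mean, n_boot=2000, alpha=0.05, seed=0):
    """Return (point_estimate, ci_low, ci_high) for an aggregate score.

    scores: array of shape (n_runs, n_tasks) holding one normalized score
            per training run and task (assumed layout, for illustration).
    aggregate: function mapping per-task mean scores to one number,
               for example np.mean or np.median.
    """
    scores = np.asarray(scores, dtype=float)
    n_runs, n_tasks = scores.shape
    rng = np.random.default_rng(seed)

    # Point estimate: average over runs per task, then aggregate across tasks.
    point = aggregate(scores.mean(axis=0))

    boot = np.empty(n_boot)
    for b in range(n_boot):
        # Resample runs with replacement, independently for each task.
        idx = rng.integers(0, n_runs, size=(n_runs, n_tasks))
        resampled = np.take_along_axis(scores, idx, axis=0)
        boot[b] = aggregate(resampled.mean(axis=0))

    # Percentile interval from the bootstrap distribution.
    ci_low, ci_high = np.quantile(boot, [alpha / 2, 1 - alpha / 2])
    return point, ci_low, ci_high


if __name__ == "__main__":
    # Synthetic data only: 3 runs on 5 tasks, not results from any benchmark.
    demo_rng = np.random.default_rng(42)
    demo_scores = demo_rng.normal(loc=1.0, scale=0.5, size=(3, 5))
    for name, fn in [("mean", np.mean), ("median", np.median)]:
        point, low, high = aggregate_with_ci(demo_scores, aggregate=fn)
        print(f"{name}: {point:.3f} (95% CI {low:.3f} to {high:.3f})")
```

With only a handful of runs per task, the interval is typically wide relative to the differences between algorithms, which is the uncertainty that a bare mean or median hides.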

Source

http://arxiv.org/abs/2108.13264v4