Analysing Results from AI Benchmarks: Key Indicators and How to Obtain Them

Description

Item response theory (IRT) can be applied to analyse evaluation results from AI benchmarks. The two-parameter IRT model provides two indicators (difficulty and discrimination) on the side of the item (or AI problem) but only one indicator (ability) on the side of the respondent (or AI agent). In this paper we analyse how to make this set of indicators dual by adding a fourth indicator, generality, on the side of the respondent. Generality is meant to be dual to discrimination,
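
The two-parameter model mentioned above can be illustrated with a minimal sketch. It assumes the standard textbook two-parameter logistic (2PL) form, which may differ from the exact parameterisation used in the paper. The function name and the numeric values are hypothetical, chosen only for illustration.

```python
import math


def two_pl_probability(ability, difficulty, discrimination):
    """Probability of a correct response under the textbook 2PL logistic form.

    ability        -- respondent-side indicator (the AI agent)
    difficulty     -- item-side indicator (the AI problem)
    discrimination -- item-side indicator: slope of the curve at the difficulty
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))


# Hypothetical values, not taken from the paper.
p_matched = two_pl_probability(ability=0.0, difficulty=0.0, discrimination=1.5)
p_stronger = two_pl_probability(ability=2.0, difficulty=0.0, discrimination=1.5)

print(p_matched)   # ability equals difficulty
print(p_stronger)  # ability well above difficulty
```

In this form, an agent whose ability equals the item's difficulty succeeds with probability one half, and a larger discrimination makes the success probability change more sharply as ability moves away from the difficulty.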

Source

http://arxiv.org/abs/1811.08186v2