sim · medium · atari · metric · varies
Atari-GPT: Benchmarking Multimodal Large Language Models as Low-Level Policies in Atari Games
Description
Recent advancements in large language models (LLMs) have expanded their capabilities beyond traditional text-based tasks to multimodal domains, integrating visual, auditory, and textual data. While multimodal LLMs have been extensively explored for high-level planning in domains like robotics and games, their potential as low-level controllers remains largely untapped. In this paper, we introduce a novel benchmark aimed at testing the emergent capabilities of multimodal LLMs as low-level policie