sim · medium · atari · metric · varies

Atari-GPT: Benchmarking Multimodal Large Language Models as Low-Level Policies in Atari Games

Description

Recent advancements in large language models (LLMs) have expanded their capabilities beyond traditional text-based tasks to multimodal domains, integrating visual, auditory, and textual data. While multimodal LLMs have been extensively explored for high-level planning in domains like robotics and games, their potential as low-level controllers remains largely untapped. In this paper, we introduce a novel benchmark aimed at testing the emergent capabilities of multimodal LLMs as low-level policies in Atari games.
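
To make the idea of a low-level policy concrete, the sketch below shows one way a model's free-text reply could be mapped to a discrete game action at every frame. It is an illustration under stated assumptions, not the benchmark's actual harness: the action list, `parse_action`, `run_episode`, `ToyEnv`, and the `query_model` callable are all hypothetical names, and the toy environment stands in for a real Atari emulator.

```python
import re
from typing import Callable, List, Sequence, Tuple

# Hypothetical action set; real Atari games expose game-specific action lists.
ACTIONS: List[str] = ["NOOP", "FIRE", "RIGHT", "LEFT"]


def parse_action(reply: str, actions: Sequence[str] = ACTIONS) -> int:
    """Map a free-text model reply to an action index; fall back to index 0."""
    # Find the first whole-word action name in the reply, ignoring case.
    pattern = r"\b(" + "|".join(re.escape(a) for a in actions) + r")\b"
    match = re.search(pattern, reply, flags=re.IGNORECASE)
    if match is None:
        return 0  # assumed safe default (NOOP in this action list)
    return [a.upper() for a in actions].index(match.group(1).upper())


class ToyEnv:
    """Stand-in environment: pays 1.0 for RIGHT and ends after 5 steps."""

    def __init__(self) -> None:
        self.t = 0

    def reset(self) -> bytes:
        self.t = 0
        return b"frame-0"  # placeholder for an image frame

    def step(self, action: int) -> Tuple[bytes, float, bool]:
        self.t += 1
        reward = 1.0 if ACTIONS[action] == "RIGHT" else 0.0
        return f"frame-{self.t}".encode(), reward, self.t >= 5


def run_episode(env: ToyEnv, query_model: Callable[[bytes], str],
                max_steps: int = 100) -> float:
    """Query the model once per frame and return the episode's total reward."""
    frame = env.reset()
    total = 0.0
    for _ in range(max_steps):
        action = parse_action(query_model(frame))
        frame, reward, done = env.step(action)
        total += reward
        if done:
            break
    return total
```

Here `query_model` is where a call to a multimodal LLM would go; in the sketch any function that takes a frame and returns text works, for example `run_episode(ToyEnv(), lambda frame: "I will move RIGHT")`. Falling back to a no-op on an unparseable reply is one possible design choice; a real evaluation might instead retry the query or count the failure.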

Source

http://arxiv.org/abs/2408.15950v2