sim · medium · atari · metric · varies

Fourier Head: Helping Large Language Models Learn Complex Probability Distributions

Description

As the quality of large language models has improved, there has been increased interest in using them to model non-linguistic tokens. For example, the Decision Transformer recasts agentic decision making as a sequence modeling problem, using a decoder-only LLM to model the distribution over the discrete action space for an Atari agent. However, when adapting LLMs to non-linguistic domains, it remains unclear if softmax over discrete bins captures the continuous structure of the tokens and the po
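The concern about softmax over discrete bins can be illustrated with a minimal sketch. It is not taken from the paper: the logits, the bin range, and the helper names `bin_index` and `cross_entropy` are hypothetical, chosen only for illustration. It shows that the cross-entropy loss of a softmax head is unchanged when the bins are shuffled, so the head itself carries no notion of which bins are numerically adjacent.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def bin_index(value, low, high, num_bins):
    """Map a continuous value in [low, high] to one of num_bins equal-width bins."""
    frac = (value - low) / (high - low)
    return min(num_bins - 1, max(0, int(frac * num_bins)))

def cross_entropy(logits, target):
    """Negative log-probability that the softmax assigns to the target bin."""
    return -math.log(softmax(logits)[target])

NUM_BINS = 8
# Made-up head outputs for illustration only (assumption, not real model output).
logits = [0.1, 0.3, 1.2, 2.5, 1.1, 0.2, -0.4, -1.0]

# A continuous target in [-1, 1] is quantized into a discrete token.
target = bin_index(0.05, -1.0, 1.0, NUM_BINS)
loss = cross_entropy(logits, target)

# Shuffle the bins: position i of the shuffled head holds original bin perm[i].
perm = [3, 0, 6, 1, 7, 4, 2, 5]
shuffled_logits = [logits[p] for p in perm]
shuffled_target = perm.index(target)
shuffled_loss = cross_entropy(shuffled_logits, shuffled_target)

# The two losses agree up to floating-point rounding: the loss does not
# depend on the order of the bins.
print(loss, shuffled_loss)
```

Because the loss is invariant to this relabeling, any sense that neighboring bins represent nearby values has to be learned from data rather than being built into the output layer.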

Source

http://arxiv.org/abs/2410.22269v2