Mask Is What DLLM Needs: A Masked Data Training Paradigm for Diffusion LLMs

Description

Discrete diffusion models offer global context awareness and flexible parallel generation. However, the uniform random noise schedulers used in standard diffusion LLM (DLLM) training overlook the highly non-uniform information density of real-world sequences. As a result, optimization effort is spent on low-density structural glue while high-density logical pivot points remain severely under-optimized. To address this, we propose an Information Density Driven Smart Noise Scheduler. By extracting information-dense hu
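
The contrast above between uniform noise and density-aware noise can be illustrated with a small sketch. It is not the paper's scheduler, since the description is cut off before the method is specified. The per-token density scores, the function names, and the rescaling rule (mask probability proportional to density, with the mean held at the target mask ratio) are all assumptions made for illustration.

```python
import random


def uniform_mask_probs(n_tokens, mask_ratio):
    """Baseline: every position is masked with the same probability."""
    return [mask_ratio] * n_tokens


def density_weighted_mask_probs(density, mask_ratio):
    """Hypothetical density-aware schedule (an assumption, not the paper's rule).

    Each token's mask probability is proportional to its density score and
    rescaled so the mean probability equals mask_ratio. Values are clipped
    to 1.0, which can lower the mean slightly when scores are very skewed.
    """
    mean = sum(density) / len(density)
    if mean == 0:
        # No density signal available: fall back to uniform masking.
        return [mask_ratio] * len(density)
    return [min(1.0, mask_ratio * d / mean) for d in density]


def apply_mask(tokens, probs, rng, mask_token="[MASK]"):
    """Independently replace each token with mask_token with its own probability."""
    return [mask_token if rng.random() < p else tok
            for tok, p in zip(tokens, probs)]


if __name__ == "__main__":
    tokens = ["so", "x", "equals", "nine"]
    density = [1.0, 1.0, 4.0, 2.0]  # made-up scores; higher = more informative
    rng = random.Random(0)
    print(uniform_mask_probs(len(tokens), 0.25))
    print(density_weighted_mask_probs(density, 0.25))
    print(apply_mask(tokens, density_weighted_mask_probs(density, 0.25), rng))
```

Under this rule the overall masking budget stays the same as in the uniform baseline, but more of it falls on the tokens scored as information-dense, so those positions are reconstructed more often during training.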

Source

http://arxiv.org/abs/2603.15803v1