Autoregressive LLMs face several challenges: generation is slow because tokens must be produced one at a time, exposure bias can degrade text quality and coherence, and long sequences are difficult to generate[2]. These issues limit their efficiency in high-throughput settings and can hurt performance on certain tasks.
SEDD's performance was evaluated on several test datasets, including LAMBADA, WikiText2, PTB, WikiText103, and 1BW. Comparative evaluations showed that SEDD matched or exceeded GPT-2's likelihood on these datasets.
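For context, below is a minimal sketch of how held-out likelihood (usually reported as perplexity) is commonly measured for a GPT-2 baseline, assuming the Hugging Face transformers API; the file path and the simple non-overlapping chunking protocol are illustrative assumptions, not the evaluation code used for SEDD.

```python
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Minimal sketch: perplexity of GPT-2 on a held-out text file.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = GPT2LMHeadModel.from_pretrained("gpt2").to(device).eval()
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

text = open("wikitext2_test.txt").read()  # hypothetical path to a test split
ids = tokenizer(text, return_tensors="pt").input_ids.to(device)

chunk_len = 1024  # GPT-2's context window
nll_sum, n_tokens = 0.0, 0
with torch.no_grad():
    for start in range(0, ids.size(1) - 1, chunk_len):
        chunk = ids[:, start : start + chunk_len]
        loss = model(chunk, labels=chunk).loss  # mean NLL over predicted tokens
        n = chunk.size(1) - 1                   # number of predicted positions
        nll_sum += loss.item() * n
        n_tokens += n

print("perplexity:", math.exp(nll_sum / n_tokens))
```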
Score Entropy Discrete Diffusion (SEDD) is built on score entropy, a novel loss that extends score matching to discrete spaces and integrates naturally into discrete diffusion models, significantly boosting performance on language modeling tasks[6]. SEDD outperforms existing language diffusion paradigms and is competitive with autoregressive models, beating GPT-2 on several benchmarks. It generates faithful text without requiring distribution annealing techniques, allows trading compute for generation quality, and enables controllable infilling.
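To make the core idea concrete, here is a toy sketch of a score-entropy-style objective: the model outputs positive scores that are pushed toward the probability ratios p(y)/p(x) between a sequence x and its neighboring sequences y, which is how score matching is carried over to discrete spaces. The function name, tensor shapes, and weighting below are illustrative assumptions, not the authors' implementation.

```python
import torch

def score_entropy_loss(scores, ratios, weights):
    """Toy score-entropy-style objective (illustrative, not the authors' code).

    scores  -- positive model outputs s_theta(x)_y for candidate states y != x
    ratios  -- target probability ratios p(y)/p(x) for the same candidates
    weights -- nonnegative transition weights for each (x, y) pair
    All tensors share the shape [batch, num_candidates].
    """
    ratios = ratios.clamp_min(1e-12)  # avoid log(0) for unreachable states
    # Each term is non-negative and equals zero exactly when scores == ratios,
    # so minimizing it drives the learned scores toward the true ratios.
    per_pair = scores - ratios * torch.log(scores) + ratios * (torch.log(ratios) - 1.0)
    return (weights * per_pair).sum(dim=-1).mean()

# Example usage with random positive values
scores = torch.rand(4, 10) + 0.1
ratios = torch.rand(4, 10) + 0.1
weights = torch.ones(4, 10)
print(score_entropy_loss(scores, ratios, weights))
```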