
Next-token prediction in AI faces two key limitations: error compounding in autoregressive inference, where small per-step inaccuracies accumulate into large deviations over long sequences, and shortcut learning under teacher forcing, which hampers learning the true sequence dependencies needed for planning and reasoning. Both limitations restrict the performance and applicability of such models, particularly in complex, long-horizon planning and decision-making tasks.

In autoregressive inference, each token is predicted conditioned on the model's own previous outputs rather than on ground truth. A minor inaccuracy early in generation places every subsequent prediction on a slightly wrong prefix, and because nothing corrects that prefix, the deviations accumulate into significant errors over long output sequences. This compounding effect is what limits autoregressive inference in tasks requiring complex, long-term planning and decision-making.
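
The toy simulation below is a minimal sketch of this effect, not a real language model: a hypothetical ground-truth transition (`true_next`) defines the intended sequence, and an imperfect predictor (`model_next`) mimics it but errs with a small fixed probability per step. Because each prediction is fed back as the next input, a single mistake derails the rest of the rollout, so the fraction of correct tokens decays with sequence length even though the per-step error rate stays constant.

```python
import random

def true_next(token: int) -> int:
    # Hypothetical ground-truth sequence: 0, 1, 2, ... (mod 100).
    return (token + 1) % 100

def model_next(token: int, error_rate: float = 0.02) -> int:
    # Imperfect predictor: usually correct, occasionally off by one step.
    if random.random() < error_rate:
        return (token + 2) % 100
    return true_next(token)

def rollout(length: int, error_rate: float) -> float:
    """Autoregressively generate `length` tokens, feeding each prediction
    back in as the next input, and return the fraction that still match
    the ground-truth sequence."""
    truth, pred = 0, 0
    matches = 0
    for _ in range(length):
        truth = true_next(truth)
        pred = model_next(pred, error_rate)  # conditions on its OWN output
        matches += (truth == pred)
    return matches / length

random.seed(0)
for n in (10, 100, 1000):
    print(f"length {n:5d}: {rollout(n, 0.02):.2%} of tokens still correct")
```

Running this shows short rollouts staying mostly correct while long ones drift far from the intended sequence, which is the compounding behavior described above.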

Teacher forcing is a training strategy used in the development of sequence-to-sequence models. It accelerates and stabilizes learning by providing the correct (ground-truth) token as input at each step of the sequence, rather than letting the model condition on its own previous outputs. Its disadvantage is that the model sees ground-truth prefixes during training but its own, possibly erroneous, prefixes at test time; this mismatch between the two regimes can open a gap between training and test performance.
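
The sketch below illustrates the mechanism under stated assumptions: the decoder (a small GRU), vocabulary size, and dummy batch are illustrative stand-ins, not taken from the original text. The only difference between the two modes is which token is fed back at each step, ground truth under teacher forcing versus the model's own greedy prediction when free-running.

```python
import torch
import torch.nn as nn

# Illustrative decoder components (assumed sizes, not from the source).
vocab_size, emb_dim, hidden_dim = 50, 32, 64
embed = nn.Embedding(vocab_size, emb_dim)
rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)
out = nn.Linear(hidden_dim, vocab_size)
loss_fn = nn.CrossEntropyLoss()

def decode_step(token, hidden):
    """One decoder step: embed the input token, update the hidden state,
    and return next-token logits."""
    emb = embed(token).unsqueeze(1)            # (batch, 1, emb_dim)
    output, hidden = rnn(emb, hidden)
    return out(output.squeeze(1)), hidden      # (batch, vocab), hidden

def training_step(target, teacher_forcing: bool):
    """Decode a target sequence, feeding back either the ground-truth
    token (teacher forcing) or the model's own prediction (free-running)."""
    batch, seq_len = target.shape
    hidden = torch.zeros(1, batch, hidden_dim)
    token = target[:, 0]                       # start token
    loss = 0.0
    for t in range(1, seq_len):
        logits, hidden = decode_step(token, hidden)
        loss = loss + loss_fn(logits, target[:, t])
        if teacher_forcing:
            token = target[:, t]               # correct token, as in training
        else:
            token = logits.argmax(dim=-1)      # own guess, as at test time
    return loss / (seq_len - 1)

target = torch.randint(0, vocab_size, (4, 10))  # dummy batch of sequences
print(training_step(target, teacher_forcing=True))
print(training_step(target, teacher_forcing=False))
```

Because the teacher-forced branch never exposes the model to its own mistakes, the inputs it sees during training differ from those it will generate at inference, which is the training/test mismatch discussed above.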