This AI Research from Ohio State University and CMU Discusses Implicit Reasoning in Transformers And Achieving Generalization Through Grokking
What limitations do large language models have in implicit reasoning?

Large language models (LLMs) like GPT-4 struggle with implicit reasoning: they often make inaccurate comparisons and have difficulty inducing structured representations of rules and facts, which limits their ability to generalize knowledge systematically. While transformers can acquire implicit reasoning through a process called grokking, they still struggle to generalize on composition tasks, particularly with out-of-distribution examples.
How do transformers perform in comparison and composition tasks?

Transformers perform well on comparison tasks, generalizing strongly even to out-of-distribution examples. On composition tasks, however, they fail to generalize to such examples, as recent research from Ohio State University and Carnegie Mellon University reveals.
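To make the composition setting concrete, the sketch below builds a toy two-hop dataset of the kind used in such studies: atomic facts map a (head, relation) pair to a tail entity, and composed queries chain two atomic facts. The entity/relation encoding, the function name, and the head-based out-of-distribution split are illustrative assumptions, not the paper's exact setup.

```python
import random


def make_composition_data(num_entities=20, num_relations=5, ood_frac=0.2, seed=0):
    """Build synthetic two-hop composition facts (illustrative sketch).

    Atomic facts map (head, relation) -> tail. A composed query (h, r1, r2)
    has answer atomic[(bridge, r2)], where bridge = atomic[(h, r1)].
    """
    rng = random.Random(seed)
    entities = list(range(num_entities))
    relations = list(range(num_relations))

    # Atomic facts: every (entity, relation) pair gets a random tail entity.
    atomic = {(h, r): rng.choice(entities) for h in entities for r in relations}

    # Composed two-hop facts derived by chaining atomic facts.
    composed = []
    for (h, r1), bridge in atomic.items():
        for r2 in relations:
            composed.append(((h, r1, r2), atomic[(bridge, r2)]))

    # Hold out composed queries for some head entities as "out-of-distribution":
    # their atomic facts are still available, but no composition involving
    # these heads is ever seen in training.
    held_heads = set(rng.sample(entities, max(1, int(ood_frac * num_entities))))
    train = [(q, a) for q, a in composed if q[0] not in held_heads]
    ood_test = [(q, a) for q, a in composed if q[0] in held_heads]
    return atomic, train, ood_test
```

Answering an out-of-distribution query here requires actually chaining the two atomic facts, since the composed query was never seen during training; this is what makes the split a test of compositional generalization rather than memorization.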
What is grokking in the context of deep learning models?

Grokking is a phenomenon in deep learning in which a model's generalization performance improves sharply long after it has fit, and apparently overfit, the training data. With extended training, the model shifts from memorizing individual examples to learning the underlying patterns and structure of the task, yielding better generalization and robustness.
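In practice, grokking shows up in logged metrics as a delayed jump: training accuracy saturates early while validation accuracy stays low, then rises much later. A minimal helper for spotting that signature in accuracy curves is sketched below; the function name and thresholds are illustrative, not taken from the paper.

```python
def grokking_step(train_acc, val_acc, fit_threshold=0.99, gen_threshold=0.9):
    """Return the first step at which validation accuracy reaches
    `gen_threshold` strictly after training accuracy first reached
    `fit_threshold`, or None if no such delayed jump occurs.

    `train_acc` and `val_acc` are per-step accuracy curves of equal length.
    Thresholds are illustrative defaults.
    """
    fit_step = next((i for i, a in enumerate(train_acc) if a >= fit_threshold), None)
    if fit_step is None:
        return None  # the model never fit the training data
    for i in range(fit_step + 1, len(val_acc)):
        if val_acc[i] >= gen_threshold:
            return i  # delayed generalization: the grokking signature
    return None
```

The larger the gap between `fit_step` and the returned step, the more pronounced the delayed-generalization effect the curves exhibit.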