CaLM (Contrasting Large and Small Language Models) improves LLM citation accuracy by leveraging the complementary strengths of large and small language models. It employs a post-verification approach in which a smaller LM validates the output of a larger LM: the small model cross-references the cited documents to confirm that they actually support the larger LM's answer. If the two models' responses align, the large LM's answer is verified; otherwise, an iterative feedback loop refines the response, improving both citation accuracy and answer quality.
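The verification loop described above can be sketched in pseudocode-like Python. This is a minimal illustration, not the actual CaLM implementation: `large_lm`, `small_lm`, and `answers_align` are hypothetical stand-ins for the large model, the small verifier model, and the answer-agreement check.

```python
def answers_align(a: str, b: str) -> bool:
    # Toy alignment test: exact match after normalization.
    # (The real framework uses a more robust comparison.)
    return a.strip().lower() == b.strip().lower()

def calm_verify(question: str, large_lm, small_lm, max_rounds: int = 3) -> str:
    """Post-verification loop: a small LM checks a large LM's cited answer.

    large_lm(question, feedback) -> (answer, cited_docs)
    small_lm(question, cited_docs) -> answer grounded ONLY in cited_docs
    """
    feedback = None
    answer = ""
    for _ in range(max_rounds):
        # 1. Large LM answers the question, citing supporting documents.
        answer, cited_docs = large_lm(question, feedback)
        # 2. Small LM answers using only the cited documents as context,
        #    cross-referencing whether they support the claim.
        check = small_lm(question, cited_docs)
        # 3. If the two answers align, the citation is verified.
        if answers_align(answer, check):
            return answer
        # 4. Otherwise, feed the disagreement back for refinement.
        feedback = f"Verifier answered '{check}', which conflicts with '{answer}'."
    return answer  # best effort after max_rounds of refinement
```

In practice the stand-in callables would wrap real model inference; the key design point is that the small model sees only the cited documents, so agreement between the two models is evidence that the citations genuinely support the answer.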
As a verification framework, CaLM delivers significant gains in producing accurate, verifiable LLM responses. By pairing post-verification with iterative refinement, it improves both answer correctness and citation quality, outperforming state-of-the-art methods by 1.5% to 7% on average in experiments on three open-domain question-answering datasets.
Model size plays a crucial role in LLM performance. Larger models with more parameters tend to perform better on complex tasks because of their greater capacity to capture intricate patterns and relationships in the data. However, the relationship between size and performance is not strictly monotonic: training data quality and quantity, architecture, and optimization techniques also significantly affect results. Larger models additionally demand more computational resources, which constrains inference speed and scalability.