
The primary methods currently used for detecting hallucinations in large language models (LLMs) are:
Probability of the greedy response (T0): This method scores confidence as the likelihood of the single most probable response the model generates. However, it is sensitive to the size of the label set and tends to degrade when many distinct responses are possible.
Semantic-entropy method (S.E.): This method measures the entropy over semantic clusters of sampled responses. It relies on first-order scores that do not consider the joint distribution of responses, which can lead to incomplete uncertainty assessments (a minimal sketch of both first-order scores follows this list).
Self-verification method (S.V.): This method has the model verify its own responses and uses that judgment to estimate uncertainty. However, it does not probe the full range of responses the model can generate, so it may overlook significant components of uncertainty.
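To make the first two baselines concrete, here is a minimal Python sketch of how such first-order scores could be computed. The log-probability input and the `cluster_fn` helper are illustrative assumptions, not an API from the study.

```python
import math
from collections import Counter

def greedy_response_score(greedy_logprob: float) -> float:
    """T0-style score: probability of the single most likely (greedy) response,
    computed from an assumed log-probability returned by the LLM API."""
    return math.exp(greedy_logprob)

def semantic_entropy(responses: list[str], cluster_fn) -> float:
    """S.E.-style score: entropy over semantic clusters of sampled responses.
    `cluster_fn` maps a response to a semantic-cluster id (e.g. via an
    entailment model); it is assumed to exist and is not implemented here."""
    counts = Counter(cluster_fn(r) for r in responses)
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    return -sum(p * math.log(p) for p in probs)
```

Both scores are first-order: they look only at the marginal distribution over single responses, which is exactly the limitation the combined-distribution approach targets.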
These methods share several limitations: sensitivity to the size of the label set, incomplete first-order uncertainty estimates, and blindness to parts of the response distribution. To overcome them, the study proposes constructing a combined (joint) distribution over multiple responses to a given query via iterative prompting, in which previously generated responses are fed back into the prompt. This offers a more nuanced view of the model's confidence, improving hallucination detection and overall response reliability; a rough sketch of the idea follows.
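The sketch below is a hedged illustration of iterative prompting: sample answers, feed the answers already seen back into the prompt, and measure how much the conditional answer distributions drift. The helper `llm_sample`, the prompt format, and the drift-based score are assumptions for illustration; they are a simple proxy, not necessarily the measure used in the study.

```python
from collections import Counter

def iterative_response_distributions(query: str, llm_sample, n: int = 10, rounds: int = 3):
    """Collect answer distributions where each round conditions on answers
    already seen, approximating a joint distribution over multiple responses.
    `llm_sample(prompt, n)` is a hypothetical helper returning n answer strings."""
    history = []
    distributions = []
    for _ in range(rounds):
        prompt = query
        if history:
            prompt += "\nPreviously given answers: " + "; ".join(history)
            prompt += "\nAnswer the original question again."
        samples = llm_sample(prompt, n)
        counts = Counter(samples)
        distributions.append({answer: c / n for answer, c in counts.items()})
        history.append(counts.most_common(1)[0][0])  # carry the modal answer forward
    return distributions

def epistemic_uncertainty_score(distributions) -> float:
    """Crude proxy: average total-variation drift of the conditional distributions
    from the first-round distribution. Large drift suggests the model's belief is
    unstable under its own prior answers, i.e. high epistemic uncertainty."""
    base = distributions[0]
    drift = 0.0
    for dist in distributions[1:]:
        support = set(base) | set(dist)
        drift += 0.5 * sum(abs(base.get(a, 0.0) - dist.get(a, 0.0)) for a in support)
    return drift / max(len(distributions) - 1, 1)
```

If the model's answers barely change when its earlier answers are placed in the context, the combined distribution is stable and the response is more likely grounded; large shifts flag a likely hallucination.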

In the context of large language models (LLMs), epistemic uncertainty and aleatoric uncertainty are the two types of uncertainty that arise during prediction.
Epistemic uncertainty arises from a lack of knowledge or data about the ground truth: the model simply has not seen enough relevant data to make a confident, accurate prediction. This type of uncertainty can, in principle, be reduced by gathering more data or improving the model's knowledge.
Aleatoric uncertainty, by contrast, stems from the inherent randomness of the prediction problem itself: the outcome varies because of noise or random effects in the data. It cannot be reduced by collecting more data or knowledge, because it is intrinsic to the task or dataset.
Understanding and quantifying these uncertainties is crucial for enhancing the reliability and truthfulness of LLM responses, especially for detecting and mitigating hallucinations or inaccurate responses generated by these models.
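One common way to make this distinction operational (an assumption here, not necessarily the formulation used in the study) is to decompose total predictive entropy into an aleatoric term (expected entropy) and an epistemic term (mutual information between the prediction and the model's belief). A minimal NumPy sketch:

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy along the last axis, with a small epsilon for stability."""
    return -np.sum(p * np.log(p + eps), axis=-1)

def decompose_uncertainty(member_probs: np.ndarray):
    """member_probs: (M, K) array of M model beliefs (e.g. ensemble members or
    differently-prompted runs) over K answer options. Returns total, aleatoric,
    and epistemic uncertainty under the standard entropy decomposition."""
    mean_probs = member_probs.mean(axis=0)      # marginal predictive distribution
    total = entropy(mean_probs)                 # total predictive uncertainty
    aleatoric = entropy(member_probs).mean()    # irreducible, data-inherent part
    epistemic = total - aleatoric               # reducible, knowledge-related part
    return total, aleatoric, epistemic

# Example: two beliefs that disagree sharply -> most uncertainty is epistemic.
probs = np.array([[0.9, 0.1], [0.1, 0.9]])
print(decompose_uncertainty(probs))
```

In the example, the two beliefs disagree sharply, so most of the total uncertainty is epistemic; if both beliefs were identical but flat, the uncertainty would be purely aleatoric.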

In summary, the study on uncertainty quantification within large language models distinguishes these two sources: epistemic uncertainty, arising from a lack of knowledge or data about the ground truth, and aleatoric uncertainty, stemming from the inherent randomness of the prediction problem.