Understanding the Limitations of Large Language Models (LLMs): New Benchmarks and Metrics for Classification Tasks
Who conducted the research on LLMs' classification limitations?

The research on LLMs' classification limitations was conducted by a team of researchers, including Tanya Malhotra, a final-year undergraduate student at the University of Petroleum & Energy Studies, Dehradun.
What is the purpose of the OMNIACCURACY metric?

The OMNIACCURACY metric is designed to assess how well Large Language Models (LLMs) perform on classification tasks [1]. It combines a model's results on instances where the correct label is among the candidate options with its results on instances where the correct label is absent, giving a more thorough evaluation of the model's capabilities [1]. A model scores well only if it handles both situations: choosing the right label when one is offered, and recognizing when none is. In this way the metric aims to better approximate human-level discriminative intelligence in classification tasks [1].
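A minimal sketch of how such a combined score could be computed is shown below. It rests on two assumptions that the description above does not specify: a model signals "no correct label is offered" by returning `None`, and the two accuracies are combined with an unweighted mean. The function names are hypothetical.

```python
def accuracy(predictions, targets):
    """Fraction of predictions that exactly match their targets."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        raise ValueError("empty evaluation set")
    return sum(p == t for p, t in zip(predictions, targets)) / len(targets)


def omni_accuracy(preds_present, golds_present, preds_absent):
    """Combine accuracy with and without the correct label among the options.

    Assumptions (not from the source): abstention is represented as None,
    and the two accuracies are averaged with equal weight.
    """
    acc_present = accuracy(preds_present, golds_present)
    # With the gold label removed, the only correct response is to abstain (None).
    acc_absent = accuracy(preds_absent, [None] * len(preds_absent))
    return (acc_present + acc_absent) / 2


score = omni_accuracy(
    ["sports", "politics", "tech", "sports"],  # predictions, gold label offered
    ["sports", "politics", "tech", "tech"],    # gold labels
    ["sports", None, "tech", "politics"],      # predictions, gold label removed
)
print(score)  # 0.5
```

Here the model is right on 3 of 4 items when the correct label is offered (0.75) but abstains correctly on only 1 of 4 items when it is missing (0.25), so the combined score drops to 0.5.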
What are the primary concerns with LLMs' label processing?

The primary concerns with how LLMs process labels are versatility and the gap between generative and discriminative capabilities. LLMs will work with any set of candidate labels, but they may not recognize when the offered labels are inaccurate, or point out when the correct label is absent. In addition, because LLMs are generative models, they often lack strong discriminative capabilities, so evaluations in which a correct label is always available could overestimate their usefulness in classification tasks.
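The concern about absent labels can be illustrated with a toy evaluation harness. This is a hypothetical sketch, not the researchers' benchmark. `remove_gold` builds a label-absent probe by dropping the correct label from the options. The two "models" are oracle stand-ins that always know the gold label. They differ only in whether they abstain (return `None`) when that label is missing.

```python
def remove_gold(options, gold):
    """Build a label-absent probe: the same options with the gold label dropped."""
    return [label for label in options if label != gold]


def closed_set_model(options, gold):
    """Hypothetical oracle that never abstains: it returns the gold label
    when offered and otherwise confidently picks the first option."""
    return gold if gold in options else options[0]


def abstaining_model(options, gold):
    """Hypothetical oracle that returns None when the gold label is missing."""
    return gold if gold in options else None


def evaluate(items, model):
    """Return (accuracy with gold present, accuracy with gold absent).

    On label-absent probes, the only correct response is to abstain (None).
    """
    n = len(items)
    present = sum(model(opts, gold) == gold for opts, gold in items) / n
    absent = sum(model(remove_gold(opts, gold), gold) is None for opts, gold in items) / n
    return present, absent


# Toy items: (candidate labels, gold label).
ITEMS = [
    (["positive", "negative", "neutral"], "positive"),
    (["sports", "politics", "tech"], "tech"),
]

print(evaluate(ITEMS, closed_set_model))  # (1.0, 0.0)
print(evaluate(ITEMS, abstaining_model))  # (1.0, 1.0)
```

Both stand-ins are perfect when the correct label is offered, so a conventional evaluation cannot tell them apart. Only the label-absent probes expose the model that always commits to some label.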