CS-Bench covers four key domains: Data Structure and Algorithm (DSA), Computer Organization (CO), Computer Network (CN), and Operating System (OS). These domains are further divided into 26 fine-grained subfields, and the benchmark includes diverse task forms to enrich assessment dimensions and simulate real-world scenarios.
CS-Bench evaluates LLMs' performance specifically in computer science through a bilingual (Chinese-English) benchmark of approximately 5,000 carefully curated test samples [2]. It covers 26 subfields across 4 key areas of computer science and spans several task forms, with questions divided into knowledge and reasoning [2]. The benchmark includes multiple-choice, assertion, fill-in-the-blank, and open-ended questions to better simulate real-world scenarios and to assess LLMs' robustness to different task formats [3].
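To make the benchmark's dimensions concrete, the sketch below shows one way a scored sample could be recorded and how accuracy might be broken down by domain or task format. It is a minimal illustration, not the CS-Bench data schema or evaluation code: the `Sample` class, its field names, and the `accuracy_by` helper are hypothetical, and treating every answer as simply correct or incorrect is a simplification (open-ended answers would likely need a graded score).

```python
from collections import defaultdict
from dataclasses import dataclass

# Domain and task-format labels as named in the text.
DOMAINS = ("DSA", "CO", "CN", "OS")
FORMATS = ("multiple-choice", "assertion", "fill-in-the-blank", "open-ended")


@dataclass(frozen=True)
class Sample:
    """Hypothetical record for one scored test sample (field names are assumptions)."""
    domain: str       # one of DOMAINS
    task_format: str  # one of FORMATS
    language: str     # "en" or "zh"
    correct: bool     # simplification: answer judged right or wrong


def accuracy_by(samples, key):
    """Return {group: fraction correct}, grouping samples by the attribute `key`."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for s in samples:
        group = getattr(s, key)
        totals[group] += 1
        hits[group] += s.correct  # True counts as 1, False as 0
    return {g: hits[g] / totals[g] for g in totals}


# Toy records for illustration only; these are not CS-Bench data or results.
toy = [
    Sample("DSA", "multiple-choice", "en", True),
    Sample("DSA", "assertion", "zh", False),
    Sample("OS", "multiple-choice", "en", True),
    Sample("CN", "open-ended", "zh", False),
]

by_domain = accuracy_by(toy, "domain")
by_format = accuracy_by(toy, "task_format")
```

Grouping by a single attribute name keeps the same helper usable for domain, task format, or language, which mirrors how the benchmark is described as varying along several independent dimensions.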
The main challenges in applying LLMs to computer science are evaluating their performance comprehensively in this specific field, assessing their foundational knowledge and reasoning abilities, and strengthening related capabilities such as mathematics, coding, and logical reasoning. Integrating these individual capabilities and understanding how they relate to one another pose further challenges.