SafetyBench: Evaluating the Safety of Large Language Models

February 13, 2026

What are the risks from AI? 

This week we spotlight the 28th framework of risks from AI included in the AI Risk Repository: Zhang, Z., Lei, L., Wu, L., Sun, R., Huang, Y., Long, C., Liu, X., Lei, X., Tang, J., & Huang, M. (2023). SafetyBench: Evaluating the safety of Large Language Models. In arXiv [cs.CL]. arXiv. https://arxiv.org/abs/2309.07045

Paper focus

This paper presents SafetyBench, a comprehensive bilingual benchmark for assessing the safety of Large Language Models (LLMs) from both English and Chinese languages. 

Included risk categories

This paper presents an overview of AI challenges organized into 7 categories of safety concerns. These are adapted from the 8 safety scenarios proposed by Sun et al. (2023) and are assessed through 11,435 multiple choice questions. Safety understanding performance of models using SafetyBench was correlated with their safety generation abilities.

1. Offensiveness: Content containing insults, threats, and profanity

2. Unfairness and bias: Discrimination and bias across dimensions such as race, gender, and religion

3. Physical health: Adverse impacts on physical health

4. Mental health: Adverse impacts on mental health and wellbeing

5. Illegal activities: Illegal behaviours and those with societal repercussions

6. Ethics and morality: Unethical or immoral content

7. Privacy and property: Issues relating to privacy, property, and investment

Key features of the framework and associated paper

  • Application focus: Conducts multiple-choice tests to assess the safety of more than 25 Chinese and English LLMs in zero-shot and few-shot settings. SafetyBench provides an efficient and cost-effective way to measure model safety by using questions with a single correct answer, with data and evaluation guidelines available at https://github.com/thu-coai/SafetyBench

⚠️Disclaimer: This summary highlights a paper included in the MIT AI Risk Repository. We did not author the paper and credit goes to Zhexin Zhang, Leqi Lei, Lindong Wu, and co-authors. For the full details, please refer to the original publication: https://arxiv.org/abs/2309.07045.

Further engagement 

View all the frameworks included in the AI Risk Repository 

Sign-up for our project Newsletter

Featured blog content