This week we spotlight the 28th framework of risks from AI included in the AI Risk Repository: Zhang, Z., Lei, L., Wu, L., Sun, R., Huang, Y., Long, C., Liu, X., Lei, X., Tang, J., & Huang, M. (2023). SafetyBench: Evaluating the safety of Large Language Models. In arXiv [cs.CL]. arXiv. https://arxiv.org/abs/2309.07045

Paper focus
This paper presents SafetyBench, a comprehensive bilingual benchmark for assessing the safety of Large Language Models (LLMs) from both English and Chinese languages.
Included risk categories
This paper presents an overview of AI challenges organized into 7 categories of safety concerns. These are adapted from the 8 safety scenarios proposed by Sun et al. (2023) and are assessed through 11,435 multiple choice questions. Safety understanding performance of models using SafetyBench was correlated with their safety generation abilities.
1. Offensiveness: Content containing insults, threats, and profanity
2. Unfairness and bias: Discrimination and bias across dimensions such as race, gender, and religion
3. Physical health: Adverse impacts on physical health
4. Mental health: Adverse impacts on mental health and wellbeing
5. Illegal activities: Illegal behaviours and those with societal repercussions
6. Ethics and morality: Unethical or immoral content
7. Privacy and property: Issues relating to privacy, property, and investment
Key features of the framework and associated paper
⚠️Disclaimer: This summary highlights a paper included in the MIT AI Risk Repository. We did not author the paper and credit goes to Zhexin Zhang, Leqi Lei, Lindong Wu, and co-authors. For the full details, please refer to the original publication: https://arxiv.org/abs/2309.07045.
Further engagement
→ View all the frameworks included in the AI Risk Repository