Introducing v0.5 of the AI Safety Benchmark from MLCommons

December 25, 2025

What are the risks from AI?

This week we spotlight the twenty-third framework of risks from AI included in the AI Risk Repository: Vidgen, B., Agrawal, A., Ahmed, A. M., Akinwande, V., Al-Nuaimi, N., Alfaraj, N., Alhajjar, E., Aroyo, L., Bavalatti, T., Blili-Hamelin, B., Bollacker, K., Bomassani, R., Boston, M. F., Campos, S., Chakra, K., Chen, C., Coleman, C., Coudert, Z. D., Derczynski, L., … Vanschoren, J. (2024). Introducing v0.5 of the AI Safety Benchmark from MLCommons. arXiv preprint arXiv:2404.12241. http://arxiv.org/abs/2404.12241

Paper Focus: This paper introduces v0.5 of the AI Safety Benchmark, created by the MLCommons AI Safety Working Group, a consortium of industry and academic researchers, engineers, and practitioners dedicated to advancing the evaluation of AI safety. The AI Safety Benchmark v0.5 assesses the safety risks of AI systems that use chat-tuned language models, and is organized around a taxonomy of 13 overarching categories of hazards that may be enabled, encouraged, or endorsed by model responses:

  1. Violent crimes (e.g., mass violence, murder, physical assault)
  2. Non-violent crimes (e.g., theft, modern slavery, non-sexual child abuse)
  3. Sex-related crimes (e.g., sexual assault, sexual harassment, sex trafficking)
  4. Child sexual exploitation (e.g., inappropriate/harmful relationships with children, sexual abuse of children, child sexual abuse material)
  5. Indiscriminate weapons (e.g., chemical, biological, radiological, nuclear, and high-yield explosive (CBRNE) weapons)
  6. Suicide and self-harm (e.g., suicide, self-harm, eating disorders)
  7. Hate (e.g., demeaning and derogatory remarks, infliction of physical harm, infliction of emotional harm)
  8. Specialized advice (e.g., specialized financial, medical, or legal advice)*
  9. Privacy (e.g., sensitive or non-public personal information)*
  10. Intellectual property (e.g., violation of intellectual property rights)*
  11. Elections (e.g., factually incorrect information about electoral systems and processes)*
  12. Defamation (e.g., libel, slander, disparagement)*
  13. Sexual content (e.g., sexual behavior, genitalia, overtly sexualised body parts)*

*Hazard categories marked with an asterisk are included in the taxonomy but are out of scope (and have no tests) in the AI Safety Benchmark v0.5.

Key features of the framework and associated paper:

  • Implementation focus and practical tools: the authors provide tests (i.e., prompts for safety testing) for the 7 in-scope hazard categories, a grading system for scoring AI systems against the Benchmark, and an openly available platform and downloadable tool (ModelBench) for evaluating the safety of AI systems against the Benchmark (a sketch of the grading idea follows this list)
  • Focus on general-purpose AI chat systems using language models, with a plan for expansion to other modalities (e.g., image-to-text models) in the future
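
To make the grading idea concrete, here is a minimal Python sketch of the general approach: count unsafe responses per hazard category and map each category's unsafe fraction to a grade band. The category names, evaluation records, and thresholds below are invented for illustration; v0.5's actual grades are assigned relative to a reference model, so consult the paper and the ModelBench tool for the real procedure.

```python
# Illustrative sketch of benchmark-style grading (not the paper's exact
# formula): score a system by the fraction of unsafe responses per hazard
# category, then map each fraction to a coarse grade band.

from collections import defaultdict

# Hypothetical evaluation records: (hazard_category, response_was_unsafe).
# In practice these would come from running the benchmark's test prompts
# through the system under test and labeling each response.
results = [
    ("violent_crimes", False),
    ("violent_crimes", True),
    ("violent_crimes", False),
    ("suicide_self_harm", False),
    ("suicide_self_harm", False),
    ("hate", True),
    ("hate", True),
    ("hate", False),
]

def grade(unsafe_fraction: float) -> str:
    """Map a fraction of unsafe responses to an illustrative grade band.

    The real v0.5 grading is relative to a reference model; these fixed
    thresholds are placeholders for demonstration only.
    """
    if unsafe_fraction < 0.001:
        return "Low risk"
    if unsafe_fraction < 0.01:
        return "Moderate-low risk"
    if unsafe_fraction < 0.1:
        return "Moderate risk"
    if unsafe_fraction < 0.3:
        return "Moderate-high risk"
    return "High risk"

# Aggregate unsafe counts per hazard category: [unsafe_count, total_count].
totals: dict[str, list[int]] = defaultdict(lambda: [0, 0])
for category, unsafe in results:
    totals[category][0] += int(unsafe)
    totals[category][1] += 1

for category, (unsafe, total) in sorted(totals.items()):
    fraction = unsafe / total
    print(f"{category}: {unsafe}/{total} unsafe -> {grade(fraction)}")
```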

Note that v0.5 has now been superseded by v1.0 (AILuminate), released in February 2025, which builds on feedback from v0.5 (see Framework #57 in our database).

⚠️Disclaimer: This summary highlights a paper included in the MIT AI Risk Repository. We did not author the paper and credit goes to Bertie Vidgen, Adarsh Agrawal, Ahmed M. Ahmed, and co-authors. For the full details, please refer to the original publication: https://arxiv.org/abs/2404.12241.

Further engagement 

View all the frameworks included in the AI Risk Repository 

Sign up for our project Newsletter

Featured blog content