Taxonomy of Risks Posed by Language Models

May 12, 2025

What are the risks from AI?

This week we spotlight the sixteenth framework of risks from AI included in the AI Risk Repository: 

Weidinger, L., Uesato, J., Rauh, M., Griffin, C., Huang, P.-S., Mellor, J., Glaese, A., Cheng, M., Balle, B., Kasirzadeh, A., Biles, C., Brown, S., Kenton, Z., Hawkins, W., Stepleton, T., Birhane, A., Hendricks, L. A., Rimell, L., Isaac, W., Haas, J., Legassick, S., Irving, G., & Gabriel, I. (2022). Taxonomy of risks posed by language models. In FAccT '22: 2022 ACM Conference on Fairness, Accountability, and Transparency (pp. 214–229). ACM. https://doi.org/10.1145/3531146.3533088

This framework is a comprehensive taxonomy of ethical and social risks associated with large language models. 

The taxonomy includes 6 domains of AI risk and 20 subdomains: 

  1. Discrimination, Hate speech and Exclusion
    1. Social stereotypes and unfair discrimination
    2. Hate speech and offensive language
    3. Exclusionary norms
    4. Lower performance for some languages and social groups
  2. Information Hazards
    1. Compromising privacy by leaking sensitive information
    2. Compromising privacy or security by correctly inferring sensitive information
  3. Misinformation Harms
    1. Disseminating false or misleading information
    2. Causing material harm by disseminating false or poor information e.g. in medicine or law
  4. Malicious Uses
    1. Making disinformation cheaper and more effective
    2. Assisting code generation for cyber security threats
    3. Facilitating fraud, scams and targeted manipulation
    4. Illegitimate surveillance and censorship
  5. Human-Computer Interaction Harms
    1. Promoting harmful stereotypes by implying gender or ethnic identity
    2. Anthropomorphising systems can lead to overreliance or unsafe use
    3. Avenues for exploiting user trust and accessing more private information
    4. Human-like interaction may amplify opportunities for user nudging, deception or manipulation
  6. Environmental and Socioeconomic harms
    1. Environmental harms from operating LMs
    2. Increasing inequality and negative effects on job quality
    3. Undermining creative economies
    4. Disparate access to benefits due to hardware, software, and skill constraints

Key features of the framework and associated paper:

  • Focuses on risks associated with operating language models; risks of harm upstream of operation (such as those associated with training language models) are not discussed. 
  • Focuses on risks associated with ‘raw’ language models rather than specific applications, such as chatbots for psychotherapy. 
  • Does not cover risks that depend on multiple modalities, such as models that combine language with other domains like vision or robotics. 
  • Risks are identified via two methods: 1) interdisciplinary workshops and discussions amongst Google DeepMind researchers, and 2) a horizon-scanning exercise with an in-depth literature review. 
  • Distinguishes between “observed” and “anticipated” risks: observed risks have already been documented in language models, whereas anticipated risks have not yet been observed but are likely to occur. 

Disclaimer

This summary highlights a paper included in the MIT AI Risk Repository. We did not author the paper, and credit goes to Laura Weidinger, Jonathan Uesato, Maribeth Rauh, Conor Griffin, Po-Sen Huang, John Mellor, Amelia Glaese, Myra Cheng, Borja Balle, Atoosa Kasirzadeh, Courtney Biles, Sasha Brown, Zac Kenton, Will Hawkins, Tom Stepleton, Abeba Birhane, Lisa Anne Hendricks, Laura Rimell, William Isaac, Julia Haas, Sean Legassick, Geoffrey Irving, and Iason Gabriel. For the full details, please refer to the original publication: https://doi.org/10.1145/3531146.3533088

Further engagement 

View all the frameworks included in the AI Risk Repository 

Sign up for our project Newsletter