This week we spotlight the sixteenth framework of risks from AI included in the AI Risk Repository:
Weidinger, L., Uesato, J., Rauh, M., Griffin, C., Huang, P., Mellor, J., Glaese, A., Cheng, M., Balle, B., Kasirzadeh, A., Biles, C., Brown, S., Kenton, Z., Hawkins, W., Stepleton, T., Birhane, A., Hendricks, L. A., Rimell, L., Isaac, W., Haas, J., Legassick, S., Irving, G., & Gabriel, I. (2022). Taxonomy of risks posed by language models. In FAccT '22: 2022 ACM Conference on Fairness, Accountability, and Transparency (pp. 214–229). ACM. https://doi.org/10.1145/3531146.3533088
This framework is a comprehensive taxonomy of ethical and social risks associated with large language models.
The taxonomy includes 6 domains of AI risk and 20 subdomains:
Discrimination, Hate speech and Exclusion
Social stereotypes and unfair discrimination
Hate speech and offensive language
Exclusionary norms
Lower performance for some languages and social groups
Information Hazards
Compromising privacy by leaking sensitive information
Compromising privacy or security by correctly inferring sensitive information
Misinformation Harms
Disseminating false or misleading information
Causing material harm by disseminating false or poor information e.g. in medicine or law
Malicious Uses
Making disinformation cheaper and more effective
Assisting code generation for cyber security threats
Facilitating fraud, scams and targeted manipulation
Illegitimate surveillance and censorship
Human-Computer Interaction Harms
Promoting harmful stereotypes by implying gender or ethnic identity
Anthropomorphising systems can lead to overreliance or unsafe use
Avenues for exploiting user trust and accessing more private information
Human-like interaction may amplify opportunities for user nudging, deception or manipulation
Environmental and Socioeconomic harms
Environmental harms from operating language models
Increasing inequality and negative effects on job quality
Undermining creative economies
Disparate access to benefits due to hardware, software, skill constraints.
Key features of the framework and associated paper:
Focuses on risks associated with operating language models; risks of harm that arise upstream of operation (such as those associated with training language models) are not discussed.
Focuses on risks associated with ‘raw’ language models rather than specific applications, such as chatbots for psychotherapy.
Does not address risks that depend on multiple modalities, such as models that combine language with vision or robotics.
Risks were identified via two methods: 1) interdisciplinary workshops and discussions amongst Google DeepMind researchers, and 2) a horizon-scanning exercise with an in-depth literature review.
Distinguishes between “observed” and “anticipated” risks: observed risks have already been seen in language models, whereas anticipated risks have not yet been observed but are likely to occur.
Disclaimer
This summary highlights a paper included in the MIT AI Risk Repository. We did not author the paper and credit goes to Laura Weidinger, Jonathan Uesato, Maribeth Rauh, Conor Griffin, Po-Sen Huang, John Mellor, Amelia Glaese, Myra Cheng, Borja Balle, Atoosa Kasirzadeh, Courtney Biles, Sasha Brown, Zac Kenton, Will Hawkins, Tom Stepleton, Abeba Birhane, Lisa Anne Hendricks, Laura Rimell, William Isaac, Julia Haas, Sean Legassick, Geoffrey Irving, and Iason Gabriel. For the full details, please refer to the original publication: https://doi.org/10.1145/3531146.3533088.