How is AI harming us?

The AI Incident Tracker project classifies over 1,200 real-world, reported incidents by risk domain, causal factors, and harm caused.

What is the AI Incident Tracker?

AI incidents are on the rise, yet current databases struggle with inconsistent structure, limiting their utility for policymaking. The AI Incident Tracker project addresses this by creating a tool to classify AI incidents based on risks and harm severity. Using a Large Language Model (LLM), the tool processes raw reports from the AI Incident Database and categorizes them using established frameworks, such as the MIT Risk Repository and a harm severity rating system based on CSET’s AI Harm Taxonomy.

This project, led by Simon Mylius, provides a proof-of-concept analysis of reported AI incidents, including preliminary insights into trends in the available data.

You can explore the AI Incident Tracker using interactive visualizations, including data categorized using the MIT AI Risk Repository taxonomies, changes in incidents over time, the severity of harm associated with incidents, and more.

The AI Incident Tracker is part of the MIT AI Risk Initiative, which aims to increase awareness and adoption of best practice AI risk management across the AI ecosystem.

How are AI Incidents changing over time?

This interactive visualization shows how incidents of harm from AI reported in the AI Incident Database are increasing over time, with the greatest increase in incidents associated with the Misinformation and Malicious Actors domains from the MIT AI Risk Repository. In the sections below, you can explore the incidents in more detail, for instance to see the growth in specific incidents related to particular risks or levels of harm.

Explore the AI Incident Tracker Project

You can explore different views of the database and classification in the project. For example, you can see all AI incidents classified using taxonomies from the MIT Risk Repository, the type of harm, and individual records in the AI Incident Database.

Key visualizations include bar charts and pie charts that display incident counts, proportions across domains (e.g., "System Failures," "Discrimination & Toxicity"), and trends in causal attributes. Additionally, insights highlight patterns such as the prevalence of system safety issues, intentional misuse trends, and incomplete reporting gaps.

Click through the links below to explore each of the interactive dashboards.


What can I use the AI Incident Tracker for?

Limitations of the analysis and dataset

This classification database is intended to explore the potential capabilities and limitations of a scalable incident analysis framework. The classification analysis uses reports from the AI Incident Database (AIID) as input data; these reports rely on submissions from the public and subject matter experts, and their quality, reliability and depth of detail vary across the dataset. Because reporting is voluntary, the dataset is inevitably subject to some degree of sampling bias. Spot-checks have been used to provide feedback on misclassifications and to iterate the tool, improving its reliability; however, a systematic validation study has not yet been completed.

Patterns and trends observed in the data should therefore be treated as indicative and validated through further analysis.

Future Work

This work demonstrates proof of concept for a scalable incident classification tool and paves the way for future work to explore its usefulness, validity, limitations and potential, focused on the following activities:

  • User stories - continue collecting user stories to refine the tool and make it as relevant and useful as possible.
  • Validation study - compare a sample of incident classifications with human analysis to understand the validity and reliability of the model outputs.
  • Iterate methodology to improve validity - once a target sample of human classifications is available, update the process and evaluate the outputs to incorporate changes that improve validity.
  • Incorporate root cause analysis - the analysis lends itself to working alongside root cause analysis techniques such as Ishikawa diagrams (identification of potential contributing causes) and Fault Tree Analysis (deductive analysis of how contributing causes interact in conjunctive/disjunctive combinations).
  • Adapt for other incident databases - the process could be applied to new datasets of reports to provide additional learning from a wider sample.
  • Explore further insights from the analysis - what real-world policy decisions could insights from this analysis inform?
  • Link to safety cases and risk assessments - explore how the output of this type of analysis could be used as evidence to update risk assessments or safety cases.
  • Lessons learned for new incident monitoring processes - for example, are there commonalities in the pieces of information missing from incident reports, or in where analyses have low confidence scores?

The code repository will be made available as open-source to encourage users to evaluate and contribute improvements.

Please feel free to share feedback using this form - this will shape the direction of the work and help to make the tool as useful and relevant as possible.
