
Social Media, Technology and Peacebuilding Report No. 253

Report No.253: Tending to the Digital Commons: Examining the Potential of Artificial Intelligence to Detect and Respond to Toxic Speech

Miriam Bethencourt, Grace Connors, and Lisa Schirch

October 20, 2025

Image: Gemini AI, supplied by the authors

This paper explores the potential of Artificial Intelligence (AI), particularly Large Language Models (LLMs), as an emerging tool to address the proliferation of online toxic speech. The research focuses on two key applications of LLMs: hate speech classification and detection, and response generation, specifically the use of LLMs to create counterspeech. While LLMs have enabled significant advances in hate speech detection across supervised, unsupervised, and GenAI-based approaches, the paper notes crucial limitations. These include difficulty processing the nuance and context of online communication, difficulty recognizing implicit hate speech, and the risk that models learn and amplify human biases present in their training data. The paper reviews efforts to develop AI-powered counterspeech tools, including the challenge of generating human-like, constructive responses that adequately engage with the specific hateful content they address. It concludes that LLMs show promise for counterspeech, and closes with a set of recommendations for technology developers and governments to guide the ethical development and deployment of LLMs in addressing online harms.
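The two applications described above can be illustrated with a minimal sketch. The code below is not the authors' system: the model names, candidate labels, and prompt wording are illustrative assumptions, using the Hugging Face `transformers` pipelines for zero-shot classification and text generation.

```python
# Illustrative sketch (not the report's implementation) of the two LLM
# applications discussed: (1) hate speech classification and
# (2) counterspeech generation. Model choices and labels are assumptions.

from transformers import pipeline

# --- 1. Zero-shot hate speech classification ---
# A general-purpose NLI model scores a post against candidate labels
# without task-specific fine-tuning.
classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

post = "People like you don't belong in this country."
labels = ["hate speech", "offensive but not hateful", "neutral"]
result = classifier(post, candidate_labels=labels)
print(dict(zip(result["labels"], [round(s, 3) for s in result["scores"]])))

# --- 2. Prompt-based counterspeech generation ---
# An instruction-tuned model is prompted to draft a non-escalating reply
# that engages the specific content of the post.
generator = pipeline("text-generation",
                     model="HuggingFaceH4/zephyr-7b-beta")

prompt = (
    "The following social media post contains hateful content:\n"
    f'"{post}"\n'
    "Write a brief, respectful counterspeech reply that challenges the "
    "hateful premise without insults or threats.\n\nReply:"
)
reply = generator(prompt, max_new_tokens=120, do_sample=True)
print(reply[0]["generated_text"])
```

As the report notes, a sketch like this inherits the limitations discussed above: the classifier may miss implicit hate speech or contextual nuance, and the generated reply reflects whatever biases are present in the underlying model's training data.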