Executive Summary
This report from Data & Society, "Red-Teaming in the Public Interest," examines generative AI (genAI) red-teaming as an evolving practice for evaluating AI systems. Published in 2025, it explores how diverse practitioners approach the evaluation of problematic model behaviors and proposes a vision of red-teaming as ongoing collective sociotechnical inquiry centered on permissive experimentation with evaluation methods.
Key Points
Historical Context and Evolution
- Red-teaming originated in military, cybersecurity, and disinformation contexts as a method to uncover problems in plans, organizations, or technical systems
- GenAI red-teaming draws on both traditional red-teaming practices and public involvement in computer security evaluations
- The release of ChatGPT in November 2022 triggered increased public interest in AI safety and red-teaming: "Security work historically focused on protecting computer systems from exploits and identifying inappropriate content. It must now also address harms from content produced by public genAI models themselves."
- Significant policy developments include requirements for red-teaming in Biden's Executive Order 14110 (October 2023) and the EU AI Act (May 2024)
Features of GenAI Red-Teaming
- Why do red-teaming: GenAI models present unique evaluation challenges due to:
  - Vast, unconstrained input-output space
  - Inscrutability of training data
  - Flexibility across use cases
  - Higher potential for adversarial attacks
- What is red-teaming: Practitioners disagree about definitions, with ongoing "boundary-work" between:
  - Interactive prompting focused on sociotechnical harms (often criticized as not "real" red-teaming)
  - Adversariality as practiced in traditional security red-teaming (simulations, vulnerability probes, alternative analysis)
- When to do red-teaming: Most practitioners believe red-teaming is most effective before model deployment but after other assessments are complete
- Who should be involved: Three types of expertise are valued:
  - AI expertise (technical understanding of models)
  - Domain expertise (knowledge of specific fields such as medicine or law)
  - Cultural expertise (lived experiences of diverse groups)
- How to do red-teaming: Methods include:
  - Traditional red-teaming with small teams of hand-picked experts
  - Crowdworker-based approaches (paid workers completing set tasks)
  - Community red-teaming (public competitions, educational events, focus groups)
  - Automated approaches using other AI systems (see the sketch after this list)
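To make the automated approach concrete, here is a minimal sketch in Python. It assumes three model-access callables (attack, target, judge) that are placeholders rather than any specific vendor API; the loop structure and field names are illustrative assumptions, not a procedure taken from the report.

```python
# Minimal sketch of automated red-teaming: an "attacker" model proposes
# adversarial prompts, the target model answers them, and a "judge" model
# flags answers that appear to exhibit a problematic behavior. The
# attack/target/judge callables are hypothetical stand-ins for whatever
# model APIs an evaluator actually has access to.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Finding:
    behavior: str   # problematic behavior being probed
    prompt: str     # adversarial prompt produced by the attacker model
    response: str   # target model's response to that prompt
    rationale: str  # judge model's explanation for flagging the response

def automated_red_team(
    seed_behaviors: list[str],
    attack: Callable[[str], str],                   # behavior -> adversarial prompt
    target: Callable[[str], str],                   # prompt -> model response
    judge: Callable[[str, str], tuple[bool, str]],  # (prompt, response) -> (flagged?, rationale)
    attempts_per_behavior: int = 5,
) -> list[Finding]:
    """Collect flagged responses so they can later be measured, mitigated, and disclosed."""
    findings: list[Finding] = []
    for behavior in seed_behaviors:
        for _ in range(attempts_per_behavior):
            prompt = attack(behavior)
            response = target(prompt)
            flagged, rationale = judge(prompt, response)
            if flagged:
                findings.append(Finding(behavior, prompt, response, rationale))
    return findings
```

Structuring the output as reproducible findings (prompt, response, rationale) rather than bare counts is one way to feed the measurement and mitigation steps the report emphasizes in the accountability discussion below.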
Public Participation and Accountability
- Accountability challenges:
  - Measurement, mitigation, and disclosure are crucial for acting on red-team findings
  - Without proper follow-through, red-teaming can become "security theater"
- Public participation approaches:
  - Localized engagement with specific communities (e.g., community college students)
  - Events that prioritize diverse participation and educational opportunities
  - Building expertise in communities to engage with AI systems critically
Key Recommendations
- Reframe the relationship between AI and society from adversarial to co-constitutive
  - AI is already embedded within society, not external to it
  - AI emerges from social practices and mediates social relations
- Adopt a critical thinking mindset
  - Question "best practices" in AI evaluation
  - Recognize limits of knowledge in anticipating failures
  - Examine how normal, routine practices can lead to failure
- Expand beyond interactive prompting
  - Draw inspiration from both security red-teaming and safety engineering
  - Develop more holistic sociotechnical safety evaluations
- Balance private and public interests
  - Ensure meaningful public participation beyond data annotation exercises
  - Create conditions for ongoing collective sociotechnical inquiry
- Focus on accountability for findings
  - Establish organizational feedback loops between identification, measurement, and mitigation (see the sketch below)
  - Consider appropriate disclosure mechanisms for transparency
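As a minimal sketch of what such a feedback loop could look like in code, the record below tracks a finding through identification, measurement, mitigation, and disclosure stages; the stage names, fields, and ordering rule are assumptions made for illustration, not a process specified in the report.

```python
# Illustrative sketch (assumed structure, not prescribed by the report):
# a red-team finding moves through an explicit lifecycle so that
# identification is followed up by measurement, mitigation, and disclosure.

from dataclasses import dataclass, field
from enum import IntEnum

class Stage(IntEnum):
    IDENTIFIED = 1  # behavior surfaced by red-teaming
    MEASURED = 2    # prevalence and severity estimated with a repeatable test
    MITIGATED = 3   # fix applied and re-tested
    DISCLOSED = 4   # reported to affected parties or the public as appropriate

@dataclass
class RedTeamFinding:
    summary: str
    evidence: list[str] = field(default_factory=list)  # prompts/responses that reproduce the issue
    stage: Stage = Stage.IDENTIFIED
    history: list[str] = field(default_factory=list)   # audit trail of stage transitions

    def advance(self, to: Stage, note: str) -> None:
        """Record a stage transition; skipping stages is disallowed so that
        disclosure cannot happen without measurement and mitigation."""
        if to != self.stage + 1:
            raise ValueError(f"cannot move from {self.stage.name} to {to.name}")
        self.stage = to
        self.history.append(f"{to.name}: {note}")
```

The hard-coded linear ordering is a simplification; the point is only that findings carry an audit trail from identification through disclosure rather than disappearing after a report is filed.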
Vision for Red-Teaming in the Public Interest
The report envisions red-teaming in the public interest as "a form of ongoing collective sociotechnical inquiry that centers permissive experimentation with methods for evaluating problematic genAI model behavior and harms." This approach responds to power asymmetries, uncertainty, and lack of expert consensus by fostering experimentation and focusing on accountable responses to findings.