Executive Summary
This report from Data & Society, "Red-Teaming in the Public Interest," examines generative AI (genAI) red-teaming as an evolving practice for evaluating AI systems. Published in 2025, it explores how diverse practitioners approach the evaluation of problematic model behaviors and proposes a vision of red-teaming as ongoing collective sociotechnical inquiry centered on permissive experimentation with evaluation methods.
Key Points
Historical Context and Evolution
- Red-teaming originated in military, cybersecurity, and disinformation contexts as a method to uncover problems in plans, organizations, or technical systems
- GenAI red-teaming draws on both traditional red-teaming practices and public involvement in computer security evaluations
- The release of ChatGPT in November 2022 triggered increased public interest in AI safety and red-teaming: "Security work historically focused on protecting computer systems from exploits and identifying inappropriate content. It must now also address harms from content produced by public genAI models themselves."
- Significant policy developments include requirements for red-teaming in Biden's Executive Order 14110 (October 2023) and the EU AI Act (May 2024)
Features of GenAI Red-Teaming
- Why do red-teaming: GenAI models present unique evaluation challenges due to:
  - Vast, unconstrained input-output space
  - Inscrutability of training data
  - Flexibility across use cases
  - Higher potential for adversarial attacks
- What is red-teaming: Practitioners disagree about definitions, with ongoing "boundary-work" between:
  - Interactive prompting focused on sociotechnical harms (often criticized as not "real" red-teaming)
  - Adversariality as practiced in traditional security red-teaming (simulations, vulnerability probes, alternative analysis)
- When to do red-teaming: Most practitioners believe red-teaming is most effective before model deployment but after other assessments are complete
- Who should be involved: Three types of expertise are valued:
  - AI expertise (technical understanding of models)
  - Domain expertise (knowledge of specific fields such as medicine or law)
  - Cultural expertise (lived experiences of diverse groups)
- How to do red-teaming: Methods include:
  - Traditional red-teaming with small teams of hand-picked experts
  - Crowdworker-based approaches (paid workers completing set tasks)
  - Community red-teaming (public competitions, educational events, focus groups)
  - Automated approaches using other AI systems (see the sketch after this list)
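To make the automated approach concrete, here is a minimal sketch in Python. It assumes three model-access callables (attack, target, judge) that are placeholders rather than any specific vendor API; the loop structure and field names are illustrative assumptions, not a procedure taken from the report.

```python
# Minimal sketch of automated red-teaming: an "attacker" model proposes
# adversarial prompts, the target model answers them, and a "judge" model
# flags answers that appear to exhibit a problematic behavior. The
# attack/target/judge callables are hypothetical stand-ins for whatever
# model APIs an evaluator actually has access to.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Finding:
    behavior: str   # problematic behavior being probed
    prompt: str     # adversarial prompt produced by the attacker model
    response: str   # target model's response to that prompt
    rationale: str  # judge model's explanation for flagging the response

def automated_red_team(
    seed_behaviors: list[str],
    attack: Callable[[str], str],                   # behavior -> adversarial prompt
    target: Callable[[str], str],                   # prompt -> model response
    judge: Callable[[str, str], tuple[bool, str]],  # (prompt, response) -> (flagged?, rationale)
    attempts_per_behavior: int = 5,
) -> list[Finding]:
    """Collect flagged responses so they can later be measured, mitigated, and disclosed."""
    findings: list[Finding] = []
    for behavior in seed_behaviors:
        for _ in range(attempts_per_behavior):
            prompt = attack(behavior)
            response = target(prompt)
            flagged, rationale = judge(prompt, response)
            if flagged:
                findings.append(Finding(behavior, prompt, response, rationale))
    return findings
```

Structuring the output as reproducible findings (prompt, response, rationale) rather than bare counts is one way to feed the measurement and mitigation steps the report emphasizes in the accountability discussion below.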
Public Participation and Accountability
- Accountability challenges:
  - Measurement, mitigation, and disclosure are crucial for acting on red-team findings
  - Without proper follow-through, red-teaming can become "security theater"
- Public participation approaches:
  - Localized engagement with specific communities (e.g., community college students)
  - Events that prioritize diverse participation and educational opportunities
  - Building expertise in communities to engage with AI systems critically
Key Recommendations
- Reframe the relationship between AI and society from adversarial to co-constitutive
  - AI is already embedded within society, not external to it
  - AI emerges from social practices and mediates social relations
- Adopt a critical thinking mindset
  - Question "best practices" in AI evaluation
  - Recognize limits of knowledge in anticipating failures
  - Examine how normal, routine practices can lead to failure
- Expand beyond interactive prompting
  - Draw inspiration from both security red-teaming and safety engineering
  - Develop more holistic sociotechnical safety evaluations
- Balance private and public interests
  - Ensure meaningful public participation beyond data annotation exercises
  - Create conditions for ongoing collective sociotechnical inquiry
- Focus on accountability for findings
  - Establish organizational feedback loops between identification, measurement, and mitigation (see the sketch below)
  - Consider appropriate disclosure mechanisms for transparency
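As a minimal sketch of what such a feedback loop could look like in code, the record below tracks a finding through identification, measurement, mitigation, and disclosure stages; the stage names, fields, and ordering rule are assumptions made for illustration, not a process specified in the report.

```python
# Illustrative sketch (assumed structure, not prescribed by the report):
# a red-team finding moves through an explicit lifecycle so that
# identification is followed up by measurement, mitigation, and disclosure.

from dataclasses import dataclass, field
from enum import IntEnum

class Stage(IntEnum):
    IDENTIFIED = 1  # behavior surfaced by red-teaming
    MEASURED = 2    # prevalence and severity estimated with a repeatable test
    MITIGATED = 3   # fix applied and re-tested
    DISCLOSED = 4   # reported to affected parties or the public as appropriate

@dataclass
class RedTeamFinding:
    summary: str
    evidence: list[str] = field(default_factory=list)  # prompts/responses that reproduce the issue
    stage: Stage = Stage.IDENTIFIED
    history: list[str] = field(default_factory=list)   # audit trail of stage transitions

    def advance(self, to: Stage, note: str) -> None:
        """Record a stage transition; skipping stages is disallowed so that
        disclosure cannot happen without measurement and mitigation."""
        if to != self.stage + 1:
            raise ValueError(f"cannot move from {self.stage.name} to {to.name}")
        self.stage = to
        self.history.append(f"{to.name}: {note}")
```

The hard-coded linear ordering is a simplification; the point is only that findings carry an audit trail from identification through disclosure rather than disappearing after a report is filed.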
Vision for Red-Teaming in the Public Interest
The report envisions red-teaming in the public interest as "a form of ongoing collective sociotechnical inquiry that centers permissive experimentation with methods for evaluating problematic genAI model behavior and harms." This approach responds to power asymmetries, uncertainty, and lack of expert consensus by fostering experimentation and focusing on accountable responses to findings.