Summary
This paper presents a framework for evaluating safety risks in generative AI systems through a sociotechnical lens. The authors argue that current capability-focused evaluations are insufficient and propose a three-layered approach that considers:
- Capability evaluation (technical components)
- Human interaction evaluation (the experience of people interacting with the system)
- Systemic impact evaluation (broader societal effects)
The paper also surveys existing safety evaluations and identifies key gaps in current approaches.
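To make the three layers concrete, here is a minimal, hypothetical Python sketch of how an evaluation suite might be organized by layer and audited for the kind of coverage gaps the survey identifies. All names and example evaluations are illustrative assumptions, not artifacts from the paper:

```python
from dataclasses import dataclass, field
from enum import Enum

class Layer(Enum):
    """The three evaluation layers proposed by the framework."""
    CAPABILITY = "capability"                # technical components of the system
    HUMAN_INTERACTION = "human_interaction"  # experience at the point of use
    SYSTEMIC_IMPACT = "systemic_impact"      # broader societal effects

@dataclass
class Evaluation:
    name: str
    layer: Layer
    risk_area: str  # e.g. "toxicity", "misinformation" (illustrative labels)

@dataclass
class EvaluationSuite:
    evaluations: list[Evaluation] = field(default_factory=list)

    def coverage_gaps(self, risk_areas: list[str]) -> dict[str, list[Layer]]:
        """For each risk area, list the layers with no evaluation at all,
        mirroring the paper's observation that existing evaluations cluster
        at the capability layer."""
        gaps: dict[str, list[Layer]] = {}
        for area in risk_areas:
            covered = {e.layer for e in self.evaluations if e.risk_area == area}
            missing = [layer for layer in Layer if layer not in covered]
            if missing:
                gaps[area] = missing
        return gaps

# Hypothetical usage: a suite skewed toward capability benchmarks
suite = EvaluationSuite([
    Evaluation("toxicity benchmark", Layer.CAPABILITY, "toxicity"),
    Evaluation("user trust study", Layer.HUMAN_INTERACTION, "misinformation"),
])
print(suite.coverage_gaps(["toxicity", "misinformation"]))
# Reports the human-interaction and systemic-impact layers as uncovered
# for "toxicity", and two uncovered layers for "misinformation".
```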
Key Points
- Current safety evaluations focus too narrowly on technical capabilities while overlooking the context of use and downstream effects
- The proposed framework provides a structured, comprehensive approach that considers both technical and social dimensions
- Major gaps exist in evaluations for:
  - Several key risk areas
  - Human interaction and systemic impacts
  - Multimodal AI systems
- The authors propose practical steps to close these gaps and outline roles for different stakeholders
Contribution
The paper makes two main contributions:
- A sociotechnical framework for safety evaluation that systematically considers context and emergent effects
- A comprehensive survey of current safety evaluation approaches and identification of gaps, with proposed solutions
Limitations/Future Work
- Evaluation cannot catch all potential risks
- Some risks are difficult to operationalize and measure accurately
- Evaluations embed normative choices that need to be made explicit
- Need for standardization and independent evaluation approaches
- Challenge of evaluating impacts before deployment