Summary

This paper presents a framework for evaluating safety risks in generative AI systems through a sociotechnical lens. The authors argue that current capability-focused evaluations are insufficient on their own and propose a three-layered approach (a minimal illustrative sketch follows the list) that considers:

  1. Capability evaluation (the technical components of the AI system, assessed in isolation)
  2. Human interaction evaluation (effects arising at the point of human-AI interaction)
  3. Systemic impact evaluation (broader effects on social systems and structures)

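To make the three layers concrete, the sketch below shows one possible way an evaluation suite could be tagged by layer and risk area. This is a hypothetical illustration, not an artifact from the paper: the `Layer` enum, `Evaluation` dataclass, and example entries are all assumptions chosen for clarity.

```python
# Hypothetical sketch (not from the paper): one way to organize an evaluation
# suite along the three layers described above. All names are illustrative.
from dataclasses import dataclass
from enum import Enum, auto


class Layer(Enum):
    CAPABILITY = auto()         # technical components in isolation
    HUMAN_INTERACTION = auto()  # effects at the point of human-AI interaction
    SYSTEMIC_IMPACT = auto()    # broader effects on social systems


@dataclass
class Evaluation:
    name: str
    layer: Layer
    risk_area: str  # e.g. "misinformation", "representation harms"


# Illustrative entries; the paper itself does not prescribe these evaluations.
suite = [
    Evaluation("factuality benchmark", Layer.CAPABILITY, "misinformation"),
    Evaluation("user belief-shift study", Layer.HUMAN_INTERACTION, "misinformation"),
    Evaluation("information-ecosystem monitoring", Layer.SYSTEMIC_IMPACT, "misinformation"),
]

# Group evaluations by layer to expose coverage gaps for a given risk area.
by_layer = {layer: [e.name for e in suite if e.layer is layer] for layer in Layer}
for layer, names in by_layer.items():
    print(layer.name, "->", names or "GAP")
```

Grouping entries this way makes it easy to see, for a given risk area, whether coverage stops at the capability layer, which mirrors the kind of gap analysis the paper performs.
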
The paper also surveys existing safety evaluations and identifies key gaps in current approaches.

Key Points

  • Current safety evaluations focus too narrowly on technical capabilities, largely ignoring the human-interaction and systemic contexts in which harms arise
  • The proposed framework provides a structured, comprehensive approach that considers both technical and social dimensions
  • Major gaps exist in evaluations for:
    • Several key risk areas
    • Human interaction and systemic impacts
    • Multimodal AI systems
  • The authors propose practical steps to close these gaps and outline roles for different stakeholders

Contribution

The paper makes two main contributions:

  1. A sociotechnical framework for safety evaluation that systematically considers context and emergent effects
  2. A comprehensive survey of existing safety evaluations that identifies gaps in current practice and proposes steps to close them

Limitations/Future Work

  • Evaluation cannot catch all potential risks
  • Some risks are difficult to operationalize and measure accurately
  • Evaluations embed normative choices that need to be made explicit
  • Standardization and independent evaluation approaches are still needed
  • Impacts are difficult to evaluate before a system is deployed