📖 tl;dr Responsible AI
Tag: side-by-side-evaluation
1 item with this tag.
Apr 13, 2025
LLM Comparator
llm-evaluation
visualization
tool
side-by-side-evaluation
responsible-ai