📖 tl;dr Responsible AI
Tag: LLM
4 items with this tag.
Walk the Talk? Measuring the Faithfulness of Large Language Model Explanations (Apr 13, 2025)
Tags: RAI, paper, LLM, explainability, faithfulness, evaluation

Building Safe GenAI Applications - An End-to-End Overview of Red Teaming for Large Language Models (Apr 13, 2025)
Tags: RAI, paper, red-teaming, LLM, safety, evaluation

Safety Alignment Should Be Made More Than Just a Few Tokens Deep (Apr 13, 2025)
Tags: RAI, paper, alignment, safety, LLM, jailbreak

Agent-SafetyBench - Evaluating the Safety of LLM Agents (Apr 13, 2025)
Tags: RAI, paper, agent-safety, benchmark, LLM