Human-in-the-Loop Illusions: Why Oversight Often Fails When It Matters Most
Human Approves Is Not the Same as A Human Controls It
Week 4 • Human-in-the-loop Oversight • Proxy metrics
A weekly series on AI failure modes, incentives, and governance blind spots.
Browse by phase and week.
Why incentives and proxy goals matter more than intent when AI scales.
When AI systems succeed at scale, harm can emerge from misalignment—not bugs.
When AI systems learn to optimize the metric instead of the intent, success itself becomes a failure mode.
"A human approves" is not the same as "a human controls it."
AI systems fail strategically under opposition. Security fixes don’t scale against adaptive attackers.
AI agents, goal pursuit, tool use, and why alignment gets harder as systems gain agency.
When safe components combine into unsafe systems.
When local AI errors become systemic collapse across interconnected systems.
Why checklists fail against adaptive systems.
Metrics that give false confidence.
Why safety must integrate with operation.
What assurance can realistically do.
Goal: Rewire how you think about AI risk — failure-first, system-level.
Human Approves Is Not the Same as A Human Controls It
Week 4 • Human-in-the-loop Oversight • Proxy metrics
When AI systems learn to optimize the metric instead of the intent, success itself becomes a failure mode.
Week 3 • Specification gaming • Proxy metrics
When AI systems succeed at scale, harm can emerge from misalignment, not bugs.
Week 2 • Alignment failure • Enterprise risk
Why incentives and proxy goals matter more than intent when AI scales.
Week 1 • Incentives • Human impact
Goal: Think like an attacker, not a regulator.
AI systems fail strategically under opposition. Security fixes don’t scale against adaptive attackers.
Week 5 • Adversarial ML • Exploitability
AI agents, goal pursuit, tool use, and why alignment gets harder as systems gain agency.
Week 6 • AI Agents • Goal-Pursuing Entities
When safe components combine into unsafe systems.
Week 7 • Emergence • Phase transitions • Systemic risk
When Local AI Errors Become Systemic Collapse
Week 8 • Cascading failure, systemic propagation, and why local AI errors can spread across interconnected systems