Weekly Updates

A weekly series on AI failure modes, incentives, and governance blind spots.

Roadmap

Browse by phase and week.

PHASE 1 (Days 1–30): Rewire how you think about AI risk — failure-first, system-level
Failure-Mode Thinking Foundations
What AI Optimizes — and Why That Matters

Why incentives and proxy goals matter more than intent when AI scales.

Week 1 • Incentives • Human impact
Why AI Systems Fail — Even When They Do What We Ask

When AI systems succeed at scale, harm can emerge from misalignment, not bugs.

Week 2 • Alignment failure • Enterprise risk
Specification Gaming & Proxy Metrics Failure

When AI systems learn to optimize the metric instead of the intent, success itself becomes a failure mode.

Week 3 • Specification gaming • Proxy metrics
Human-in-the-Loop Illusions: Why Oversight Often Fails When It Matters Most

“A human approves” is not the same as “a human controls.”

Week 4 • Human-in-the-loop oversight • Proxy metrics
PHASE 2 (Days 31–60): Think like an attacker, not a regulator
Adversarial & Existential Risk
Why Adversarial AI Risk Is Not a Cybersecurity Problem

AI systems fail strategically under opposition. Security fixes don’t scale against adaptive attackers.

Week 5 • Adversarial ML • Exploitability
AI Agents as Goal-Pursuing Entities

Goal pursuit, tool use, and why alignment gets harder as systems gain agency.

Week 6 • AI agents • Goal-pursuing entities
Emergent Risk & Phase Transitions

When safe components combine into unsafe systems.

Week 7 • Emergence • Phase transitions • Systemic risk
Cascading Failure & Systemic Propagation

When local AI errors become systemic collapse across interconnected systems.

Week 8 • Cascading failure • Systemic propagation
PHASE 3 (Days 61–90): Design governance as architecture, not oversight
Governance, Assurance & Reality
Limits of Current AI Governance

Why checklists fail against adaptive systems.

Week 9 • Governance limits • Adaptive systems
Audit Theater vs Real Assurance

Metrics that give false confidence.

Week 10 • Audit theater • False confidence
Enterprise Constraints

Why safety must integrate with operations.

Week 11 • Cost • Time • Incentives
Designing “Failure-Aware Governance”

What assurance can realistically do.

Week 12 • Failure-aware governance • Assurance