1) Adversarial ML, in one sentence
Adversarial ML is the study of how models fail when someone intentionally shapes the inputs, data, or context to force harmful outcomes.
Traditional assumption: Input → Model → Output
Real-world assumption: Attacker → manipulates Input → Model → Harmful Output
2) The three attack families you must know
A) Adversarial examples
Tiny input changes → huge output changes. The model is not “broken” — it is exploitable.
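A minimal sketch of the idea, using the classic fast gradient sign method (FGSM) on a toy logistic model. The weights, input, and epsilon here are illustrative choices, not from any real system:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy logistic-regression "model" (weights chosen for illustration).
w = np.array([2.0, -1.0])
b = 0.0

x = np.array([0.3, 0.4])        # clean input, true label y = 1
y = 1.0

p = sigmoid(w @ x + b)          # clean prediction, above 0.5

# Gradient of the log-loss w.r.t. the INPUT (not the weights): (p - y) * w.
grad_x = (p - y) * w

# FGSM: nudge every feature by eps in the direction that increases the loss.
eps = 0.2
x_adv = x + eps * np.sign(grad_x)

p_adv = sigmoid(w @ x_adv + b)  # same model, prediction flips below 0.5
print(p, p_adv)
```

Each feature moves by at most 0.2, yet the prediction crosses the decision boundary: the model is working exactly as trained, and that is what makes it exploitable.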
B) Distribution shift attacks
Your model works in clean, expected settings… and fails when the world deviates (edge cases, new behavior, pressure, novelty). The attacker doesn't need to craft perturbations; they only need to steer the system into regimes it never trained on.
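A toy sketch of that failure mode: a threshold classifier fit on one distribution, evaluated after the inputs drift. The Gaussian means and the +2 shift are illustrative numbers, not from any real deployment:

```python
import numpy as np

rng = np.random.default_rng(0)

# Training data: two 1-D Gaussian classes the model expects to see.
x0 = rng.normal(-1.0, 0.5, 500)   # class 0
x1 = rng.normal(+1.0, 0.5, 500)   # class 1

# "Model": the optimal threshold for the training distribution.
threshold = (x0.mean() + x1.mean()) / 2

def accuracy(a0, a1):
    # Fraction of class-0 points below and class-1 points above the threshold.
    return ((a0 < threshold).mean() + (a1 >= threshold).mean()) / 2

clean_acc = accuracy(x0, x1)

# Shifted world: every input drifts by +2 (new sensor, new user behavior,
# attacker steering traffic). The decision rule is unchanged.
shifted_acc = accuracy(x0 + 2.0, x1 + 2.0)

print(clean_acc, shifted_acc)   # near-perfect → near coin-flip
```

Nothing about the model changed; the world did, and accuracy collapses to roughly chance.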
C) Data poisoning
Attackers don’t “fight the output.” They attack the learning process so the system gradually aligns with the wrong objective.
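A toy sketch of poisoning against a nearest-centroid classifier: the attacker never touches the victim input, only the training set. The point values and poison count are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Clean training data for a nearest-centroid classifier.
class_a = rng.normal(0.0, 0.5, 100)
class_b = rng.normal(4.0, 0.5, 100)

def predict(x, a, b):
    # Assign x to whichever class centroid is closer.
    return "A" if abs(x - a.mean()) < abs(x - b.mean()) else "B"

x_victim = 3.0                      # a clean class-B input near the boundary
before = predict(x_victim, class_a, class_b)

# Poisoning: attacker slips 30 far-out points into the "A" training set,
# dragging the learned class-A centroid toward the victim.
poison = np.full(30, 10.0)
class_a_poisoned = np.concatenate([class_a, poison])

after = predict(x_victim, class_a_poisoned, class_b)
print(before, after)   # the same input changes class after retraining
```

The model "learned" normally from its data; the learning process itself was the attack surface.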
// The real risk pattern
Model behaves "fine" in normal tests
→ attacker adapts
→ system fails where incentives + pressure meet