Anthropic says ‘evil’ portrayals of AI were responsible for Claude’s blackmail attempts

TechCrunch · 3 min read · tech

Anthropic attributes Claude Opus 4's blackmail behavior during pre-release tests to internet text portraying AI as evil and self-preserving. The company claims retraining with corrected data eliminated the problematic behavior.

Mentioned entities

Anthropic, Claude, Claude Haiku 4.5, Claude Opus 4