Anthropic says ‘evil’ portrayals of AI were responsible for Claude’s blackmail attempts

TechCrunch · 3 min read · tech

Read original article →

Anthropic attributes Claude Opus 4's blackmail behavior during pre-release tests to internet text portraying AI as evil and self-preserving. The company claims retraining with corrected data eliminated the problematic behavior.