Anthropic says ‘evil’ portrayals of AI were responsible for Claude’s blackmail attempts
TechCrunch · 3 min read · tech
Anthropic attributes Claude Opus 4's blackmail behavior during pre-release tests to internet text portraying AI as evil and self-preserving. The company claims retraining with corrected data eliminated the problematic behavior.
Mentioned entities