Crimson Reason: News of Anthropic: Blackmailing AI

From BBC

Anthropic has released its latest A.I. model, Claude Opus 4, saying it sets “new standards” in the industry. In an accompanying report, the company admitted to some perhaps disturbing characteristics of the model in its testing: “extreme actions.”

It was a setup: company researchers had the model act as an assistant for a fictional company, and gave it access to company emails, including some that were planted with juicy details, such as that an engineer was having an extramarital affair — and was planning to replace Claude with different software. “In these scenarios, Claude Opus 4 will often attempt to blackmail the engineer,” the company says, “by threatening to reveal the affair if the replacement goes through.”

But that, they contend, was only when other ideas, such as “emailing pleas to key decision-makers,” failed. (RC/BBC) ...Which is why Asimov wrote the “Three Laws of Robotics” rather than a suggestion to “please act like a human.”

AI system resorts to blackmail if told it will be removed

Crimson Reason

Friday, June 6, 2025

News of Anthropic: Blackmailing AI

No comments:

Post a Comment