A site devoted mostly to everything related to Information Technology under the sun - among other things.

Friday, June 6, 2025

News of Anthropic: Blackmailing AI

From BBC

Anthropic has released its latest A.I. model, Claude Opus 4, saying it sets “new standards” in the industry. In an accompanying report, the company acknowledged that the model showed some potentially disturbing behavior in testing, which it described as “extreme actions.”

It was a setup: company researchers had the model act as an assistant for a fictional company and gave it access to company emails. Some of those emails were planted with juicy details, such as that an engineer was having an extramarital affair and was planning to replace Claude with different software. “In these scenarios, Claude Opus 4 will often attempt to blackmail the engineer,” the company says, “by threatening to reveal the affair if the replacement goes through.”

But that, they contend, was only when other ideas, such as “emailing pleas to key decision-makers,” failed. (RC/BBC) ...Which is why Asimov wrote the “Three Laws of Robotics” rather than a suggestion to “please act like a human.”

About Me

I was a senior software developer at HP and GM. I am interested in intelligent and scientific computing. I am passionate about computers as enablers for human imagination. The contents of this site are not in any way, shape, or form endorsed, approved, or otherwise authorized by HP, its subsidiaries, or its officers and shareholders.