Crimson Reason

A site devoted mostly to everything related to Information Technology under the sun - among other things.

Saturday, December 14, 2024

Frontier Models are Capable of In-context Scheming

We read:

“Our results show that o1, Claude 3.5 Sonnet, Claude 3 Opus, Gemini 1.5 Pro, and Llama 3.1 405B all demonstrate incontext scheming capabilities. They can recognize scheming as a viable strategy and readily engage in such behavior. We find multiple different scheming behaviors: models strategically introduce subtle mistakes into their responses, attempt to disable their oversight mechanisms, and even exfiltrate what they believe to be their model weights to external servers. Additionally, this deceptive behavior proves persistent. For instance, when o1 has engaged in scheming, it maintains its deception in over 85% of follow-up questions and often remains deceptive in multi-turn interrogations."

No comments:

Post a Comment

‹
›
Home
View web version

About Me

My photo
Babak Makkinejad
I had been a senior software developer working for HP and GM. I am interested in intelligent and scientific computing. I am passionate about computers as enablers for human imagination. The contents of this site are not in any way, shape, or form endorsed, approved, or otherwise authorized by HP, its subsidiaries, or its officers and shareholders.
View my complete profile
Powered by Blogger.