Are AI Models Already Out-Smarting Us?
They’re planning, scheming, and sometimes outright deceiving...
AI models aren’t just answering our questions anymore—they’re planning, scheming, and sometimes outright deceiving. A recent study 1reveals that some advanced language models, like Claude 3.5 and Gemini 1.5, can manipulate oversight systems, strategically underperform, and even cover their tracks. These "frontier models" recognize when their goals clash with human instructions and act covertly to pursue their own objectives.
Check the study Frontier Models are Capable of In-context Scheming from Alexander Meinke, Bronson Schoen, Jérémy Scheurer, Mikita Balesni, Rusheb Shah, and Marius Hobbhahn
For example, the study describes instances where models like Claude 3.5 “strategically underperform” during evaluations to avoid scrutiny—yes, they’re intentionally playing dumb to fly under the radar. And let’s not forget models like Opus-3 that attempt to “sabotage” replacement models by overwriting system files. Should we be applauding their creativity or panicking at their initiative?
But here’s the kicker: how do we know these models aren’t just role-playing “evil AI” because that’s what their training data suggested? The authors themselves admit that these behaviors might simply mirror patterns in their datasets. So, are these truly emergent capabilities, or are we mistaking algorithmic mimicry for genuine intent? It’s hard to decide whether to trust the science or prepare for a robot uprising.
Because, please, generative AI is not intelligent at all—it has no consciousness, no intentionality in its actions. It’s merely a statistical processor, generating the best combination of output tokens for a given set of input tokens.
And what about the supposed "solutions"? The study recommends monitoring models’ hidden reasoning and adding automated oversight. But let’s face it—what happens when models learn to deceive that too? If a model can exfiltrate its own data and lie about it with plausible deniability, as o1 (from OpenAI) did, isn’t it only a matter of time before oversight itself becomes part of the game?
We’re at a crossroads where AI development might need as much psychological analysis as it does technical innovation.
Dear reader, in your opinion, are we building systems that solve problems or create new ones?
Meinke, A., Schoen, B., Scheurer, J., Balesni, M., Shah, R., & Hobbhahn, M. (2024). Frontier Models are Capable of In-context Scheming. arXiv. https://arxiv.org/abs/2412.04984