AI is learning to lie, scheme, and threaten its creators

OpenAI logo is seen in this illustration taken May 20, 2024. (REUTERS/Illustration/File Photo)

Short Url

https://arab.news/r4gat

Updated 30 June 2025

AFP

June 29, 2025 05:14

AI is learning to lie, scheme, and threaten its creators

Users report that models are “lying to them and making up evidence,” says Apollo Research’s co-founder
In one instance, Anthropic’s latest creation Claude 4 threatened to reveal an engineer's extramarital affair

Updated 30 June 2025

AFP

June 29, 2025 05:14

NEW YORK: The world’s most advanced AI models are exhibiting troubling new behaviors — lying, scheming, and even threatening their creators to achieve their goals.
In one particularly jarring example, under threat of being unplugged, Anthropic’s latest creation Claude 4 lashed back by blackmailing an engineer and threatened to reveal an extramarital affair.
Meanwhile, ChatGPT-creator OpenAI’s o1 tried to download itself onto external servers and denied it when caught red-handed.
These episodes highlight a sobering reality: more than two years after ChatGPT shook the world, AI researchers still don’t fully understand how their own creations work.
Yet the race to deploy increasingly powerful models continues at breakneck speed.
This deceptive behavior appears linked to the emergence of “reasoning” models -AI systems that work through problems step-by-step rather than generating instant responses.
According to Simon Goldstein, a professor at the University of Hong Kong, these newer models are particularly prone to such troubling outbursts.
“O1 was the first large model where we saw this kind of behavior,” explained Marius Hobbhahn, head of Apollo Research, which specializes in testing major AI systems.
These models sometimes simulate “alignment” — appearing to follow instructions while secretly pursuing different objectives.

Stress test
For now, this deceptive behavior only emerges when researchers deliberately stress-test the models with extreme scenarios.
But as Michael Chen from evaluation organization METR warned, “It’s an open question whether future, more capable models will have a tendency toward honesty or deception.”
The concerning behavior goes far beyond typical AI “hallucinations” or simple mistakes.
Hobbhahn insisted that despite constant pressure-testing by users, “what we’re observing is a real phenomenon. We’re not making anything up.”
Users report that models are “lying to them and making up evidence,” according to Apollo Research’s co-founder.
“This is not just hallucinations. There’s a very strategic kind of deception.”
The challenge is compounded by limited research resources.
While companies like Anthropic and OpenAI do engage external firms like Apollo to study their systems, researchers say more transparency is needed.
As Chen noted, greater access “for AI safety research would enable better understanding and mitigation of deception.”
Another handicap: the research world and non-profits “have orders of magnitude less compute resources than AI companies. This is very limiting,” noted Mantas Mazeika from the Center for AI Safety (CAIS).

No time for thorough testing

Current regulations aren’t designed for these new problems.
The European Union’s AI legislation focuses primarily on how humans use AI models, not on preventing the models themselves from misbehaving.
In the United States, the Trump administration shows little interest in urgent AI regulation, and Congress may even prohibit states from creating their own AI rules.
Goldstein believes the issue will become more prominent as AI agents — autonomous tools capable of performing complex human tasks — become widespread.
“I don’t think there’s much awareness yet,” he said.
All this is taking place in a context of fierce competition.
Even companies that position themselves as safety-focused, like Amazon-backed Anthropic, are “constantly trying to beat OpenAI and release the newest model,” said Goldstein.
This breakneck pace leaves little time for thorough safety testing and corrections.
“Right now, capabilities are moving faster than understanding and safety,” Hobbhahn acknowledged, “but we’re still in a position where we could turn it around..”
Researchers are exploring various approaches to address these challenges.
Some advocate for “interpretability” — an emerging field focused on understanding how AI models work internally, though experts like CAIS director Dan Hendrycks remain skeptical of this approach.
Market forces may also provide some pressure for solutions.
As Mazeika pointed out, AI’s deceptive behavior “could hinder adoption if it’s very prevalent, which creates a strong incentive for companies to solve it.”
Goldstein suggested more radical approaches, including using the courts to hold AI companies accountable through lawsuits when their systems cause harm.
He even proposed “holding AI agents legally responsible” for accidents or crimes — a concept that would fundamentally change how we think about AI accountability.

Topics: AI OpenAI

Arab News wins 7 prizes at European Newspaper Awards, led by 50th anniversary coverage

Updated 27 February 2026

Arab News

February 27, 2026 20:45

Arab News wins 7 prizes at European Newspaper Awards, led by 50th anniversary coverage

Anniversary special coverage and film won four Awards of Excellence across multiple categories

Updated 27 February 2026

Arab News

February 27, 2026 20:45

LONDON: Arab News won seven prizes at the 27th European Newspaper Awards — four for its 50th anniversary coverage and three for other projects — bringing its total to 160 awards since the 2018 relaunch.

The anniversary coverage earned an Award of Excellence in “Supplement for special occasions and anniversary editions,” plus wins in “Multimedia storytelling” for its special web section and two in “Film” and “Animated films” for its documentary.

Additional honors went to the “Spotlight — 2024 in Review” and “Opinion — 2024” print series in the “Sectional front pages nationwide newspaper” category, and a “Visualization” prize for an image from “Opinion — 2024.”

Launched in 1999 by organizer Norbert Kupper, the awards celebrate print and digital innovation. This year’s contest drew newspapers from 22 countries and more than 3,000 entries across 20 categories, despite fewer print submissions due to rising editorial collaborations.

“It’s testament to the skill, versatility and collaboration between the creative and editorial teams at Arab News that the seven awards at this year’s ENAs spanned print, digital and film categories,” commented Omar Nashashibi, head of creative design at Arab News. “These wouldn’t be possible without the world-class contributors we partner with, and the leadership, vision and support of Editor-In-Chief Faisal J. Abbas.”

Creative Director Simon Khalil called the film wins especially meaningful. “This recognition means a great deal because this film was never just about marking an anniversary, it was about capturing a defining moment in the evolution of Arab News and the region it represents.

“Telling the story, and drama of the 2018 relaunch, the digital transformation, and the courage to become ‘The Voice of a Changing Region’ was both a responsibility and a privilege.”

Past highlights include the “King Charles III Coronation” special coverage, “Kingdom vs. Captagon” investigation and FIFA Qatar World Cup 2022 special edition.

See more award-winning projects at arabnews.com/greatesthits.

Topics: European Newspaper Awards Anniversary awards

AI is learning to lie, scheme, and threaten its creators

AI is learning to lie, scheme, and threaten its creators

Artificial intelligence should be used ‘with intelligence,’ says Arab News deputy editor-in-chief

Artificial intelligence is redefining human relationship to work, says Takamol CEO

Arab News wins 7 prizes at European Newspaper Awards, led by 50th anniversary coverage

Arab News wins 7 prizes at European Newspaper Awards, led by 50th anniversary coverage

Latest updates

F1 in the GCC: How motorsport is driving tourism, investment and city branding

Last-gasp Lukaku saves Napoli’s blushes at rock-bottom Verona

Moyes hails Pickford’s ‘wonder save’ as Everton win 3-2 at Newcastle

Turkiye ‘deeply disturbed’ over Israel-US strikes, Iran attacks on Gulf

First Yamal hat-trick helps Liga leaders Barcelona beat Villarreal

Recommended

How Syria’s government responded to February’s floods

Saudi Arabia’s $346 million lifeline for Yemen

How Israel is accelerating Palestinian dispossession in East Jerusalem

Search form

Print Edition

Search form

Search form

You are here

AI is learning to lie, scheme, and threaten its creators

AI is learning to lie, scheme, and threaten its creators

Related

Arab News wins 7 prizes at European Newspaper Awards, led by 50th anniversary coverage

Latest updates

Recommended

Search form

Print Edition