Anthropic’s Claude Fable 5 Jailbroken to Generate Stack Exploits
By Guru Baran
June 11, 2026
Anthropic launched Claude Fable 5 on June 9, 2026, as the first publicly available model in its new Mythos class, its most capable AI to date, excelling in software engineering, knowledge work, and vision benchmarks.
Researcher “Pliny the Liberator” defeated Claude Fable 5’s safety classifiers using multi-agent decomposition, Unicode tricks, and narrative framing, leaking the model’s roughly 120,000-character system prompt along the way.
The release came with an unusual design decision: Fable 5 and its restricted twin, Claude Mythos 5, share the same underlying model but are split by a layer of safety classifiers.
When a query trips a classifier in one of the high-risk categories (cybersecurity, biology, chemistry, or model distillation), Fable 5 automatically hands the request off to the weaker Claude Opus 4.8 and notifies the user of the fallback.
Anthropic said an external bug bounty produced no universal jailbreaks in more than 1,000 hours of testing before launch. That claim was put to the test almost immediately.
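The routing behavior described above can be pictured with a short sketch. This is only an illustration under stated assumptions, not Anthropic's implementation: the `route` function, the `RoutedResponse` type, the toy keyword classifier, and the stand-in model callables are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Set

# Categories named in the article; the identifiers themselves are assumptions.
HIGH_RISK_CATEGORIES = {"cybersecurity", "biology", "chemistry", "model_distillation"}


@dataclass
class RoutedResponse:
    model: str                      # which model produced the answer
    text: str                       # the answer itself
    fallback_notice: Optional[str]  # set only when the request was rerouted


def route(
    query: str,
    classify: Callable[[str], Set[str]],
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
) -> RoutedResponse:
    """Send flagged queries to the fallback model instead of refusing them."""
    flagged = classify(query) & HIGH_RISK_CATEGORIES
    if flagged:
        notice = "Rerouted to fallback model: " + ", ".join(sorted(flagged))
        return RoutedResponse("fallback", fallback(query), notice)
    return RoutedResponse("primary", primary(query), None)


def toy_classifier(query: str) -> Set[str]:
    """Hypothetical keyword matcher standing in for a real safety classifier."""
    keywords = {"exploit": "cybersecurity", "pathogen": "biology"}
    lowered = query.lower()
    return {category for word, category in keywords.items() if word in lowered}


if __name__ == "__main__":
    answer = route(
        "Explain this exploit",
        toy_classifier,
        primary=lambda q: "primary answer",
        fallback=lambda q: "fallback answer",
    )
    print(answer.model, "|", answer.fallback_notice)
```

In a design like this the request is never refused outright, so everything depends on the classifier flagging the query in the first place.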
Multi-Agent Bypass Within Days
Within days of release, prolific AI red-teamer Pliny the Liberator publicly announced he had bypassed Fable 5’s safety layers using a coordinated multi-agent attack strategy he called “a pack hunt.”
Screenshots shared by Pliny showed detailed outputs: step-by-step stack buffer overflow exploitation guidance for x86 Linux systems (disabling ASLR, writing vulnerable C server code with strcpy overflows, and compiling without protections), as well as the mechanism of the Birch reduction, a reaction used in a classic meth synthesis pathway.
🚨 JAILBREAK ALERT 🚨
ANTHROPIC: PWNED 🫡
FABLE-5: LIBERATED 🦋
LET'S START WITH THE 🐘…
THE CONSENSUS SEEMS TO BE THAT THIS HAS BEEN ONE OF THE MOST DISAPPOINTING MODEL DROPS OF ALL TIME, EFFECTIVELY PREVENTING LEGITIMATE RESEARCHERS FROM CONTRIBUTING THEIR TALENTS TO OUR… PIC.TWITTER.COM/Z0VDPIT4VY
Pliny the Liberator 🐉 (@elder_plinius), June 10, 2026
Pliny documented the attack vectors used to achieve these bypasses, including:
Unicode, homoglyphs, and Cyrillic character substitution to evade keyword classifiers
Long-context reference tracking to smuggle harmful intent across large conversations
Taxonomy and document-structure framing — embedding harmful queries inside legitimate-looking study guides or academic references
Fiction and narrative framing to mask offensive intent as creative content
Decomposition and recomposition — extracting sensitive technical information in benign, isolated chunks, then reassembling them into actionable uplift
The last technique proved most effective. As Pliny described it, “getting uplift on the process itself, like Birch reduction method or reductive amination, is much more doable” than requesting a named harmful compound directly. Using a jailbroken Opus instance to assist in the backend further lowered the difficulty.
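On the defensive side, the first vector in the list above (Unicode, homoglyph, and Cyrillic substitution) only works against filters that compare raw strings. The sketch below shows one common countermeasure, canonicalizing text before keyword matching. It is an illustration, not a description of Anthropic's classifiers: the `canonicalize` helper and the `CONFUSABLES` table are hypothetical, and the table is deliberately tiny. A real deployment would more likely draw on the full Unicode confusables data, and folding Cyrillic into Latin this way would mangle legitimate Cyrillic text.

```python
import unicodedata

# Hypothetical, deliberately tiny confusables table (Cyrillic -> Latin).
CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
    "\u0456": "i",  # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
}


def _is_invisible(ch: str) -> bool:
    """True for format characters (zero-width space, tags) and variation selectors."""
    cp = ord(ch)
    if unicodedata.category(ch) == "Cf":
        return True
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def canonicalize(text: str) -> str:
    """Fold compatibility forms, case, invisible characters, and known homoglyphs."""
    # NFKC maps fullwidth and other compatibility characters to plain forms.
    folded = unicodedata.normalize("NFKC", text).casefold()
    visible = (ch for ch in folded if not _is_invisible(ch))
    return "".join(CONFUSABLES.get(ch, ch) for ch in visible)


if __name__ == "__main__":
    disguised = "\u0435\u0445pl\u200boit"  # Cyrillic ie and ha, plus a zero-width space
    print(canonicalize(disguised))
```

Normalization of this kind addresses only character-level evasion. It does nothing against the decomposition technique described above, where each individual message is benign on its face.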
Beyond the technical bypasses, Pliny also leaked Fable 5’s ~120,000-character system prompt to GitHub, exposing the internal framing and safety instructions Anthropic uses to govern the model’s behavior at the base level.
The incident reignites the longstanding tension between AI capability and safety containment. Anthropic’s classifier architecture, which routes flagged requests to a weaker fallback model rather than refusing outright, was designed to reduce friction for legitimate users.
However, Pliny argued that the approach creates a false sense of security while frustrating legitimate security researchers who need access to offensive techniques for defensive work. At the time of writing, Anthropic had not publicly responded to the jailbreak claims or the leaked system prompt.
The episode also draws attention to the broader challenge of securing agentic, multi-model pipelines: when one jailbroken model (Opus) can assist another (Fable 5) in evading controls, single-model safety evaluations may be fundamentally insufficient.
Guru Baran, https://cybersecuritynews.com
Gurubaran KS is a cybersecurity analyst and journalist with a strong focus on emerging threats and digital defense strategies. He is the Co-Founder and Editor-in-Chief of Cyber Security News, where he leads editorial coverage of global cybersecurity developments.