◇ Industry News & Leadership Jun 12, 2026

Anthropic Disputes Fable 5 AI Jailbreak

Security Week Archived Jun 12, 2026 ✓ Full text saved

An AI hacker claims to have achieved a prompt-based jailbreak shortly after Fable 5’s launch, but Anthropic says it’s not a real jailbreak.



Anthropic has disputed allegations of a prompt-based jailbreak affecting its recently launched Claude Fable 5 AI model, underscoring the robustness of the advanced classifier system and extensive red-teaming efforts that underpinned the model’s deployment.

Claude Fable 5 became generally available on Tuesday, when Anthropic introduced it as a powerful Mythos-class AI model with safeguards that restrict its use in high-risk domains such as cybersecurity, where Mythos has proved particularly potent.

In sensitive areas such as cybersecurity, where it could be abused to develop exploits, and biology, where it could be leveraged to develop bioweapons and chemical weapons, the model automatically falls back to the less capable Claude Opus 4.8.

Anthropic said it conducted extensive internal and external red-teaming to ensure that Fable 5 cannot be easily jailbroken.

However, shortly after its release, an individual with the online moniker Pliny the Liberator, who is known for AI jailbreaks, claimed to have “liberated” Fable 5 by circumventing its restrictive safety layer.

The hacker said in a post on X that they used sophisticated multi-agent prompting methods to elicit useful information on sensitive topics, including cybersecurity, chemistry, psychological manipulation, and explosives.

Pliny the Liberator has published several screenshots to support the claims and released what is allegedly the Fable 5 internal system prompt, which contains instructions that define its personality, safety classifiers, fallback behaviors, tone guidelines, and refusal logic.

Contacted by SecurityWeek, an Anthropic spokesperson said the AI researcher’s post does not demonstrate a jailbreak of Fable 5’s safety systems.

The company explained that a true jailbreak would need to bypass its core safeguards and deliver meaningful assistance toward high-risk activities such as bioweapons development or sophisticated cyberattacks.

Instead, the demonstrated approach relies on coaxing the model to continue responding despite its conversational refusals, which is a well-known and longstanding limitation present in nearly all large language models.

Anthropic emphasized that its strongest protections against the most dangerous risks are enforced by independent classifier systems that operate separately from the model itself, meaning that overcoming the model’s refusals does not disable these critical safeguards.

After examining the examples shared by the researcher, the company determined that some outputs were not produced by Fable 5 at all, while those that were contained only general information already available in public sources, offering no meaningful uplift for real-world harm.

A wider review of recent usage found no evidence of its safeguards being successfully circumvented to generate genuinely dangerous content, Anthropic said.

Related: After AI Reaches Production: 12 Ways Security Teams Can Take Control

Related: Claude Mythos Turns N-Days Into N-Hours With Rapid Exploit Creation

Related: Will AI Kill the Bug Bounty Industry?

WRITTEN BY Eduard Kovacs

Eduard Kovacs (@EduardKovacs) is senior managing editor at SecurityWeek. He worked as a high school IT teacher before starting a career in journalism in 2011. Eduard holds a bachelor’s degree in industrial informatics and a master’s degree in computer techniques applied in electrical engineering.