Anthropic's Fable AI faces backlash over cybersecurity restrictions
At a glance
Anthropic's new Fable AI model, a public version of its cybersecurity tool Mythos, is facing criticism from researchers for overly strict guardrails that block even benign requests related to cybersecurity or biology.
AI-generated summary
Why it matters
Anthropic released Fable, a public version of its cybersecurity AI model Mythos, with strict guardrails. These restrictions have drawn criticism from cybersecurity professionals who find them overly broad and disruptive to legitimate work.
Anthropic released its latest model Fable on Tuesday, billing it as a public and limited version of its powerful and much-hyped cybersecurity model Mythos.
But not everyone is happy with the restrictions, and a number of cybersecurity researchers and professionals have aired complaints online.
"[Fable] rejects any request that could be tangentially cyber related. Even innocuous tasks like reading a blog post," said Valentina "Chompie" Palmiotti, a well-known security researcher who works at IBM X-Force.
When a prompt triggers its guardrails, Fable pauses the chat and says that its "safety measures flagged this message for cybersecurity or biology topics."
The guardrails were put in place to limit the risk that Fable could be used to develop malware or compromise software, a long-standing concern within Anthropic. The restrictions on biology come from a similar concern around developing biological weapons.
When the AI giant released Mythos in April, it restricted the model to a limited number of companies and organizations in what it called Project Glasswing, an effort to deploy the model to secure critical software and infrastructure. Last week, Anthropic expanded access to Mythos to hundreds of organizations in 15 countries.
But despite the good intentions, many cybersecurity experts are still put off by the haphazard nature of the restrictions. Matt Suiche, a cybersecurity veteran, told TechCrunch that "if you ask it to write secure code, it assumes it is cybersecurity related work instead of software engineering best practices, and you get downgraded." Fable is programmed to fall back to Claude Opus 4.8 if it hits a guardrail. "It seems to be keyword based, so anything in the lexical field of 'cybersecurity' triggers the guardrails."
"But it is understandable as we are still in the early days and they are still adapting their guardrails. I am sure they are going to evolve over time as Anthropic and other frontier model companies will collaborate more with the current new generation of cybersecurity companies," said Suiche, who is a member of the technical staff at Tolmo, an AI cybersecurity startup. "It's better to catch more people than not enough when you do such a release and to relax the guardrails over time."
Another researcher griped on X that "even asking for a code review" triggers Fable's guardrails.
Anthropic did not immediately respond to a request for comment.
Open questions
- Will Anthropic adjust Fable's guardrails based on the feedback?
- What specific keywords or patterns trigger the guardrails?
- How many organizations have reported issues with Fable's restrictions?
- What is Anthropic's timeline for further evolving Fable's safety measures?