Anthropic Releases Claude Fable 5 with Safeguards on Sensitive Topics
Ars Technica · 5 days ago · Tech · 3 min read · United States

Key points

  • Anthropic launched Claude Fable 5, a new AI model with enhanced capabilities that restricts answers on sensitive topics such as cybersecurity, biology, and chemistry.
  • The company aims to prevent misuse by malicious actors, though the safeguards may also block some harmless requests.

AI-generated summary

Why it matters

Anthropic has released Claude Fable 5, a new AI model that the company says surpasses its previous Opus models. The launch includes significant safeguards that prevent the model from answering queries on sensitive topics such as cybersecurity, biology, and chemistry, because of concerns about misuse by malicious actors.

Anthropic on Tuesday publicly released Claude Fable 5, its first “Mythos-class” model, which it says surpasses its previous frontier Opus models in overall capabilities. But the model’s launch today comes with safeguards designed to prevent it from answering queries on topics like cybersecurity, biology, and chemistry, where the company has publicly worried about its potential to “uplift” malicious actors.

Anthropic says Fable 5 operates on the “same underlying model” as Mythos 5, which is coming out of its monthslong “Mythos Preview” period today, but only for “a small group of cyberdefenders” judged trustworthy through the existing Project Glasswing. Unlike Mythos 5, though, the publicly accessible Fable 5 is designed to funnel queries on certain sensitive topics to the earlier Claude Opus 4.8 model and to warn the user when this is happening.

Anthropic said it has tuned these safeguards to be “stricter than ideal,” meaning the system may occasionally refuse “harmless requests” in a way that it acknowledges may be frustrating for regular users. But Anthropic says such false positives come up in less than five percent of all sessions in testing and are worth it to avoid situations where Mythos could give malicious actors assistance in “causing serious harm that they couldn’t have received from other sources.”

I can’t let you do that, Dave

Fable 5’s topic-based safeguards are built around a system of classifiers designed to broadly detect banned prompt subjects as well as any potential jailbreak attempts. In over 1,000 hours of red-team testing with a bug bounty program, Anthropic says external teams failed to find any universal jailbreaks for Fable 5. The new model also resisted automated jailbreak attempts to a much larger degree than previous Claude Opus models, Anthropic said.

The company said it is particularly worried about Mythos 5’s ability to perform “agentic hacking,” executing multi-part cyberattacks with much more facility than earlier models. But testing from the UK’s AI Security Institute in recent months found that Mythos Preview performed similarly to OpenAI’s GPT-5.5 on a suite of Capture the Flag challenges, suggesting Mythos’ performance is not “a breakthrough specific to one model.”

Among the usual raft of fair-to-middling benchmark test improvements that Anthropic reports for Mythos 5 over previous frontier models, the company claims a significant jump in the model’s capabilities on the cybersecurity-focused ExploitBench test. Mythos 5 scored 78 percent on the benchmark’s tests of vulnerable code exploits, a significant increase from the 40 percent scored by Opus 4.8 and even the 69 percent achieved by Mythos Preview.

While earlier Anthropic models blocked bioweapons-related queries, that classifier now applies to all chemistry and biology-related queries in Fable 5. The company says it worries that “well-resourced malicious actors” could use even seemingly benign queries on these subjects to assist with “highly risky biological research” in a much more effective way than with previous models.

Who can you trust?

Anthropic seems to understand that making certain topics off-limits for Fable 5 is something of a double-edged sword. The company writes that “the same queries that are beneficial in the hands of cybersecurity professionals and biology researchers could be dangerous if available to malicious actors.”

That puts Anthropic in the somewhat awkward position of having to judge who is and is not trustworthy enough to have access to a model that it says has potentially dangerous capabilities. The company says it will be periodically expanding its existing Project Glasswing program “in consultation with the US government” to let in more cybersecurity professionals. That expansion will also include a new trusted access program for life sciences organizations that removes Fable 5’s biology/chemistry safeguards while keeping cybersecurity safeguards in place.

API and Enterprise users will be able to access the Fable 5 model at a cost of $10 per million input tokens and $50 per million output tokens starting today. Those prices are 67 to 100 percent higher than those for OpenAI’s recent GPT-5.5, a difference that could be significant at a time when many users are balking at the high cost of frontier models.
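
To make the per-token prices concrete, here is a minimal sketch of what a single request would cost at the quoted list prices. The function name and the example workload are hypothetical and exist only for illustration; the only figures taken from the article are the $10 and $50 per-million-token rates, and the sketch ignores any discounts, caching, or other billing details that are not described here.

```python
from decimal import Decimal

# Prices quoted in the article, in US dollars per one million tokens.
INPUT_USD_PER_MILLION = Decimal("10")
OUTPUT_USD_PER_MILLION = Decimal("50")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost of one request at the quoted list prices.

    Hypothetical helper: this is plain arithmetic on the published rates,
    not part of any Anthropic SDK or billing API.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = Decimal(input_tokens) * INPUT_USD_PER_MILLION / TOKENS_PER_MILLION
    output_cost = Decimal(output_tokens) * OUTPUT_USD_PER_MILLION / TOKENS_PER_MILLION
    return input_cost + output_cost


# Hypothetical workload: 200,000 input tokens and 20,000 output tokens.
print(estimate_cost_usd(200_000, 20_000))
```

For that hypothetical workload, the input side comes to $2 and the output side to $1, or $3 in total. Output tokens cost five times as much as input tokens, so output-heavy workloads would feel the pricing most.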

Anthropic’s existing subscription plans will include access to Fable 5 through June 22, after which users will need to purchase “usage credits” to access the new model. Anthropic says it eventually hopes to restore Fable 5 access as a standard part of subscription plans once it has “sufficient capacity” to do so.

What to watch

AI outlook: possibilities, not certainties

  • Anthropic will expand its Project Glasswing program to include more cybersecurity professionals.

    Very likely · Within months

  • A new trusted access program for life sciences organizations will be implemented.

    Very likely · Within months

  • Anthropic will eventually restore Fable 5 access as a standard part of subscription plans.

    Possible · Long term

Open questions

  • What specific criteria are used to classify 'malicious actors'?
  • How will the 'trusted access program for life sciences organizations' be implemented and monitored?
  • What are the long-term implications of restricting access to certain AI capabilities for research and development?
  • Will the pricing structure for Fable 5 remain competitive in the long run?

This article was originally published by Ars Technica.
