RT Business2d agoTech2 min readRussia

Anthropic to Increase Transparency of AI Model Safeguards

Quick Look

AI giant Anthropic will make its advanced model safeguards more transparent, notifying users when requests are downgraded or rejected.
This follows criticism that undisclosed restrictions could hinder AI research and development.

AI-generated summary

Why It Matters

Anthropic faced criticism for previously routing user requests involving sensitive topics to less capable AI models without user notification. This approach was argued to potentially slow progress in AI research.

Font size

US artificial intelligence giant Anthropic said on Wednesday it would make the safeguards governing its most advanced AI models more transparent, including by disclosing when user requests are downgraded or rejected.

The move follows criticism over restrictions that were previously not visible to users.

Previously, Anthropic could silently route requests involving areas such as cybersecurity, biology, and advanced AI development from its Fable 5 model to the less capable Opus 4.8.

Under the new policy, users will be notified when a request is flagged, while Application Programming Interfaces (API) developers will receive explanations for any rejection or fallback to another model.

The approach of routing some requests related to frontier AI development to a less capable model had drawn criticism from researchers, who argued that the restrictions could slow progress in the field.

Responding to the backlash, Anthropic agreed to make the safeguards visible.

“Starting this week, flagged requests will visibly fall back to Opus 4.8 – the same as our safeguards for cyber and bio. You will see this every time it happens. On the API, any flagged requests will return a reason for their refusal,” Anthropic said.

Fable 5 is a publicly released model from Anthropic’s Mythos class, which the company unveiled in April but initially withheld, saying models in the family were too adept at bypassing cybersecurity safeguards and too dangerous for broad deployment.

Anthropic released Fable 5 this week, saying its capabilities “exceed those of every model we’ve previously made generally available.”

In its latest statement, Anthropic said it would continue downgrading some requests under policies banning use of its models to build competing AI systems, adding that such restrictions are standard in the industry and do not affect most coding and machine learning work.

The company also cited national security as a reason for rejecting or downgrading some requests, saying it wanted to prevent foreign adversaries from using its technology to strengthen their AI capabilities.

“The US and its allies hold an edge in frontier chips and the highly optimized software that runs them at full potential,” a company spokesperson told Fortune. “These safeguards ensure Claude [Anthropic’s family of AI models] isn’t used to erode that advantage – by optimizing chips developed by those adversaries, for example.”