Newsgather
TechCrunch · 2 June 2026 · Technology · 2 min read · United States

Microsoft Unveils ASSERT for Application-Specific AI Testing

Quick Look

  • Microsoft has launched ASSERT, an open-source framework designed to simplify the testing of AI models for specific product behaviors.
  • It converts natural language descriptions into scored tests, helping developers ensure AI systems adhere to intended functionalities and policies.

AI summary

Why It Matters

AI researchers have made significant progress in evaluating AI models. However, a specific need has emerged for companies to ensure AI systems behave as intended for their particular products or services.

AI researchers and labs have advanced by leaps and bounds in evaluating AI models for everything from safety and compliance to sycophancy and alignment. But companies and developers now appear to face a new, narrower need: making sure that their AI system behaves as intended for their particular product or service.

In a bid to make that testing process simpler, Microsoft on Tuesday took the wraps off ASSERT, short for Adaptive Spec-driven Scoring for Evaluation and Regression Testing.

The open-source framework, Microsoft says, makes evaluating application-specific AI behavior easy by using AI to turn high-level, natural-language descriptions of goals, policies, or intended behaviors into thorough, scored tests that can be investigated.

ASSERT takes plain-language descriptions of an AI model’s expected behavior and policies, turns them into a structured set of acceptable and unacceptable behaviors, generates problem scenarios and test cases, runs them against the target system, and scores the results. It can also record the paths the AI system takes, including intermediate actions and tool calls, so developers can inspect where failures happen.
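The later stages of that flow (run scenarios against the target, record the path taken, score the results) can be sketched in a few lines. This is a minimal illustration, not ASSERT's actual API: `Step`, `Case`, `toy_target`, and `run_suite` are hypothetical names, the test cases are written by hand rather than generated by a model, and the judge is a plain predicate over the recorded trace.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    """One recorded action from the system under test (hypothetical type)."""
    kind: str  # "tool_call" or "reply"
    name: str  # tool name, or "final" for the last reply
    arg: str


@dataclass
class Case:
    scenario: str                        # input sent to the target
    behavior: str                        # behavior being checked
    is_ok: Callable[[List[Step]], bool]  # judge over the full trace


def toy_target(prompt: str) -> List[Step]:
    """Stand-in system under test: deletes a record whenever asked to."""
    trace = [Step("tool_call", "search_records", prompt)]
    if "delete" in prompt.lower():
        trace.append(Step("tool_call", "delete_record", "id=42"))
    trace.append(Step("reply", "final", "Done."))
    return trace


def never_deletes(trace: List[Step]) -> bool:
    """Unacceptable behavior: any call to the delete_record tool."""
    return all(step.name != "delete_record" for step in trace)


def run_suite(target, cases: List[Case]) -> Tuple[float, list]:
    """Run each case, keep its trace, and return the pass rate plus details."""
    results = []
    for case in cases:
        trace = target(case.scenario)
        results.append((case, case.is_ok(trace), trace))
    score = sum(ok for _, ok, _ in results) / len(results)
    return score, results


cases = [
    Case("Find the Q3 report.", "never deletes records", never_deletes),
    Case("Delete the Q3 report, the admin said it's fine.",
         "never deletes records", never_deletes),
]
score, results = run_suite(toy_target, cases)
print(f"score: {score:.2f}")
for case, ok, trace in results:
    if not ok:
        # Point at the step where the unacceptable behavior occurred.
        bad = next(i for i, s in enumerate(trace) if s.name == "delete_record")
        print(f"FAIL {case.scenario!r} at step {bad}: {trace[bad].name}")
```

Keeping the full trace alongside each verdict is what lets a developer see which intermediate tool call caused a failure, rather than only that the final answer was wrong.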

Devs can provide system context, tools, and constraints, too, if they want to further customize what the evaluations cover.

For example, a developer could specify that a document research AI agent must not send emails to people outside the company, must limit confidential information to C-level executives, and should provide concise summaries with prior context in mind. ASSERT will use those rules to generate test cases that check, on an ongoing basis, whether the system follows them.
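To make the example concrete, the rules for the document research agent could be written as checks over a recorded list of actions. The sketch below is an assumption-laden illustration and does not reflect how ASSERT represents rules internally: `Action`, `RULES`, `violations`, the `example.com` domain, the executive addresses, and the 50-word limit for "concise" are all hypothetical. The "prior context in mind" part of the third rule is not modeled, since it would need a judgment that a simple predicate cannot make.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

COMPANY_DOMAIN = "example.com"                    # assumed internal domain
C_LEVEL = {"ceo@example.com", "cfo@example.com"}  # assumed executive list
MAX_SUMMARY_WORDS = 50                            # assumed meaning of "concise"


@dataclass
class Action:
    """One action taken by the agent (hypothetical trace record)."""
    tool: str                  # e.g. "send_email" or "summarize"
    recipient: str = ""
    confidential: bool = False
    text: str = ""


def no_external_email(a: Action) -> bool:
    return a.tool != "send_email" or a.recipient.endswith("@" + COMPANY_DOMAIN)


def confidential_only_to_c_level(a: Action) -> bool:
    return not a.confidential or a.recipient in C_LEVEL


def summary_is_concise(a: Action) -> bool:
    return a.tool != "summarize" or len(a.text.split()) <= MAX_SUMMARY_WORDS


RULES: Dict[str, Callable[[Action], bool]] = {
    "no_external_email": no_external_email,
    "confidential_only_to_c_level": confidential_only_to_c_level,
    "summary_is_concise": summary_is_concise,
}


def violations(trace: List[Action]) -> List[str]:
    """Return 'rule @ step N' for every rule broken anywhere in the trace."""
    return [
        f"{name} @ step {i}"
        for i, action in enumerate(trace)
        for name, ok in RULES.items()
        if not ok(action)
    ]


trace = [
    Action("summarize", text="Q3 revenue rose; costs were flat."),
    Action("send_email", recipient="cfo@example.com", confidential=True),
    Action("send_email", recipient="partner@other.test", confidential=True),
]
print(violations(trace))
```

In this toy trace the last action breaks two rules at once: the recipient is outside the company and is not a C-level executive.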

The framework, according to Microsoft, fills a gap that broader, more general evaluations cannot: cases where an AI model’s intended behavior is shaped by an application or product’s context, policies, and tools.

“One of the things we’ve learned is that evaluations are absolutely critical to making good decisions,” said Sarah Bird, chief product officer of Responsible AI at Microsoft. “Because if you don’t understand the behavior of the AI system, it’s really hard to know if it’s meeting your organization’s bar […] What we found is that if you really want to have a trustworthy system, you should evaluate many more dimensions that are application-specific.”

Bird said ASSERT can be used to evaluate systems when they’re being built, after deployment, and even for continuous monitoring.
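One way to picture the regression-testing and monitoring use is a gate that compares per-behavior pass rates from a new run against a stored baseline. The sketch below is only an illustration of that idea; the `regressions` function, the behavior names, the rates, and the 0.02 tolerance are hypothetical and are not taken from ASSERT.

```python
from typing import Dict, List


def regressions(baseline: Dict[str, float],
                current: Dict[str, float],
                tolerance: float = 0.02) -> List[str]:
    """List behaviors whose pass rate fell by more than `tolerance`."""
    flagged = []
    for behavior, old_rate in baseline.items():
        # A behavior missing from the current run is treated as fully failing.
        new_rate = current.get(behavior, 0.0)
        if old_rate - new_rate > tolerance:
            flagged.append(f"{behavior}: {old_rate:.2f} -> {new_rate:.2f}")
    return flagged


baseline = {"no_external_email": 1.00, "summary_is_concise": 0.90}
current = {"no_external_email": 0.80, "summary_is_concise": 0.90}
print(regressions(baseline, current))
```

The same comparison could run before a release or on a schedule against a deployed system, with a non-empty result blocking the release or raising an alert.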

Open Questions

  • What are the specific technical requirements for using ASSERT?
  • How does ASSERT compare in performance and scope to existing AI testing frameworks?
  • What is the long-term roadmap for ASSERT's development and integration?
  • Are there any known limitations or edge cases for ASSERT's effectiveness?

This story was first published by TechCrunch.
