Microsoft Plans to Rank AI Models by Safety

Microsoft is planning to rank artificial intelligence (AI) models based on safety.

The effort is part of the tech giant’s push to foster trust among its cloud customers as it sells them AI products from companies like OpenAI and xAI, the Financial Times (FT) reported Sunday (June 8).

Sarah Bird, Microsoft’s head of Responsible AI, told the FT the company would soon add a “safety” category to its “model leaderboard,” a feature it recently created to let developers rank models from providers including China’s DeepSeek and France’s Mistral.

The leaderboard, which is accessible to clients using the Azure Foundry developer platform, is expected to influence which AI models and applications are purchased via Microsoft.

Microsoft currently ranks models on three metrics: quality, cost and throughput, the speed at which a model can generate output. Bird said the new safety ranking would make sure “people can just directly shop and understand” AI models’ capabilities as they decide which to purchase.

The decision to offer safety benchmarks comes as Microsoft’s customers wrestle with the potential risks posed by new AI models to data and privacy protections, especially when deployed as autonomous “agents” that can function with no human supervision.

Rankings give users objective metrics when choosing from a catalog of more than 1,900 AI models, helping them make an informed choice about which to use.

“Safety leaderboards can help businesses cut through the noise and narrow down options,” Cassie Kozyrkov, a consultant and former chief decision scientist at Google, told the FT.

“The real challenge is understanding the trade-offs: higher performance at what cost? Lower cost at what risk?”
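That trade-off can be made concrete with a small sketch. The snippet below is a hypothetical illustration only, not Azure Foundry’s actual data or API: the model names, metric values and weights are invented. It normalizes each leaderboard-style metric (quality, cost, throughput, safety) across a toy catalog and ranks models by a weighted score, so shifting the weights visibly reorders the list.

```python
# Hypothetical illustration of multi-metric model ranking.
# All model names, scores and weights are invented; this is not
# Azure Foundry's data or API.

CATALOG = [
    # quality (0-1, higher better), cost ($/1M tokens, lower better),
    # throughput (tokens/sec, higher better), safety (0-1, higher better)
    {"name": "model-a", "quality": 0.91, "cost": 15.0, "throughput": 40, "safety": 0.88},
    {"name": "model-b", "quality": 0.84, "cost": 3.0, "throughput": 95, "safety": 0.93},
    {"name": "model-c", "quality": 0.78, "cost": 0.6, "throughput": 160, "safety": 0.75},
]

# A buyer's priorities: a regulated bank, say, might weight safety heavily.
WEIGHTS = {"quality": 0.3, "cost": 0.2, "throughput": 0.1, "safety": 0.4}

def normalize(models, key, higher_is_better=True):
    """Scale a metric to [0, 1] across the catalog so metrics are comparable."""
    values = [m[key] for m in models]
    lo, hi = min(values), max(values)
    for m in models:
        score = (m[key] - lo) / (hi - lo) if hi != lo else 1.0
        m[f"{key}_norm"] = score if higher_is_better else 1.0 - score

for key, higher_is_better in [
    ("quality", True), ("cost", False), ("throughput", True), ("safety", True)
]:
    normalize(CATALOG, key, higher_is_better)

# Combine the normalized metrics into one weighted score per model.
for m in CATALOG:
    m["score"] = sum(w * m[f"{k}_norm"] for k, w in WEIGHTS.items())

for m in sorted(CATALOG, key=lambda m: m["score"], reverse=True):
    print(f'{m["name"]}: {m["score"]:.2f}')
```

Changing the weights reorders the ranking, which is exactly the trade-off Kozyrkov describes: the cheapest or fastest model is only “best” relative to what the buyer chooses to weight, and weighting safety heavily can promote a model that loses on raw cost or throughput.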

In other AI news, PYMNTS recently examined the use of AI agents in banking compliance in an interview with Greenlite AI CEO Will Lawrence.

He told PYMNTS CEO Karen Webster that while the 2000s were defined by rule-based systems and the 2010s ushered in machine learning, the 2020s are “the agentic era of compliance.”

That shift, the report said, has a range of implications. Trust is key for regulated financial institutions, where mistakes can lead to declined transactions and regulatory exposure.

“Right now, banks are getting more risk signals than they can investigate,” Lawrence said. “Digital accounts are growing. Backlogs are growing. Detection isn’t the problem anymore — it’s what to do next.”

“AI is only scary until you understand how it works,” Lawrence added. “Then it’s just a tool — like a calculator. We’re helping banks understand how to use it safely.”