U.S. Government Gains Early Look at Frontier AI Models From Google, Microsoft and xAI

Tech executives and federal officials struck new agreements this week: Google DeepMind, Microsoft and xAI will hand over unreleased versions of their most powerful AI systems to government evaluators before those models reach the public. The deals expand a quiet but growing practice of pre-deployment testing. They arrive as alarm over AI models' advanced cyber capabilities intensifies and the Trump administration weighs its role in AI oversight.

CAISI, the Center for AI Standards and Innovation housed inside the Commerce Department’s National Institute of Standards and Technology, announced the partnerships on May 5. The center has already completed more than 40 evaluations of frontier models. Some of those tests involved systems that have not yet launched. (NIST)

Voluntary Handover of Raw Models

Developers frequently supply CAISI with models stripped of their usual safety guardrails. Evaluators probe for risks tied to biosecurity, chemical threats and attacks on digital infrastructure. Feedback flows back through the interagency Testing Risks of AI for National Security (TRAINS) Taskforce. Testing can occur in classified settings. The agreements carry enough flexibility to match the speed of new model releases.

“Independent, rigorous measurement science is essential to understanding frontier AI and its national security implications,” said CAISI Director Chris Fall. “These expanded industry collaborations help us scale our work in the public interest at a critical moment.” (NIST)

The pacts build on earlier arrangements with OpenAI and Anthropic, renegotiated to fit current priorities under Commerce Secretary Howard Lutnick and the administration’s AI Action Plan. OpenAI and Anthropic first signed on in 2024 with CAISI’s predecessor, the U.S. AI Safety Institute. Now five leading labs participate. The voluntary framework gives officials an early window into capabilities. It stops short of any legal power to block releases.

Yet the timing carries weight. Just days earlier, reports surfaced that the White House is considering an executive order to formalize reviews of models with major cyber potential. The New York Times detailed internal discussions about a working group of industry leaders and officials. The goal: get ahead of risks without slowing American innovation against global competitors. (The New York Times)

Anthropic’s recent Mythos model sharpened those worries. The system can spot vulnerabilities in every major operating system and browser. Companies and agencies tested it in limited settings, and Pentagon officials grew concerned enough to label technology tied to the model a supply-chain risk in one contract dispute. A federal appeals court declined to pause the resulting restrictions. The episode fueled broader conversations about pre-release scrutiny. (WebProNews)

Executives at the companies frame the agreements as responsible steps. Microsoft’s Natasha Crampton, the company’s chief responsible AI officer, highlighted parallel work with the U.K.’s AI Security Institute. xAI’s participation stands out given Elon Musk’s past skepticism of heavy regulation. The move aligns with other pacts that let SpaceX, OpenAI, Google and others supply AI tools for classified government use. Pentagon contracts tied to these efforts already run into hundreds of millions of dollars.

But not everyone feels included. Senior cybersecurity officials from more than a dozen states sent a letter Tuesday to leaders at OpenAI, Anthropic, Microsoft and Google. They argue that frontier model testing currently favors Washington and big technology firms. States, they say, risk falling behind in defending their own critical systems. The letter urges developers to extend early access to state-level experts as threats evolve quickly. (The Wall Street Journal)

This federal-state tension reveals a larger truth. Frontier AI now sits at the intersection of commercial power, national security and public infrastructure. A single model release can reshape vulnerability landscapes across banks, utilities and government networks. Independent testing offers one buffer. Companies still control final decisions on when and how widely to deploy.

CAISI operates with a staff of fewer than 200 people. It lacks statutory authority to mandate cooperation. Its influence rests on relationships, technical expertise and the shared interest in avoiding major incidents. So far the approach has produced voluntary product improvements and better government awareness of international competition, particularly with China.

Industry insiders describe the current setup as the new normal for the handful of labs producing frontier systems. Raw model access lets evaluators test behaviors that disappear once guardrails go live. Results inform both government preparedness and company fixes. Post-release monitoring continues the loop.

The Trump administration began its term signaling a light touch. “We have to grow that baby and let that baby thrive,” the president said of AI early on. Recent developments show a pragmatic adjustment. Officials still want American leadership. They also want visibility into risks that could produce headlines no administration wants.

With more than 40 evaluations complete and new models arriving monthly, the pace leaves little room for delay. The flexibility written into the agreements reflects that reality, allowing rapid response as capabilities advance.

Critics point to the arrangement’s voluntary nature: a determined lab could withhold key details or time releases to limit review windows. Supporters counter that reputation, customer trust and potential future regulation create strong incentives for cooperation. The recent state letter suggests pressure may grow for broader access beyond federal channels.

CAISI now serves as the main industry contact point inside government for AI testing and research. That central role streamlines what had been fragmented conversations. Information shared through the TRAINS Taskforce reaches evaluators across agencies. Feedback returns to developers.

The deals mark another data point in the steady institutionalization of frontier AI oversight. Not through sweeping new laws. Through targeted, technical partnerships that give officials an early seat at the table. Whether that seat proves sufficient depends on the questions evaluators ask, the answers companies provide and the speed at which threats materialize once models ship.

One thing looks clear. The era of completely blind releases for the most powerful AI systems has ended. Government gets a preview. Companies get structured feedback. The public gets whatever emerges after both sides finish their work.
