
Here is the part of last week's Anthropic news that matters most for anyone running AI inside their business, and it has nothing to do with export law. According to Anthropic's own account, the US government's directive to disable its Fable 5 and Mythos 5 models followed the discovery of a jailbreak technique, a way of prompting the model that bypassed its built-in safeguards.
The demonstrated version reportedly had the model review a codebase and surface security flaws it was not supposed to help find. Anthropic called the jailbreak "narrow" and pushed back hard on the response. Set that dispute aside for a moment and look at the sequence of events: a safeguard that was supposed to hold did not. And the thing that actually stopped the model from being used that way was not a control built into the model, the account, or the enterprise deployment. It was an external directive that took the entire system offline, for everyone, worldwide.
That is a controlled shutdown. Just not the kind anyone designed on purpose.
NIST CSF 2.0 treats AI as a full lifecycle risk, not a one-time deployment checkbox. That means continuous monitoring for anomalous behavior, defined detection thresholds, and the ability to contain or shut down a system in a way that is fast, precise, and proportionate to the actual problem. What happened with Fable 5 and Mythos 5 technically satisfies "shut down a system with a safety problem." It does not satisfy fast, precise, or proportionate.
It took public reporting, a government directive, and the global suspension of two products serving an enormous user base to address what Anthropic itself described as a narrow issue affecting one bypass technique. If that is what it takes to address a problem this specific, the "off switch" was not actually a switch. It was the building's main breaker.
It is tempting to read this as a story about frontier model providers and move on. Your organization is not Anthropic, and you are not the one whose safeguards got jailbroken. But flip the question around: when your organization gives an AI agent access to a codebase, a ticketing system, a customer database, or the ability to take action on your behalf, what is your equivalent of "the safeguard that was supposed to hold"? And if that safeguard fails the way Fable 5's did, quietly, through a prompting technique nobody anticipated, what is your version of the off switch? Who pulls it, how fast, and does pulling it actually stop the behavior, or just the conversation that revealed it?
ISACA's 2026 AI Pulse Poll, based on responses from more than 3,400 digital trust professionals across governance, audit, cybersecurity, and privacy roles, found that 56% do not know how quickly they could halt an AI system in the event of a security incident. The same survey found that only 36% say humans approve most AI-generated actions before they execute, with another 26% reviewing decisions only after the fact. Put plainly: most organizations cannot say how fast they could stop an AI system if something went wrong, and most do not have a human approve what that system does before it happens. That is the gap a jailbreak walks through.
Zero Trust, as most organizations have implemented it, was built around human identity: who is this person, what device are they on, does this request match their normal behavior. AI agents break every assumption in that model. An agent's "normal behavior" can change the moment someone finds a new way to prompt it. Its permissions were likely scoped once, at setup, and rarely revisited since. And because it is not a person, the usual triggers, a login from a new location, an unusual access pattern flagged by HR, do not apply.
This is why agentic AI governance cannot be a pre-deployment checklist that gets signed off once and filed away. An AI agent with access to code, data, or systems needs the same category of ongoing scrutiny you would apply to a highly privileged service account: scoped permissions that default to the minimum needed, behavioral baselines so that a sudden change in what the agent is doing or asking for gets flagged, and, critically, a revocation path that your organization controls directly rather than one that depends on the model provider noticing first.
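To make those three controls concrete, here is a minimal sketch in Python of a gate that sits between an agent and the systems it touches. Everything in it is an assumption made for illustration: the `AgentGate` class, its field names, and the use of a simple action-rate limit as a stand-in for a behavioral baseline are hypothetical and not drawn from any vendor's API or from NIST guidance. A real deployment would use richer baselines and persist the revocation state somewhere the agent cannot reach.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


class AgentRevoked(Exception):
    """Raised when a revoked agent attempts any action."""


@dataclass
class AgentGate:
    # Hypothetical illustration; names are not from any real library.
    agent_id: str
    allowed_actions: frozenset        # least-privilege allowlist, deny by default
    max_actions_per_window: int       # crude stand-in for a behavioral baseline
    window_seconds: float = 60.0
    revoked: bool = False
    _recent: list = field(default_factory=list)   # timestamps of allowed actions
    alerts: list = field(default_factory=list)    # what your monitoring would see

    def revoke(self, reason: str) -> None:
        """Revocation path the organization controls: stops this agent only."""
        self.revoked = True
        self.alerts.append(f"revoked: {reason}")

    def authorize(self, action: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        if self.revoked:
            raise AgentRevoked(self.agent_id)
        # Scoped permissions: anything not explicitly allowed is denied and logged.
        if action not in self.allowed_actions:
            self.alerts.append(f"denied out-of-scope action: {action}")
            return False
        # Behavioral baseline: keep only timestamps inside the current window.
        self._recent = [t for t in self._recent if now - t < self.window_seconds]
        if len(self._recent) >= self.max_actions_per_window:
            self.revoke("action rate exceeded baseline")
            return False
        self._recent.append(now)
        return True
```

The design choice worth noticing is proportionality: an out-of-scope request is denied and flagged without taking anything offline, while a sustained departure from the baseline revokes one agent rather than every agent. That is the difference between a switch and the building's main breaker.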
You will not get a phone call from a federal agency when something goes wrong with an AI agent your team built. You need the equivalent capability internally, scaled to your environment.
The headline from last week was about export controls and a dispute between a company and a government. The lesson underneath it is about something every organization running AI agents needs to answer honestly: if a safeguard you are relying on quietly failed today, would you find out from your own monitoring, or would you find out the way the rest of us found out about Fable 5, after the fact, and from someone else?
If your organization is deploying AI agents with real access to systems and data but has not tested whether you could contain one quickly, that is not a policy gap. It is an incident response gap with an AI-shaped blind spot. The SamurAI helps organizations build runtime oversight and containment capability for AI agents, grounded in NIST CSF 2.0's lifecycle approach to AI risk.
