
The Metropolitan Transportation Authority is looking for a vendor to build an AI system that can detect when a person, animal, or object enters the subway tracks before a train arrives. Before that system goes anywhere near the full network, the MTA plans to test it at exactly two stations, one underground and one elevated, for two years. The estimated cost of that pilot alone: $10 million to $50 million.
Read that again. Two stations. Two years. Tens of millions of dollars. And that is just to find out whether the system is good enough to expand.
If an agency running a 100-year-old subway system is willing to spend that much time and money proving an AI system works before trusting it with people's lives, what does that say about the rest of us, who deploy AI into production after a two-week trial and a confident vendor demo?
This is not AI for AI's sake. Track intrusions caused roughly 6% of all subway delays last year, and the MTA logged 1,297 unauthorized track entries, a 22% jump from 1,062 in 2019. Those incidents range from someone reaching for a dropped phone to far more serious situations. The agency has tried to solve this before. Between 2014 and 2019, it tested CCTV cameras paired with lasers and video analytics, laser scanners with visual and infrared verification, thermal cameras, and microwave scanners. None of them made the cut.
Jamie Torres-Springer, the MTA official overseeing the effort, summed up why in plain terms: the technology "didn't work to do it in a precise enough way that we could manage how we respond to it." That sentence should be printed and taped to the wall of every organization currently rushing an AI tool into production. Precision is not nice to have. It is the entire point.
Here is what makes this story relevant well beyond transit. Vancouver's SkyTrain system already runs a similar detection setup, and according to reporting on the MTA's plans, it is sometimes triggered by birds or debris, which then requires a worker to physically inspect the track before service resumes. Celeste Kirkland, a union safety director quoted in coverage of the MTA's plans, put it bluntly: "We have rats all through the system. Would they want a train to stop mistakenly because a rat jumped... onto the tracks?"
Swap "rat on the tracks" for "false fraud alert," "false malware detection," or "chatbot that escalates a routine billing question to a human at 2am," and you are looking at the exact same failure mode that shows up in enterprise AI every week. A system that cries wolf does not just waste time. It trains the humans around it to stop trusting it, and once that trust is gone, you have spent your AI budget building something people quietly route around. Shadow AI gets all the attention as an "uncontrolled tool" problem. This is its mirror image: a sanctioned tool nobody believes in anymore.
It would be easy to read this story as "government moves slowly, what's new." I would push back on that read. A two-year, two-station pilot with a defined budget range and a built-in evaluation period is not bureaucratic foot-dragging. It is exactly the lifecycle discipline NIST CSF 2.0 asks organizations to apply to AI: validate in a controlled environment, observe how the system performs against real conditions, and only then make the call to expand it.
Most enterprises skip straight from "the vendor's demo looked great" to "we are live in three departments." There is no instrumented pilot. There is no predefined threshold for what counts as success or failure. There is no plan for what happens if the false positive rate is too high to be useful. The MTA is building all three of those things into its plan before a single sensor goes up. That is pre-production validation done right, and it scales down to organizations far smaller than a transit authority serving millions of riders a day.
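To make "predefined threshold" concrete, here is a minimal sketch of what a pilot scorecard could look like in code. It is an illustration, not the MTA's method: the class names, the evaluate_pilot function, and every number in it are hypothetical, and a real program would set its own thresholds with the people who have to live with the alerts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PilotCounts:
    """Labelled outcomes collected during an instrumented pilot (hypothetical)."""
    true_positives: int   # real events the system flagged
    false_positives: int  # alerts with no real event behind them
    false_negatives: int  # real events the system missed
    true_negatives: int   # quiet periods the system correctly left alone


@dataclass(frozen=True)
class Thresholds:
    """Success criteria agreed on BEFORE the pilot starts (assumed values)."""
    min_precision: float
    min_recall: float
    max_false_positive_rate: float


def _ratio(numerator: int, denominator: int) -> float:
    """Divide safely; an empty denominator counts as 0.0."""
    return numerator / denominator if denominator else 0.0


def evaluate_pilot(counts: PilotCounts, thresholds: Thresholds) -> dict:
    """Compare observed pilot metrics with the predefined thresholds."""
    precision = _ratio(counts.true_positives,
                       counts.true_positives + counts.false_positives)
    recall = _ratio(counts.true_positives,
                    counts.true_positives + counts.false_negatives)
    false_positive_rate = _ratio(counts.false_positives,
                                 counts.false_positives + counts.true_negatives)

    failures = []
    if precision < thresholds.min_precision:
        failures.append(
            f"precision {precision:.2f} is below the minimum "
            f"{thresholds.min_precision:.2f}")
    if recall < thresholds.min_recall:
        failures.append(
            f"recall {recall:.2f} is below the minimum "
            f"{thresholds.min_recall:.2f}")
    if false_positive_rate > thresholds.max_false_positive_rate:
        failures.append(
            f"false positive rate {false_positive_rate:.3f} is above the "
            f"maximum {thresholds.max_false_positive_rate:.3f}")

    return {
        "precision": precision,
        "recall": recall,
        "false_positive_rate": false_positive_rate,
        "failures": failures,
        # Expand only when every predefined criterion is met.
        "decision": "expand" if not failures else "hold",
    }


# Made-up pilot numbers, purely for illustration.
example_counts = PilotCounts(true_positives=90, false_positives=30,
                             false_negatives=10, true_negatives=870)
example_thresholds = Thresholds(min_precision=0.90, min_recall=0.85,
                                max_false_positive_rate=0.05)
example_result = evaluate_pilot(example_counts, example_thresholds)
print(example_result["decision"], example_result["failures"])
```

With these made-up numbers the system catches 90% of real events, but one alert in four is false, so it misses the precision bar and the decision is "hold." The point of the sketch is the ordering: the thresholds exist before the data does, so nobody gets to redefine success after seeing the results.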
This also matters more, not less, for organizations in regulated environments. Financial services, healthcare, and government contractors do not get to explain a bad rollout as "we were moving fast." Examiners and auditors will ask the same question New York is answering in advance: how did you know this was ready before you trusted it?
You do not need New York's budget to borrow New York's discipline. You need a structure that forces the same questions to get answered before going live, not after.
New York is not testing AI for two years because it does not trust technology. It is testing AI for two years because it understands what is actually at stake when a system that watches subway tracks gets it wrong, in either direction. That is the same math every enterprise leader needs to run before putting AI in front of customers, employees, or critical processes. The question is not whether you can afford to pilot properly. It is whether you can afford not to.
If your organization is moving faster on AI deployment than your testing and validation process can keep up with, that gap is where the expensive surprises live. The SamurAI helps organizations build pre-production validation programs grounded in NIST CSF 2.0, so AI gets proven before it gets trusted with the things that matter.
