Why GenAI pilots fail security approval

There is a meeting I have sat in many times now, on both sides of the table. A project team presents a GenAI pilot to a security or architecture review board. The demo is genuinely good. Then someone from the assurance side asks where the prompts go, and the room goes quiet while the team works out who is supposed to know.

That silence is the whole problem in miniature. The pilot was built to prove the idea works. The review exists to establish whether the design is safe to run. These are different questions, and a demo answers only the first one.

How pilots get built

It is worth being honest about how most pilots actually come into existence, because the approval failure is baked in at the start. A motivated team gets access to an API key or a sandbox subscription. Speed matters more than placement, so the pilot lands wherever provisioning was easiest — sometimes a personal dev tenancy, sometimes a corner of a shared environment with permissions nobody scrutinised. Test data gets used because it was there. A third-party orchestration framework comes in because a tutorial used it. None of this is documented, because documenting things slows pilots down, and the entire point of a pilot is speed.

All of that is fine — genuinely fine — right up until the business decides it wants the thing in production. At which point the organisation discovers it has a working prototype and no reviewable design.

What the approval forum is actually asking for

Security reviewers get cast as the villains of this story, but in my experience their requirements are reasonable and remarkably consistent. Strip away the templates and the forms, and a review board wants to be able to see roughly eight things:

Where data enters and leaves the workload, drawn as an actual diagram. Which identities the system runs as, and what they can touch. Where the model is hosted and under what terms — and if a supplier is involved, what they can see and retain. What gets logged, where the logs live, and who reads them. How users get access and how it is removed. What happens when the model produces something wrong or harmful. What the residual risks are, stated plainly. And who, by name, owns the thing once it is live.
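
For teams that prefer to track this in code rather than in a document, the eight questions can be written down as a simple record with a check for which ones are still unanswered. This is a minimal sketch, not a standard or a real review template: the class name, the field names and the `unanswered` helper are all my own hypothetical labels for the items listed above.

```python
from dataclasses import dataclass, fields


@dataclass
class AssurancePosition:
    """One field per review question; an empty string means 'not answered yet'.

    Field names are illustrative assumptions, not an established schema.
    """
    data_flows: str = ""                  # where data enters and leaves, as a diagram
    identities: str = ""                  # which identities it runs as, what they can touch
    hosting_and_supplier_terms: str = ""  # where the model is hosted, what a supplier sees and retains
    logging: str = ""                     # what is logged, where logs live, who reads them
    access_lifecycle: str = ""            # how users get access and how it is removed
    bad_output_handling: str = ""         # what happens on wrong or harmful output
    residual_risks: str = ""              # residual risks, stated plainly
    named_owner: str = ""                 # who, by name, owns it once live


def unanswered(position: AssurancePosition) -> list[str]:
    """Return the names of the questions that still have no answer."""
    return [
        f.name
        for f in fields(position)
        if not getattr(position, f.name).strip()
    ]


# A typical pilot on nomination day: two answers, six gaps.
pilot = AssurancePosition(
    data_flows="Diagram v2, checked against the deployed sandbox",
    named_owner="A. Example (hypothetical)",
)

for question in unanswered(pilot):
    print(f"No answer yet: {question}")
```

The point of the sketch is only that the list is finite and checkable. If `unanswered` returns anything, the review is going to find the same gaps, just more slowly.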

Notice what is not on the list: nothing about transformer architectures, nothing exotic. It is the same set of questions that any production system has to answer. Teams experience it as unreasonable only because the pilot was never designed to answer questions at all.

The reactive spiral

When those answers don't exist, the review becomes archaeology. The board asks a question; the team goes away for two weeks to find out; the answer raises another question. The project sees obstruction. The reviewers see a system whose own builders cannot describe it — which, from where they sit, is the single most alarming property a system can have. The sponsor sees a delivery date receding, and somewhere in month three the phrase "security is blocking us" makes its first appearance in a steering pack.

Nobody in this spiral is behaving badly. The sequencing was wrong, that's all.

The fix is cheap and unfashionable

The remedy is not a heavyweight governance process bolted onto every experiment — that would just kill the experiments. It is one piece of work, done at the moment a pilot is nominated for production: prepare the architecture and assurance position before requesting the review. A data-flow diagram that matches reality. An identity model someone has actually checked. A hosting and supplier statement. A short, candid residual-risk note. In my experience this is one to three weeks of focused effort with the right person driving it, and it transforms the review from an interrogation into a confirmation.

One to three weeks. Set that against the three to six months the reactive spiral costs, and the maths is not subtle.

The pilots that sail through approval are rarely the most technically impressive ones. They are the ones that arrived with a position the reviewers could read, challenge and approve. Approval forums do not reward cleverness; they reward legibility.
