Full recording of 2m2x Ep. 144, Why Most AI Agents Will Fail
Everyone is building AI agents, but most won’t make it to production. From security risks to lack of oversight, the failure patterns are already emerging. In this episode, we break down five essential best practices, from guardrails to observability, that separate scalable, high-quality agents from costly mistakes.
The rush to deploy AI agents is accelerating faster than the discipline to deploy them responsibly. Organizations across every sector are standing up autonomous systems (agents that browse, decide, call APIs, and act) with the same trial-and-error energy that characterized early mobile app development. The difference is the stakes. A poorly built mobile app gets one-star reviews. A poorly built AI agent leaks sensitive data, acts on hallucinated inputs, or executes decisions that cannot be undone.
Gartner projects that over 40% of agentic AI projects will be scrapped by the end of 2027. That number is not a forecast of technological failure; it is a forecast of organizational failure. The agents that survive will not necessarily be the most capable. They will be the most carefully governed.
The Asymmetry That Changes Everything
The foundational difference between a conversational AI and an agentic AI is consequential action. A chatbot that produces a wrong answer stops there. An agent that receives wrong input keeps moving: calling tools, writing records, sending messages, escalating workflows. The error compounds with every step.
This asymmetry means that the failure modes of agentic AI are categorically different from those of simpler AI systems. OWASP’s GenAI Top 10 for 2025 catalogs exactly where agents break: prompt injection, tool misuse, memory leakage. These are not theoretical vulnerabilities. They are patterns already observed in production deployments, and any team building agents without awareness of them is accepting risk they have not quantified.
“The agents that survive won’t be the most capable. They’ll be the most carefully governed.”
Trust Is Earned Incrementally, Not Assumed
The most effective agentic deployments start narrow. A single workflow. A constrained decision space. A clear definition of what the agent is authorized to do and what requires escalation. Snowflake’s approach to agent adoption articulates the principle clearly: expanding capabilities after trust is established is far easier than recovering adoption after a public failure.
Starting narrow is not a limitation; it is a strategic choice. It creates the evidence base needed to justify expansion. It surfaces edge cases in a controlled environment. And it builds the organizational confidence required for agents to take on more consequential work over time.
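One way to make "what the agent is authorized to do and what requires escalation" concrete is an explicit allowlist that denies by default. The sketch below is illustrative only: the ScopePolicy class, the Decision values, and the action names are assumptions made for this example, not part of any particular agent framework.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    ESCALATE = "escalate"
    DENY = "deny"


@dataclass(frozen=True)
class ScopePolicy:
    """Hypothetical policy: what the agent may do alone, and what needs a human."""
    allowed: frozenset
    needs_escalation: frozenset

    def check(self, action: str) -> Decision:
        if action in self.allowed:
            return Decision.ALLOW
        if action in self.needs_escalation:
            return Decision.ESCALATE
        # Anything not explicitly listed is outside the agent's scope.
        return Decision.DENY


# Illustrative action names for a single, narrow support workflow.
policy = ScopePolicy(
    allowed=frozenset({"lookup_order", "draft_reply"}),
    needs_escalation=frozenset({"issue_refund"}),
)
```

Expanding the agent's scope then becomes a reviewable change to the policy (moving an action from needs_escalation to allowed) rather than an implicit change in behavior.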
Observability and Versioning Are Not Optional
Two practices separate mature agentic programs from fragile ones: observability and versioning. Observability means having a complete audit trail of every decision an agent made, every API it called, and every action it took. Without it, debugging is guesswork and regulatory compliance is impossible. McKinsey’s research found that only 17% of enterprises have a formal governance structure around AI, a gap that represents both enormous risk and competitive opportunity for those who close it first.
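As a rough illustration of what such an audit trail can look like, the sketch below wraps every tool call so that the request, the result, and any error are recorded under a run identifier. The AuditTrail class, the audited_call helper, and the record fields are hypothetical names chosen for this example; a production system would more likely write to a durable log store or a tracing backend.

```python
import json
import time
from typing import Any, Callable, Dict, List


class AuditTrail:
    """In-memory, append-only record of what an agent did (sketch only)."""

    def __init__(self) -> None:
        self.records: List[Dict[str, Any]] = []

    def record(self, run_id: str, kind: str, name: str, detail: Dict[str, Any]) -> None:
        self.records.append(
            {"run_id": run_id, "ts": time.time(), "kind": kind, "name": name, "detail": detail}
        )

    def to_jsonl(self) -> str:
        # One JSON object per line; non-serializable values fall back to str().
        return "\n".join(json.dumps(r, default=str) for r in self.records)


def audited_call(trail: AuditTrail, run_id: str, name: str,
                 fn: Callable[..., Any], **kwargs: Any) -> Any:
    """Call a tool and log the call, its result, and any exception."""
    trail.record(run_id, "tool_call", name, {"args": kwargs})
    try:
        result = fn(**kwargs)
    except Exception as exc:
        trail.record(run_id, "tool_error", name, {"error": repr(exc)})
        raise
    trail.record(run_id, "tool_result", name, {"result": result})
    return result
```

Because every entry carries the same run_id, a single agent run can be reconstructed step by step after the fact.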
Versioning applies not just to code but to the full stack of agent behavior: prompts, tools, and evaluation datasets. Production releases should be gated behind evaluation thresholds. Without this discipline, regressions are invisible until they cause incidents, and rollbacks become reconstructions rather than reversions.
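A minimal sketch of both ideas, assuming a release is fully described by its prompt, its pinned tools, and the evaluation dataset it was scored on: the version is a hash of all three, and promotion is blocked unless every metric meets its threshold. The AgentRelease class, the passes_gate function, and the metric names are illustrative assumptions, not an established API.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AgentRelease:
    prompt: str
    tools: tuple            # tool names with pinned versions, e.g. "search@1.2.0"
    eval_dataset_id: str    # identifier of the evaluation set used for scoring

    def version(self) -> str:
        # Any change to prompt, tools, or eval dataset yields a new version.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()[:12]


def passes_gate(scores: dict, thresholds: dict) -> bool:
    """True only if every required metric is present and meets its minimum."""
    return all(
        scores.get(metric, float("-inf")) >= minimum
        for metric, minimum in thresholds.items()
    )
```

Storing the version next to each evaluation result means a rollback can restore a known, previously scored combination instead of reconstructing one from memory.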
The Human Checkpoint
No discussion of agentic AI governance is complete without addressing the question of autonomy limits. The drive to remove humans from workflows is understandable; in many cases, it is the entire point. But unchecked autonomy on high-stakes decisions is not a feature; it is a liability.
Any agent operating in sensitive domains such as finance, HR, legal, healthcare, and customer data requires defined human checkpoints. Not because AI judgment is inherently untrustworthy, but because accountability requires a human in the chain. Non-auditable agents that take consequential actions without any human review should not exist in production systems. The goal is not to slow down automation; it is to make it defensible.
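A human checkpoint can be as simple as a gate in front of the action. The sketch below assumes a hypothetical approve callback (in practice a ticket, a chat prompt, or a review queue) and a hard-coded set of sensitive domains; both are stand-ins for illustration. Actions in sensitive domains run only after approval, and every decision is logged either way.

```python
from typing import Callable, List, Optional

# Illustrative domain labels; a real deployment would define its own.
SENSITIVE_DOMAINS = {"finance", "hr", "legal", "healthcare", "customer_data"}


def run_action(
    domain: str,
    action: Callable[[], str],
    approve: Callable[[str], bool],
    log: List[dict],
) -> Optional[str]:
    """Run `action`, requiring human approval first in sensitive domains."""
    if domain in SENSITIVE_DOMAINS:
        approved = approve(domain)
        log.append({"domain": domain, "reviewed": True, "approved": approved})
        if not approved:
            return None  # Rejected by the reviewer: the action never runs.
    else:
        log.append({"domain": domain, "reviewed": False, "approved": None})
    return action()
```

Low-stakes actions pass straight through, so the checkpoint adds review only where accountability calls for it.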
Organizations that apply these principles (narrow scope, mandatory guardrails, full observability, rigorous versioning, and human oversight on high-stakes decisions) are the ones that will be expanding their agentic programs in 2027 while others are explaining why theirs were scrapped.
Informulate helps organizations build AI agents that are production-ready and governance-complete. Reach out to start the conversation.


