Why Most AI Buyers Don't Know What They're Buying (Yet)
The enterprise AI procurement crisis is not a vendor problem. It is a buyer literacy problem, and it is expensive. Enterprise AI evaluation is failing at scale because the buyers signing the contracts cannot write the spec that would tell them whether the vendor is real.
That is the unpopular framing. Most posts on the topic blame vendors. We are going to flip it.
What the Cancellation Data Actually Says
In July 2024, Gartner published a press release predicting that at least 30% of generative AI projects would be abandoned after proof of concept by the end of 2025. The named reasons were poor data quality, inadequate risk controls, escalating costs, and unclear business value. The most quoted analyst on the piece, Rita Sallam, attributed the failure to "difficulty translating productivity gains into financial benefit."
Read that carefully. The reason was not "the model did not work." The reason was the buyer could not translate the model working into a number their CFO would accept. That is a buyer-side gap.
In June 2025, Gartner went further. The follow-up press release predicted that over 40% of agentic AI projects would be canceled by the end of 2027. The same release introduced the term "agent washing," which Gartner defined as "the rebranding of existing products, such as AI assistants, robotic process automation (RPA) and chatbots, without substantial agentic capabilities."
Agent washing is the symptom. Buyer illiteracy is the cause. A buyer who cannot tell the difference between an agentic system and a chatbot wrapper is a buyer who will pay full agentic-system prices for a chatbot wrapper. The market clears at the buyer's confusion. The vendor that is willing to be vague wins the deal.
A January 2025 Gartner poll of 3,412 webinar attendees was cited in the same release. Only 19% said their organization had made significant investments in agentic AI. Forty-two percent had made conservative investments. Thirty-one percent were taking a wait-and-see approach or were unsure. Eight percent had made no investment at all. Fewer than one in five had committed in earnest. This is a market that is hesitating, and the hesitation is rational because the buyers cannot tell what they are looking at.
What Anthropic Told the Market
The most useful read on this is from the vendor side, not the buyer side. Anthropic published a piece in December 2024 called "Building Effective Agents." Buried inside is this passage, which the marketing team probably wanted to soften and did not: "When building applications with LLMs, we recommend finding the simplest solution possible, and only increasing complexity when needed. This might mean not building agentic systems at all. Agentic systems often trade latency and cost for better task performance, and you should consider when this tradeoff makes sense."
That is the actual frontier-lab guidance. Sometimes you should not buy an agent. Sometimes a clean prompt and a simple workflow get you the same outcome at one tenth the cost. The vendor saying this out loud is the vendor whose product you should be evaluating. The vendor pitching you "fully autonomous, end-to-end, hands-off" without naming the tradeoff is the vendor selling you the chatbot wrapper at agent prices.
The Anthropic piece is a buyer-education document disguised as engineering documentation. It is worth reading in full before your next AI vendor call.
The Klarna Lesson
In February 2024, Klarna issued a press release that the AI procurement crowd is still arguing about. The headline: Klarna's AI assistant had handled 2.3 million customer service conversations in its first month, "two-thirds of Klarna's customer service chats," with output the company described as on par with human agents on customer satisfaction. The release projected $40 million in profit improvement for 2024.
By spring 2025, Klarna was hiring humans again, and CEO Sebastian Siemiatkowski was publicly walking back parts of the original narrative. Reporting from Bloomberg and others traced the reversal to quality issues on the harder tier of conversations, where contextual judgment outperformed the deployed AI's pattern matching.
The buyer-education lesson is not that Klarna was wrong to deploy. The deployment worked for the routine tier. The lesson is that Klarna's success criteria, as presented to the market, did not separate routine work from judgment work. The press release framed the narrative around cost saved. The follow-up reality was measured in quality lost. Those are two different KPIs. If the original procurement spec had named both, the rollout would likely have shipped differently and the public reversal might have been avoided.
The buyers who learn from this watch their own spec language. The buyers who do not are repeating the same mistake now, in 2026, with agentic systems.
What a Literate AI Buyer Looks Like
The buyer who is going to survive the next two years has four habits worth naming.
They can name the proficiency level they are buying for. They know whether they are funding a tool that helps an employee at L1 (occasional, ad-hoc) get to L2 (regular, workflow-integrated), or one that lets an L3 employee author workflows for the rest of the org. The dollar amounts and the success criteria are different at each level. A buyer who cannot place their workforce on a proficiency scale will buy the wrong tier of product.
They can write the spec. "We need an AI that helps our support team" is not a spec. "We need an AI that handles the routine tier of inquiries (defined as the 60% of tickets resolved without a human escalation in the prior twelve months), measured by deflection rate at week 4 and customer satisfaction delta at week 8, with a manual escalation path for everything else" is a spec. The vendors who can pass this spec are not the vendors who would win the vague version.
They demand the verification layer. What does the dashboard look like? What gets measured? How will the buyer know in week 6 whether the deployment is working? The vendors who shrug at these questions are agent-washing. The vendors who hand over an actual telemetry plan are real.
They are willing to walk. The buyer with no walk-away alternative pays the vague vendor's price. The buyer who has a quiet relationship with a competing vendor (or with an in-house option) walks the price down by 40% and walks the spec up by 60%. Being willing to walk is the single highest-leverage move in this market.
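The second and third habits above, writing the spec and demanding the verification layer, can be sketched as a small data structure plus a check. This is a minimal illustration, not a procurement standard: the class names, the evaluate function, and every threshold value are hypothetical, and only the metrics and weeks (deflection rate at week 4, customer satisfaction delta at week 8) come from the example spec above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkpoint:
    """One measurable commitment: a metric, the week it is read, and the bar."""
    metric: str
    week: int
    minimum: float


@dataclass(frozen=True)
class ProcurementSpec:
    scope: str
    checkpoints: tuple[Checkpoint, ...]
    escalation_path: str


def evaluate(spec: ProcurementSpec, week: int, observed: dict[str, float]) -> dict[str, str]:
    """Return pass / fail / missing for every checkpoint due by the given week."""
    results = {}
    for cp in spec.checkpoints:
        if cp.week > week:
            continue  # not due yet
        if cp.metric not in observed:
            results[cp.metric] = "missing"  # the vendor has not reported it
        elif observed[cp.metric] >= cp.minimum:
            results[cp.metric] = "pass"
        else:
            results[cp.metric] = "fail"
    return results


# Thresholds are hypothetical placeholders, not figures from the article.
support_spec = ProcurementSpec(
    scope="routine tier: tickets resolved without human escalation in the prior 12 months",
    checkpoints=(
        Checkpoint(metric="deflection_rate", week=4, minimum=0.50),
        Checkpoint(metric="csat_delta", week=8, minimum=0.0),
    ),
    escalation_path="manual handoff to a human agent for everything else",
)

# The week-6 question: is the deployment working so far?
week6 = evaluate(support_spec, week=6, observed={"deflection_rate": 0.55})
print(week6)
```

A "missing" result is the useful one in a vendor conversation. It means a checkpoint came due and nobody produced the number, which is the shrug described above expressed as data.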
The Vocabulary Move
The reason buyer literacy compounds is that vocabulary compounds. Once a buyer can place an employee at L0, L1, L2, or L3, they can scope the engagement by level. Once they can scope by level, they can measure by level. Once they can measure by level, they can fire the vendor that is not moving the needle.
That is the loop. The vocabulary creates the measurement. The measurement creates the leverage. The leverage creates the market clearing at honest prices instead of vague ones.
The framework can be the L0-L3 model. It can be a different one. The specific labels matter less than the discipline of using them. The buyer who tells their CFO "we are moving 200 employees from L1 to L2 by quarter end, measured by these three behaviors" is a buyer the CFO funds. The buyer who tells their CFO "we are deploying AI" is the buyer whose budget gets cut.
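As a rough sketch of how the vocabulary turns into a number a CFO can fund, the proficiency map can be held as plain data and summarized by transition. Everything here is illustrative: the roles and headcounts are invented, the L0 description is an assumption (the levels are only glossed from L1 up), and planned_transitions is a hypothetical helper.

```python
from collections import Counter
from enum import IntEnum


class Level(IntEnum):
    # L1-L3 glosses paraphrase the article; the L0 gloss is an assumption.
    L0 = 0  # assumed: no meaningful AI use
    L1 = 1  # occasional, ad-hoc use
    L2 = 2  # regular, workflow-integrated use
    L3 = 3  # authors workflows for the rest of the org


# Hypothetical map: role -> (headcount, level today, target level)
proficiency_map = {
    "support_agent": (120, Level.L1, Level.L2),
    "support_lead": (15, Level.L2, Level.L3),
    "billing_analyst": (80, Level.L1, Level.L2),
    "ops_manager": (10, Level.L2, Level.L2),
}


def planned_transitions(pmap):
    """Count employees per (current, target) transition, skipping roles that stay put."""
    moves = Counter()
    for headcount, current, target in pmap.values():
        if target > current:
            moves[(current.name, target.name)] += headcount
    return moves


moves = planned_transitions(proficiency_map)
for (src, dst), n in sorted(moves.items()):
    print(f"{n} employees: {src} -> {dst}")
```

With these made-up headcounts the summary reads "200 employees: L1 -> L2" and "15 employees: L2 -> L3", which is the shape of sentence the funded buyer brings to the CFO.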
What to Do Before Your Next Vendor Call
Three concrete steps.
Write the proficiency map for your team. Where is each role today on a four-level scale? Where do you want it to be in twelve months? The map is the document that turns "we need AI" into "we need this specific transition for this specific role."
Read the Anthropic piece. "Building Effective Agents" is free, takes about twenty minutes to read, and could save you from a six-figure procurement mistake. Internalize the part about when not to build an agentic system at all.
Ask the verification question first. On every vendor call, in the first ten minutes, ask: "What does the success dashboard look like, and when do I see it?" If the vendor cannot answer concretely, the rest of the call is a sales pitch, not a procurement conversation.
The cancellation rate Gartner is forecasting is not inevitable. It is the predictable outcome of a market where most buyers cannot specify what they are buying. The buyers who can specify are going to compound their advantage over the next two years. The vendors who can sell honestly to specifying buyers are going to compound theirs.
The model is fine. The vocabulary is what is missing.
Written by Headways Team