Real-World AI Implementation: What Generic Models Get Wrong

The failure rate for enterprise AI implementations is frequently reported as exceeding sixty per cent. Organisations spend months in procurement, years in deployment, and significant capital on tools that are abandoned, quietly shelved, or, worse, used in production while producing unreliable outputs no one fully trusts.

The standard explanations are technical: poor data quality, insufficient training, integration complexity, lack of internal AI talent. These are real problems. But they are not the root problem. The root problem is that most AI implementations encode the wrong knowledge — and in professional domains, that failure is structural, not incidental.

This article explains what real-world AI implementation actually requires in high-stakes domains, why generic models consistently underperform in practice, and what the gap between theory and deployment looks like from the inside.

The implementation failure pattern

AI implementation failures in professional domains tend to follow a recognisable pattern. A tool is acquired — often a fine-tuned general model or a retrieval-augmented system built on public data. It performs well in demonstrations, which are carefully constructed around cases the model handles confidently. It is deployed. And then, over weeks or months, practitioners stop trusting it.

Not because it is always wrong. In fact, the frustrating thing about these tools is that they are often right — in the easy cases. They handle the 80 per cent of inputs that are well-represented in their training data with reasonable accuracy. But in the 20 per cent that are not — the complex cases, the ambiguous presentations, the edge conditions that experienced practitioners navigate through tacit judgment — the model either fails silently, produces a confident wrong answer, or hedges in ways that provide no actionable guidance.

In medicine, law, engineering, and finance, the 20 per cent is where outcomes are determined. A clinical decision tool that handles routine presentations well but fails on atypical ones is not a partial success — it is a liability, because practitioners learn to distrust it selectively but cannot always predict which cases warrant that distrust.
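One way to make this pattern visible before deployment is to report accuracy separately for routine and complex cases rather than as a single headline figure. The sketch below is illustrative only: the stratum labels, the record layout, and the toy data are assumptions, not taken from any real evaluation.

```python
from collections import defaultdict


def accuracy_by_stratum(records):
    """Return {stratum: accuracy} for (stratum, predicted, expected) records."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for stratum, predicted, expected in records:
        total[stratum] += 1
        if predicted == expected:
            correct[stratum] += 1
    return {stratum: correct[stratum] / total[stratum] for stratum in total}


# Hypothetical evaluation set: eight routine cases, two complex ones.
records = (
    [("routine", "a", "a")] * 8
    + [("complex", "a", "b"), ("complex", "b", "b")]
)

# The headline figure hides the weak stratum.
overall = sum(p == e for _, p, e in records) / len(records)
per_stratum = accuracy_by_stratum(records)

print(f"overall: {overall:.2f}")
for stratum, accuracy in per_stratum.items():
    print(f"{stratum}: {accuracy:.2f}")
```

In this toy set the overall figure looks strong while the complex stratum sits at chance, which is the gap a demonstration built around easy cases would not reveal.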

"The tool was fine for the easy ones. The problem is that I didn't need help with the easy ones."

What generic models actually know

A generic model is trained on what has been written — text that exists in the public domain, in published literature, in structured databases, in scraped web content. In any mature professional domain, this text represents an enormous volume of information. Medical literature alone contains tens of millions of published papers. Legal databases contain centuries of case law. Engineering standards fill libraries.

The problem is not the volume. The problem is what is missing from text. Practitioners do not write down most of what they know. The judgment they exercise in their daily work — the rapid assessment of a situation as it presents, the weighting of competing considerations, the recognition of patterns across thousands of prior cases, the knowledge of when the standard protocol does not apply — none of this is in the literature. It exists in people. It is transmitted through apprenticeship, supervised practice, and accumulated experience. It is what separates a practitioner with twenty years of mastery from a competent recent graduate following the same protocols.

A generic model trained on published literature will approximate the explicit knowledge of a domain — the rules, the standards, the documented processes. It will not carry the tacit knowledge of its practitioners. And in real-world implementation, tacit knowledge is where the leverage lives.

The four requirements for real-world AI implementation

When AI actually works in professional deployment — when practitioners use it consistently, trust its outputs, and integrate it into their workflows — it is because four conditions are met. Generic implementations typically meet one or two. Expert-built implementations are designed to meet all four.

1. The knowledge source must be primary

The AI must be built on knowledge drawn directly from the people who practise in the domain — not from secondary sources about that practice. The difference between building an AI on a cardiologist's documented decision process and building it on cardiology textbooks is the difference between encoding practice and encoding theory about practice.

This requires a different kind of knowledge acquisition than most AI projects undertake. It is not about data collection. It is about structured elicitation of tacit knowledge from practitioners — sessions designed to surface the reasoning that never gets written down, the edge-case handling that is transmitted only through supervised experience, the weighted judgments that experienced practitioners make automatically but junior ones miss.

2. Outputs must be auditable

In any high-stakes domain, practitioners need to understand why the AI produced a particular output. Not just what it concluded, but the reasoning path it followed. This is not primarily a regulatory requirement — it is a practical one. Practitioners will not use a tool they cannot interrogate. When the AI flags a risk or recommends a course of action, the practitioner needs to be able to trace that output back to a reasoning structure they recognise as valid.

Generic models produce outputs that are statistically derived from their training distribution. They cannot reliably explain their reasoning in terms a practitioner can evaluate. Expert-built AI, grounded in documented practitioner reasoning chains, can trace every output back to explicit logic that the collaborating expert helped construct and validate.
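As a minimal sketch of what traceable output could look like, assume the expert's reasoning has been captured as explicit rules, each carrying the rationale recorded during elicitation. The `Rule` and `Result` types, the rule contents, and the case fields below are all hypothetical and chosen only for illustration; a real system would be considerably richer.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Rule:
    rule_id: str
    rationale: str                   # the expert's stated reason for the rule
    applies: Callable[[dict], bool]  # predicate over the case
    conclusion: str


@dataclass
class Result:
    conclusions: list = field(default_factory=list)
    trace: list = field(default_factory=list)  # (rule_id, rationale) pairs


def evaluate(case: dict, rules: list) -> Result:
    """Apply every rule and record which ones fired, and why."""
    result = Result()
    for rule in rules:
        if rule.applies(case):
            result.conclusions.append(rule.conclusion)
            result.trace.append((rule.rule_id, rule.rationale))
    return result


# Hypothetical rules; the thresholds and wording are invented for the example.
rules = [
    Rule("R1", "Expert reviews any case with an incomplete history",
         lambda c: not c.get("history_complete", False),
         "refer to senior reviewer"),
    Rule("R2", "Expert treats risk scores above 0.8 as high risk",
         lambda c: c.get("risk_score", 0.0) > 0.8,
         "flag as high risk"),
]

result = evaluate({"history_complete": True, "risk_score": 0.93}, rules)
for rule_id, rationale in result.trace:
    print(f"{rule_id}: {rationale}")
```

Each conclusion arrives with the identifier and rationale of the rule that produced it, so a practitioner can check the output against reasoning they recognise instead of taking it on trust.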

3. The model must know what it does not know

One of the most dangerous failure modes in AI implementation is misplaced confidence: a model produces a definitive-seeming output in a case it is not equipped to handle. In professional domains, the appropriate response to genuine uncertainty is not a confident answer. It is a flag: this case is outside the model's validated scope, and a human expert should take over.

Encoding this kind of epistemic humility requires knowing, in advance, where the model's knowledge ends. That is only possible if the knowledge itself is explicit — drawn from a documented expert source with known scope rather than implicit in the weights of a generalised model.
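A simplified sketch of this idea follows, assuming the validated scope can be written down as inclusive ranges over named case features. That assumption is a strong one, since real scope definitions are rarely this tidy. The `Scope` class, the `answer_or_escalate` helper, the feature names, and the stand-in model are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Scope:
    ranges: dict  # feature name -> (low, high), inclusive bounds

    def violations(self, case: dict) -> list:
        """List every way the case falls outside the validated scope."""
        problems = []
        for name, (low, high) in self.ranges.items():
            if name not in case:
                problems.append(f"{name}: missing")
            elif not low <= case[name] <= high:
                problems.append(f"{name}: {case[name]} outside [{low}, {high}]")
        return problems


def answer_or_escalate(case: dict, scope: Scope, model) -> dict:
    """Answer only inside the validated scope; otherwise hand over to a human."""
    problems = scope.violations(case)
    if problems:
        return {"status": "escalate", "reasons": problems}
    return {"status": "answered", "output": model(case)}


# Hypothetical scope and a stand-in model, for illustration only.
scope = Scope({"age": (18, 65), "risk_score": (0.0, 1.0)})
model = lambda case: "routine pathway"

in_scope = answer_or_escalate({"age": 40, "risk_score": 0.2}, scope, model)
out_of_scope = answer_or_escalate({"age": 82, "risk_score": 0.2}, scope, model)

print(in_scope)
print(out_of_scope)
```

The design choice worth noting is that the scope check runs before the model is consulted at all, so an out-of-scope case never receives a confident-looking answer, and the escalation carries the specific reasons a human reviewer needs.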

4. The implementation must fit the workflow

AI tools that require practitioners to change their workflow to accommodate the tool will not be used. This sounds obvious, but it is violated by the majority of implementations. The tool is built according to what is technically convenient — what makes training or inference easier — and practitioners are asked to adapt to it.

Real-world implementation requires understanding the practitioner's workflow from the inside, which means working with practitioners from the design stage. Not as reviewers of a completed product, but as co-authors of the knowledge structure the product encodes. This is not the same as user research. It is a different kind of engagement — one where the practitioner's expertise shapes the product, not just validates it.

The difference implementation makes

When these four conditions are met, the implementation dynamic changes completely. Practitioners begin to use the tool not because they were told to, but because it adds genuine value they cannot get elsewhere — access to the structured judgment of a more experienced practitioner, encoded in a form that is available every time they need it.

The tool is trusted because its outputs are traceable. It is used consistently because it fits the workflow. It is reliable in hard cases because it was built on knowledge specifically sourced for hard cases. And it knows its limits — which, paradoxically, makes practitioners trust it more, not less.

This is what distinguishes successful AI implementation in professional domains from the majority of projects. Not better data infrastructure. Not more compute. Not a more sophisticated model architecture. A fundamentally different approach to where the knowledge in the system comes from.

Implications for AI procurement and deployment

For organisations evaluating AI tools for professional domains, the practical implication is straightforward: the right question to ask is not "what model is this built on?" but "whose knowledge is in this product?"

If the answer is "it was trained on [large dataset] and fine-tuned on [domain corpus]," that is a description of a generic approach to domain implementation — and it comes with all the failure modes described above. If the answer is "it was built in structured collaboration with [named practitioner] whose decision process it encodes," that is a fundamentally different claim — one that can be evaluated, audited, and validated against the reasoning of the expert whose knowledge it carries.

The distinction matters because the failure mode of generic implementation in high-stakes domains is not just inefficiency. It is the deployment of overconfident approximations in contexts where the cost of wrong answers is borne by real people.

Domain expert AI products exist to close this gap, not by building better generic models but by building narrower, more precise ones grounded in the knowledge of people who have spent careers mastering their domain. That specificity is the product.

See also: Expert-Built AI vs Generic AI and Praxa case studies.