AI and domain expert collaboration is a phrase that means many things depending on who is using it. At one end, it describes what happens when a company building a general-purpose AI tool invites a handful of practitioners to sit in on product reviews and nod at a demo. At the other end, it describes a structured, sustained process in which recognised domain practitioners are the primary architects of what an AI system knows and how it reasons. The distance between these two things is enormous, and it explains most of the gap in outcomes between AI products that practitioners trust and those they do not.
This article is about the second kind of collaboration. It explains why it is different from consulting, what it actually involves, what domain experts contribute that data cannot, and what well-structured AI and domain expert collaboration produces.
Why AI and domain expert collaboration is different from consulting
When organisations bring in domain experts as consultants during AI development, the typical mode is advisory. The expert reviews requirements, attends workshops, provides feedback on outputs, and approves the final product. This is valuable. It is not collaboration in the meaningful sense.
The difference lies in who does the primary cognitive work of defining the AI's knowledge structure. In consulting mode, engineers and product managers do that work and then present their model to the expert for validation. The expert corrects what is wrong. But much of what is wrong never surfaces at all. The engineers' model of the domain is built from documents and interviews, and domain expertise is substantially tacit: it does not appear in documents and is very hard to surface in interviews.
In genuine AI and domain expert collaboration, the expert is not validating someone else's model. They are building their own — with the AI development team providing the technical capability to encode and operationalise what the expert knows. The distinction is not semantic. It changes the kind of knowledge the system ends up carrying.
"We can know more than we can tell." — Michael Polanyi. Domain expert AI is, in large part, a technology for closing that gap.
This is harder to organise, harder to schedule, and harder to price. Domain practitioners are expensive and busy. Sustained engagement over months is a significant commitment. But it is the commitment that makes the difference between an AI product that feels domain-specific and one that actually is.
The three collaboration modes
In practice, AI and domain expert collaboration happens across three distinct modes, each serving a different function in the knowledge encoding process. These modes are not sequential stages — they overlap, recurse, and repeat throughout the development cycle.
Mode 1: Knowledge extraction
Knowledge extraction is the process of surfacing what practitioners know but cannot easily articulate. It draws on techniques from cognitive task analysis, a methodology originally developed in human factors and safety-critical systems research. The core question is not "what do you know about X?" but "how do you think when you encounter X?" — and specifically, "what do you notice, what do you rule out, and what are you uncertain about?"
Structured knowledge extraction sessions are typically built around representative cases. The expert works through a case — ideally a real one, from their own experience — talking through their reasoning process in real time. The knowledge engineer's role is to push for specificity: not "I would consider the medication history" but "what specifically in the medication history, and what would it change about your assessment?" Over many such sessions, a picture emerges of the expert's decision architecture: the variables they attend to, the weights they apply, the heuristics they use, and the edge cases they treat differently.
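As a rough illustration of the bookkeeping behind these sessions, the sketch below logs each case walkthrough as three lists that mirror the core questions (what the expert noticed, what they ruled out, what they were uncertain about) and counts which cues recur across sessions. The `CaseWalkthrough` schema, the `recurring_cues` helper and the example cues are all hypothetical; real extraction notes are far richer than flat strings.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class CaseWalkthrough:
    """Notes from one extraction session on a single case (hypothetical schema)."""

    case_id: str
    noticed: list[str] = field(default_factory=list)  # cues the expert attended to
    ruled_out: list[str] = field(default_factory=list)  # possibilities the expert discarded
    uncertain_about: list[str] = field(default_factory=list)  # open questions the expert flagged


def recurring_cues(sessions: list[CaseWalkthrough], min_sessions: int = 2) -> list[str]:
    """Return cues noticed in at least `min_sessions` walkthroughs, most frequent first."""
    counts: Counter[str] = Counter()
    for session in sessions:
        for cue in dict.fromkeys(session.noticed):  # count each cue once per session
            counts[cue] += 1
    return [cue for cue, n in counts.most_common() if n >= min_sessions]


# Illustrative sessions only.
sessions = [
    CaseWalkthrough("case-01", noticed=["recent dose change", "renal function"],
                    ruled_out=["drug interaction"]),
    CaseWalkthrough("case-02", noticed=["recent dose change", "age"],
                    uncertain_about=["adherence"]),
    CaseWalkthrough("case-03", noticed=["recent dose change", "renal function"]),
]
print(recurring_cues(sessions))  # ['recent dose change', 'renal function']
```

Cues that recur across many cases are candidates for the expert's core variables; cues that appear only once may point to the edge cases the expert treats differently, and are worth a follow-up session.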
This process is time-intensive and requires knowledge engineers who are skilled at elicitation — who know how to ask the kinds of questions that surface tacit knowledge rather than just prompting the expert to restate what is already explicit. It is one of the rarest and most valuable capabilities in expert-driven AI development.
Mode 2: Protocol encoding
Protocol encoding is the translation of extracted knowledge into a form the AI system can use. This is where the knowledge engineer and the AI engineer work together, converting the patterns, heuristics, and judgment structures surfaced in extraction into something the system can apply to new cases.
This is iterative and error-prone. Early encodings are almost always incomplete. They handle the cases the expert explicitly walked through, but fail on variants the extraction sessions did not cover. Identifying these gaps requires testing the system against new cases and bringing the expert back in to assess where the outputs diverge from their judgment.
Protocol encoding is not fine-tuning a language model on domain text. It is closer to the construction of a structured knowledge base — a representation of how an expert reasons through a problem type, not just what conclusions they tend to reach. The two are related but not equivalent, and conflating them is one of the most common mistakes in domain AI development.
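One way to picture the difference is a small rule-based encoding, in which each step of the expert's reasoning is stored with its condition, its effect on the assessment and the expert's stated rationale, so that every output carries a trace of why it was reached. This is a minimal sketch under assumed names: `Rule`, `assess`, the integer score and the two example rules are hypothetical and are not clinical guidance, and production systems may use very different representations.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical: a case as a flat record of findings.
Case = dict[str, object]


@dataclass(frozen=True)
class Rule:
    """One encoded step of expert reasoning (hypothetical structure)."""

    name: str
    applies: Callable[[Case], bool]  # when the expert says this step is relevant
    adjustment: int  # how much it shifts the assessment score
    rationale: str  # the expert's stated reason, kept for later review


def assess(case: Case, rules: list[Rule]) -> tuple[int, list[str]]:
    """Apply every matching rule; return the score and a trace of why."""
    score = 0
    trace: list[str] = []
    for rule in rules:
        if rule.applies(case):
            score += rule.adjustment
            trace.append(f"{rule.name}: {rule.rationale}")
    if not trace:
        # No encoded reasoning covers this case: a gap to take back to the expert.
        trace.append("NO RULE MATCHED: flag for expert review")
    return score, trace


# Illustrative rules only, not clinical guidance.
RULES = [
    Rule("dose_change", lambda c: bool(c.get("recent_dose_change")), 2,
         "a recent dose change can explain new symptoms"),
    Rule("renal", lambda c: bool(c.get("reduced_renal_function")), 3,
         "reduced clearance raises the risk of accumulation"),
]

print(assess({"recent_dose_change": True}, RULES))
# (2, ['dose_change: a recent dose change can explain new symptoms'])
```

The empty-trace branch reflects the point about early encodings: a case that no rule covers is not silently scored but flagged as a gap to take back to the expert.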
Mode 3: Validation cycles
Validation in domain expert AI is not a final step. It is an ongoing loop that runs throughout development and continues after deployment. The expert reviews outputs, assesses their quality against domain standards, identifies where the system's judgment diverges from their own, and works with the development team to understand why.
Well-designed validation cycles involve blind testing: the expert assesses outputs without knowing whether they came from the AI or from a human practitioner. This removes the bias towards charitable interpretation that tends to inflate experts' ratings when they know they are assessing AI output. Blind validation is harder to organise, but it produces a far more reliable signal about where the system is and is not performing to standard.
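A minimal sketch of the blinding step, assuming AI and human outputs for the same cases are pooled, shuffled and given anonymous IDs, with the key that maps IDs back to their source held separately until the expert has finished rating. The `Output` record, the ID format and the numeric ratings are assumptions for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Output:
    case_id: str
    text: str
    source: str  # "ai" or "human"; never shown to the reviewer


def blind(outputs: list[Output], seed: int = 0) -> tuple[list[tuple[str, str, str]], dict[str, str]]:
    """Shuffle outputs and strip their source.

    Returns (items, key): items are (blind_id, case_id, text) tuples shown to the
    expert; key maps blind_id -> source and is withheld until rating is complete.
    """
    rng = random.Random(seed)
    shuffled = list(outputs)
    rng.shuffle(shuffled)  # ordering no longer hints at the source
    items: list[tuple[str, str, str]] = []
    key: dict[str, str] = {}
    for i, out in enumerate(shuffled):
        blind_id = f"item-{i:03d}"
        items.append((blind_id, out.case_id, out.text))
        key[blind_id] = out.source
    return items, key


def unblind(ratings: dict[str, float], key: dict[str, str]) -> dict[str, float]:
    """Mean expert rating per source, computed only after all ratings are in."""
    by_source: dict[str, list[float]] = {}
    for blind_id, score in ratings.items():
        by_source.setdefault(key[blind_id], []).append(score)
    return {source: sum(s) / len(s) for source, s in by_source.items()}


outputs = [
    Output("case-01", "Assessment written by the system", "ai"),
    Output("case-01", "Assessment written by a practitioner", "human"),
    Output("case-02", "Assessment written by the system", "ai"),
    Output("case-02", "Assessment written by a practitioner", "human"),
]
items, key = blind(outputs, seed=42)
for blind_id, case_id, text in items:
    print(blind_id, case_id, text)
```

Because both sources answer the same cases, showing the case ID reveals nothing about where an output came from.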
The output of each validation cycle feeds back into the extraction and encoding processes. A validation failure — a case where the system's output was wrong or insufficient by domain standards — is valuable information. It tells you what knowledge is missing, what has been encoded incorrectly, or what edge cases the system has not yet encountered. Treating validation failures as diagnostic rather than embarrassing is essential to the quality of the eventual product.
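The diagnostic framing can be made concrete by tagging each failure with one of the three causes named above and routing it to the step that should address it. The cause labels come from the text; the routing table (missing knowledge and unseen edge cases back to extraction, incorrect encoding back to encoding) and the function names are hypothetical.

```python
from collections import defaultdict
from enum import Enum


class FailureCause(Enum):
    MISSING_KNOWLEDGE = "missing knowledge"  # reasoning never surfaced in extraction
    WRONG_ENCODING = "wrong encoding"  # surfaced, but encoded incorrectly
    UNSEEN_EDGE_CASE = "unseen edge case"  # a variant no session covered


# Hypothetical routing: which process step each kind of failure goes back to.
NEXT_STEP = {
    FailureCause.MISSING_KNOWLEDGE: "extraction",
    FailureCause.WRONG_ENCODING: "encoding",
    FailureCause.UNSEEN_EDGE_CASE: "extraction",
}


def route_failures(failures: list[tuple[str, FailureCause]]) -> dict[str, list[str]]:
    """Group failed case IDs by the step that should address them."""
    queues: defaultdict[str, list[str]] = defaultdict(list)
    for case_id, cause in failures:
        queues[NEXT_STEP[cause]].append(case_id)
    return dict(queues)


print(route_failures([
    ("case-07", FailureCause.WRONG_ENCODING),
    ("case-11", FailureCause.UNSEEN_EDGE_CASE),
]))  # {'encoding': ['case-07'], 'extraction': ['case-11']}
```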
What the expert contributes that data cannot
It is reasonable to ask whether sustained expert collaboration is necessary, or whether sufficient domain data (clinical records, legal documents, engineering reports, financial analyses) could produce a domain-capable AI without it. The answer is that data and expert collaboration are substitutes along some dimensions and complements along others, but that for the kinds of performance that matter in professional applications, data alone is insufficient.
Data encodes what has happened. It captures the decisions practitioners have made, the conclusions they have reached, the documents they have produced. What it does not capture is the reasoning process that produced those decisions: what was considered and discarded, where uncertainty was present but not documented, how the practitioner handled the ambiguity that is present in nearly every professional judgment. Expert collaboration surfaces this reasoning process. Data cannot.
Data also has a quality distribution that reflects the full range of practitioner competence, from excellent to mediocre. An AI system built from that data alone tends to reproduce the average of that distribution. Collaboration with carefully selected expert practitioners produces something that reflects the upper tail: the judgment of the people who are best at the domain, not the mean of everyone who has ever practised in it. For high-stakes professional applications, this distinction is significant.
Finally, data cannot tell the AI what it does not know. Expert collaboration can. Part of what experienced practitioners contribute is a map of the domain's uncertainty structure: where the evidence is thin, where individual judgment varies legitimately, where the cases are genuinely hard and no confident answer is warranted. An AI system that has this map is a meaningfully different — and more trustworthy — system than one that confidently produces outputs in exactly the situations where confident outputs are most dangerous.
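A sketch of what carrying that map might look like in practice: the expert marks regions of the domain as genuinely hard, each with the information that would resolve it, and the system declines to give a confident conclusion when a case falls inside one. `UncertainRegion`, `answer` and the example region are hypothetical, and matching on sets of findings is a deliberate simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UncertainRegion:
    """A part of the domain the expert marked as genuinely hard (hypothetical schema)."""

    name: str
    findings: frozenset[str]  # findings that together place a case in this region
    would_resolve: str  # information the expert says would settle the question


def answer(findings: set[str], conclusion: str, uncertainty_map: list[UncertainRegion]) -> str:
    """Return the conclusion unless the case falls in a region the expert flagged."""
    for region in uncertainty_map:
        if region.findings <= findings:  # every defining finding is present
            return (f"Uncertain ({region.name}). "
                    f"Additional information needed: {region.would_resolve}")
    return conclusion


# Illustrative map only.
UNCERTAINTY_MAP = [
    UncertainRegion("conflicting markers", frozenset({"marker_a_high", "marker_b_low"}),
                    "results of a follow-up test"),
]

print(answer({"marker_a_high"}, "Likely condition X", UNCERTAINTY_MAP))
# Likely condition X
print(answer({"marker_a_high", "marker_b_low"}, "Likely condition X", UNCERTAINTY_MAP))
# Uncertain (conflicting markers). Additional information needed: results of a follow-up test
```

Returning the information that would settle the case, rather than a bare refusal, keeps the output useful to the practitioner.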
Structuring the collaboration: what works, what does not
The practical organisation of AI and domain expert collaboration is a discipline in itself. Some structures work consistently; others fail predictably.
What works: sustained engagement over an extended period. Short-burst workshops produce explicit knowledge. Sustained collaboration — weekly or fortnightly sessions over months — gives the knowledge engineer time to probe below the surface, return to cases that were unclear, and catch the tacit knowledge that only surfaces gradually. What also works: building the collaboration around real cases from the expert's own practice, rather than hypothetical scenarios. The expert's reasoning is most accessible and most accurate when anchored to their actual experience.
What does not work: bringing in multiple experts from the same domain in a group setting and expecting productive knowledge extraction. Group dynamics suppress minority views and lead experts to anchor on consensus positions rather than expressing the full texture of their individual judgment. Individual sessions are almost always more productive. What also does not work: knowledge extraction conducted by technical staff who lack domain literacy. A knowledge engineer who does not understand the domain cannot identify when the expert is skating over a complexity, cannot ask the follow-up question that unpacks a heuristic, and cannot catch the moments when two statements are in tension. Domain literacy is a prerequisite for good extraction, even if deep domain expertise is not.
The output: AI that carries domain judgment
When AI and domain expert collaboration is done well, the resulting system carries something that generic AI does not: domain judgment. Not just domain vocabulary, not just domain data patterns, but the evaluative structure that determines how a practitioner weighs evidence, handles ambiguity, and knows what matters.
This is visible in outputs. A system that carries domain judgment produces conclusions that experienced practitioners recognise as coming from something that understands the domain — even when the specific case is novel. A system that lacks domain judgment produces outputs that are often right but sometimes wrong in the specific ways that practitioners find most alarming: confident, plausible-sounding, wrong about exactly the things that matter.
The difference is most visible in edge cases, in ambiguous presentations, and in the moments where the right answer is "I don't know — here is what additional information would resolve the uncertainty." These are the moments that define whether an AI product can be trusted in professional practice, and they are almost exclusively the product of AI and domain expert collaboration done rigorously and at depth.
For more on what this produces in terms of product design and outputs, see our complete guide to domain expert AI products. To explore how our expert network contributes to this process, visit our experts page.