When AI Reads Your Documents: What Building Docora Taught Me About RAG

For thirty-one years I worked at Taylor Steel, running both the IT and HR departments by the time I retired. In both roles, the same kind of conversation came up several times a day: a supervisor would forward an employee question, or a manager would ask how to do something on a system. Either way, a member of my team would spend time assembling an answer from policies, technical documentation, legislation, and historical records. While the types of questions were not the same, the pattern certainly was.

Every one of those answers had to be correct in light of the relevant policy or technical reality, consistent with how the same kind of question had been handled before, and grounded in the company’s own documents rather than something generic. Often a single answer required pulling material from several sources at once. An employee discipline question, for instance, could touch employment legislation, our policies and standards, the historical discipline record for a comparable situation, and the specific employee’s file.

This is the problem that Retrieval-Augmented Generation, usually written as RAG, was built to solve. The acronym is unfortunate, and I would not blame a leader who heard it described in vendor language and tuned out. The simpler way to think about it is this: a RAG system is an AI chatbot that answers questions using only the documents you give it, with no external sources. When the answer is not in those documents, a properly built RAG system does not invent one. It tells the user it does not have the answer.

This is the first of four deep dives following last week’s overview, and it covers the knowledge access category. The worked example throughout is Docora, the document-grounded assistant we built.

The Two Problems Underneath One Technical Solution

RAG is a single technical pattern, but it serves two very different deployments. Treating them as the same product is one of the more common ways a deployment underperforms.

The internal version is the one I just described. Knowledge lives in policies, procedures, legislation, clarifying memos, and historical records, and senior staff spend a meaningful share of their week acting as an internal help desk because they are the only ones who know where the answers live. An example of this is the case where an employee asked her supervisor whether she could take a three-week leave of absence to be with her sister, who lived in another country and had just had a baby. The question came up to HR, and building the proper answer meant pulling in how the Family Responsibility Leave provisions of the Employment Standards Act fit her specific situation, what the internal policies allowed for leaves of that length, and how comparable requests had been handled before. Assembling the answer consumed several people’s time, and the employee waited days for a response to a question she needed to plan around.

The external version shows up wherever buyers research products and services digitally. Most prospects do their first round of evaluation entirely online, before they pick up the phone or send an email. That creates a catch-22 for the seller: the more comprehensive your website becomes, the harder it is for a specific buyer to find the specific information they need. These days, many buyers will not reach out to ask; they’ll simply move on. Imagine you are looking for a long-term care facility for your aging parent, and you want a home that offers a range of care levels so you do not have to relocate them again later. You’ll almost certainly begin by doing research online, and the easier it is to find evidence that a particular home offers what you need, the more likely it is to make your short list. Most homes will be eliminated from consideration without ever knowing they were in the running.

The two patterns are solved by the same broad technology, but a RAG system needs to be tuned specifically for each use case. An internal system should sound like an expert, and when it does not have the answer, it should say so clearly, name who to follow up with, and tell the user not to take action without clarification. Its answers should be targeted and cite the specific legislation, policy, or document they came from. On the other hand, an external system should sound friendly and helpful, and be aware that the visitor is in the middle of a decision. Its answers should be broader, willing to anticipate and pull in related products or services when doing so will genuinely help the visitor.

Where RAG Genuinely Fits

RAG isn’t always the right answer! Three structural conditions indicate when a RAG deployment is worth building.

The first is a real corpus worth retrieving, where you already have documents that were written to be authoritative on topics where people need recurring access to the knowledge they contain. It is worth noting that the source documents do not always live where you might first think. A useful RAG deployment can pull from your IT help desk system, your HR information system, your CRM, your website content, or all of them at once.

The second is a retrieval need that recurs, that is, when you have the same kind of question, asked frequently, by many people. If your organization has one or two people who everyone knows are the right person to ask about a particular topic, and those people are spending time every week (or day) answering those questions, that is your signal.

The third is an audit trail requirement, and it is the condition where I have seen the value get most concrete. Before one of our current clients had Docora, an employee misapplied a policy in an unusual situation: they followed the policy as written, but did so without considering the legislation that bounded it. A complaint was filed, and the result was an enforcement action. When we later deployed Docora and ran the validation set, we used a version of that same situation as one of the test questions. Docora answered correctly, citing both the policy and the legislation that bounded it, in seconds. That is the difference between giving the right answer and being able to demonstrate, after the fact, that the right answer was given.

Where It Doesn’t Fit

When I meet with a customer who is considering Docora, I always review the places where RAG doesn’t fit. Doing so upfront saves everyone involved time and money. If the solution isn’t going to bring real value, we need to be honest about that. This is the same instinct behind the “AI hammer” warning: not every problem is a nail.

Small corpora are rarely worth the build. Twenty documents will not usually justify a retrieval layer, and a search box plus a person who knows the documents will likely be sufficient.

Content that changes faster than the index can keep up is the second no-fit. A RAG system reflects the state of the documents at the last index refresh, and if your source of truth is changing several times a day, the rebuild cadence becomes the bottleneck on accuracy. A different architecture is usually the right answer.

Knowledge that lives in someone’s head is the third, because RAG retrieves only what is written down. If the real expertise was never documented, the project starts with a writing exercise, not an AI exercise. (Spreadsheets and content that requires numerical or financial analysis call for a different RAG architecture than the document-based pattern this post covers, and that is a worthwhile topic for another day.)

Risks Worth Knowing About

Even when RAG fits well, three risks are worth designing against from day one. They sit alongside the broader set of chatbot risks leaders need to understand, but these three are specific to the retrieval pattern.

Hallucination dressed up as citation is the highest-risk case. A RAG system can produce a confident answer with a citation to a real document, where the document exists but the claim is not actually in it. This is solvable, but only with a deliberate validation layer that checks every citation against the retrieved content rather than trusting the model to be honest about its own sources. Most off-the-shelf RAG deployments do not include this layer by default, and it is the kind of thing worth asking vendors about explicitly.
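For readers who want to see what checking a citation against the retrieved content can look like, here is a minimal sketch. It is not Docora's implementation. It assumes the model returns each citation as a document ID plus the passage it claims to be quoting, and it uses a plain substring match after normalizing case and whitespace. The names Citation and unsupported_citations are hypothetical, and a production check would likely need fuzzier matching to handle paraphrase.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Citation:
    doc_id: str  # which retrieved document the answer points to
    quote: str   # the passage the answer claims comes from that document


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences do not matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def unsupported_citations(citations: List[Citation], retrieved: Dict[str, str]) -> List[Citation]:
    """Return citations whose quoted passage is not found in the cited document."""
    failures = []
    for c in citations:
        source = retrieved.get(c.doc_id)
        quote = _normalize(c.quote)
        # Fail on an unknown document, an empty quote, or a quote the document lacks.
        if source is None or not quote or quote not in _normalize(source):
            failures.append(c)
    return failures


# Hypothetical example data: one supported citation, one invented claim.
retrieved = {"leave-policy": "Unpaid leave of up to four weeks requires\nmanager approval."}
citations = [
    Citation("leave-policy", "unpaid leave of up to four weeks requires manager approval"),
    Citation("leave-policy", "leave is always paid"),
]
failed = unsupported_citations(citations, retrieved)
```

In this sketch, an answer with any entry in failed would be held back or regenerated rather than shown to the user.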

Permissions leakage at retrieval is the second. This occurs when a user asks a question and gets back content from a document they should not have been able to read. The fix for this is structural, not something that can be bolted on afterwards. Docora is built around what we call collections, which are groups of documents that users or roles must be specifically granted access to. Without that grant, a collection is never included in the body of knowledge the system draws on to compose an answer.
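The structural point is that filtering happens before retrieval, not after the answer is written. The sketch below illustrates that idea under stated assumptions: each document belongs to exactly one collection, grants are a simple mapping from collection name to the roles allowed to use it, and anything not explicitly granted is excluded. The names Document and searchable_documents are hypothetical and do not describe Docora's internals.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Document:
    doc_id: str
    collection: str  # assumption: one collection per document
    text: str


def searchable_documents(
    user_roles: Set[str],
    grants: Dict[str, Set[str]],
    corpus: List[Document],
) -> List[Document]:
    """Keep only documents in collections explicitly granted to one of the user's roles."""
    # Default deny: a collection with no matching grant never reaches the retriever.
    allowed = {name for name, roles in grants.items() if roles & user_roles}
    return [doc for doc in corpus if doc.collection in allowed]


# Hypothetical example data.
grants = {"hr-policies": {"employee", "hr"}, "discipline-records": {"hr"}}
corpus = [
    Document("p1", "hr-policies", "Leave of absence policy text"),
    Document("d1", "discipline-records", "Historical discipline record"),
    Document("x1", "board-minutes", "No grant exists for this collection"),
]
visible_to_employee = searchable_documents({"employee"}, grants, corpus)
visible_to_hr = searchable_documents({"hr"}, grants, corpus)
```

Because the retriever only ever searches the filtered list, a document outside the user's grants cannot appear in an answer or a citation.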

Stale indexes that sound current are the third, and the consequence is a user making a decision on a confidently delivered, out-of-date answer. The fix has two parts: matching the rebuild cadence to the actual change rate of the content, and maintaining a clearly owned process for keeping the documents in the RAG system aligned with the source of truth elsewhere in your business. That is exactly the kind of work that lives with the maintainer and steward roles I wrote about a few weeks ago.
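A simple staleness report is one way to make that ownership concrete. The sketch below assumes you can obtain a last-modified time for each source document and a last-indexed time for its copy in the RAG system. It flags anything that was never indexed, or whose source changed after indexing and has gone unrefreshed for longer than an agreed lag. The function name, the input shape, and the 24-hour lag are all illustrative assumptions.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def stale_documents(
    source_modified: Dict[str, datetime],
    indexed_at: Dict[str, datetime],
    max_lag: timedelta,
    now: datetime,
) -> List[str]:
    """List documents whose index copy is missing or lags a newer source by more than max_lag."""
    stale = []
    for doc_id, modified in source_modified.items():
        indexed = indexed_at.get(doc_id)
        if indexed is None:
            stale.append(doc_id)  # never indexed at all
        elif modified > indexed and now - modified > max_lag:
            stale.append(doc_id)  # source changed and the index has not caught up in time
    return sorted(stale)


# Hypothetical example data.
now = datetime(2024, 1, 10, 12, 0)
source_modified = {
    "leave-policy": datetime(2024, 1, 8, 9, 0),  # edited after the last index build
    "dress-code": datetime(2023, 12, 1, 9, 0),   # unchanged since indexing
    "new-memo": datetime(2024, 1, 10, 11, 0),    # not in the index yet
}
indexed_at = {
    "leave-policy": datetime(2024, 1, 1, 0, 0),
    "dress-code": datetime(2024, 1, 1, 0, 0),
}
report = stale_documents(source_modified, indexed_at, timedelta(hours=24), now)
```

If the report is regularly non-empty, the rebuild cadence is slower than the content's real change rate.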

What I Would Do Differently

Three lessons stand out to me from running Docora deployments over the past year.

The first is that not every document should carry the same weight when the system composes an answer. A reasonable default puts authoritative material first, things like legislation, formal policies, and signed contracts; historical records and prior decisions second; and softer material like internal blogs or training notes third. All of it remains retrievable, but the more authoritative sources should drive the answer. We settle the weighting model with the client before the system goes live, because the right weighting reflects how the organization actually makes decisions, and that is not a setting you want to discover by accident after launch.
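As a rough illustration of tiered weighting, the sketch below scales each retrieved passage's relevance score by a weight for its source tier and then re-sorts. Nothing is dropped, so softer material stays retrievable but ranks lower. The tier names, the weight values, and the assumption that the retriever returns a similarity score between 0 and 1 are all hypothetical; the real values are what gets agreed with the client.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Illustrative weights only; actual tiers and values would be agreed with the organization.
TIER_WEIGHTS: Dict[str, float] = {
    "authoritative": 1.0,  # legislation, formal policies, signed contracts
    "historical": 0.8,     # historical records and prior decisions
    "informal": 0.6,       # internal blogs, training notes
}


@dataclass
class Hit:
    doc_id: str
    tier: str          # one of the keys in TIER_WEIGHTS
    similarity: float  # retriever relevance score, assumed to be in [0, 1]


def rank_by_authority(hits: List[Hit], weights: Optional[Dict[str, float]] = None) -> List[Hit]:
    """Order hits by similarity scaled by source tier; nothing is removed."""
    w = TIER_WEIGHTS if weights is None else weights
    return sorted(hits, key=lambda h: h.similarity * w[h.tier], reverse=True)


# Hypothetical example: the blog post matches best on raw similarity alone.
hits = [
    Hit("blog-post", "informal", 0.90),
    Hit("esa-section", "authoritative", 0.80),
    Hit("2019-decision", "historical", 0.85),
]
ranked = [h.doc_id for h in rank_by_authority(hits)]
```

With these example numbers the legislation moves to the top even though the blog post had the highest raw similarity, which is the behaviour the weighting is meant to produce.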

The second is that the generation layer (where answers are composed) has to be tuned specifically for the audience, and tuned differently for the internal and external cases. Trying to serve both audiences from a single configuration is exactly how a deployment ends up performing acceptably for neither.
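One way to keep the two audiences apart is to give each its own named profile instead of sharing a single set of instructions. The sketch below is an assumption about how that might be organized, not a description of Docora's configuration. The field names, the profile values, and the wording of the generated instructions are hypothetical, chosen to mirror the internal and external behaviours described earlier in this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudienceProfile:
    tone: str                 # how the assistant should sound
    suggest_related: bool     # broaden answers with related products or services
    warn_before_acting: bool  # tell the user not to act on an unanswered question


# Hypothetical profiles, one per audience.
INTERNAL = AudienceProfile(tone="expert", suggest_related=False, warn_before_acting=True)
EXTERNAL = AudienceProfile(tone="friendly", suggest_related=True, warn_before_acting=False)


def build_instructions(profile: AudienceProfile, contact: str) -> str:
    """Compose the instructions given to the model for one audience."""
    lines = [f"Answer in a {profile.tone} tone using only the provided documents."]
    if profile.suggest_related:
        lines.append("Mention related products or services when they genuinely help.")
    else:
        lines.append("Keep the answer targeted and cite the specific source document.")
    fallback = (
        "If the documents do not contain the answer, say so "
        f"and refer the user to {contact}."
    )
    if profile.warn_before_acting:
        fallback += " Tell the user not to act without clarification."
    lines.append(fallback)
    return "\n".join(lines)


internal_prompt = build_instructions(INTERNAL, "the HR team")
external_prompt = build_instructions(EXTERNAL, "our admissions office")
```

Keeping the profiles as separate named objects makes it harder to change one audience's behaviour by accident while tuning the other.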

The third is that corpus curation is the project. The largest gains in Docora performance have come from cleaning up source documents, not from swapping models or vector databases. With one customer, the implementation surfaced policies that gave contradictory guidance on the same topic. Docora cited both, accurately, because both were in the corpus. It cannot fix inconsistent information for you. A proactive sweep of the documents looking for contradictions and gaps is worth budgeting for, ideally before launch. This does take time and add cost, but it is absolutely worth it.

A connected discipline that often gets overlooked is what to do with the questions themselves. Docora saves every query that gets asked, and the analysis of those queries has become one of the most valuable outputs of a deployment. People will ask a chatbot questions they would not ask another human, especially uncomfortable questions, and the pattern of what gets asked tells you what is actually alive in your employee base, or where your customers are getting stuck.

Closing

When you put an AI system in front of your knowledge base, you are handing it part of your corporate identity. The way it answers, the documents it draws on, and the tone it uses will all be read by the people on the other end as a representation of your organization. That deserves more thought than a drop-in deployment, and any vendor who tells you otherwise is not giving you good advice.

Next week’s deep dive covers the demand generation category, with the Architected Intelligence prospecting platform as the worked example.