There is a kind of work that consumes a lot of people’s time in most organizations. It takes hours, it requires judgement, the inputs are “fuzzy”, and the rules are often unwritten or highly variable. There is enough underlying structure that the work can be described as a sequence of steps (even if no one has written that sequence down) but the steps themselves require a person to make calls along the way. This work, which lies between pure rule-based work that does not need AI and pure judgement work that cannot be automated, is where I believe most of the interesting AI opportunities live today.

Some AI-enabled examples of this pattern of work include:

  • IT: A help desk ticket arrives, gets categorized against a support catalogue, gets a first response drafted from the knowledge base, and gets routed to a human only if one is still needed.
  • Sales: An inbound contact form lands, the company gets enriched against public data, the lead gets scored, and the result gets assigned or automatically acknowledged.
  • Accounting: An invoice arrives by email, fields get extracted, the document gets matched against a purchase order, and exceptions get flagged for review.
  • Human Resources: A resume gets parsed, the best candidates get matched against the role, and first-touch communications are drafted.

Each of these examples has the same structural shape: several small (but judgement-heavy) steps, each of which previously required a person but is now eligible for AI-assisted automation. When adding that AI assistance, one of the key decisions is (or should be!) the choice of AI model at each step.

I have been building a prospecting pipeline on top of this pattern for a year, and that’s the example I’ll be using in the rest of this post.

The Interesting Decision Is the Model, Not the Platform

The orchestration layer is not the interesting decision. Automation platforms have largely solved the plumbing problem. n8n is the example most readers will recognize, and Zapier, Power Automate, Make, and Relay are in the same general category.

I chose to build the prospecting pipeline on the Architected Intelligence Platform (my own software) rather than one of those off-the-shelf options. The honest reason is that audit logging and permissions are first-class in AIP, which matters when the pipeline is calling several external services with credentials, touching prospect data, and producing outbound communications that have legislative (CASL) implications. The same controlled-access principle that runs through Docora (last week’s post) runs through AIP. Any of the off-the-shelf platforms would work for many readers; the choice of orchestration platform is not what this post is about.

The interesting decision is which model is used at each step. Get that wrong, and the pipeline either costs ten times what it needs to, produces output that is not usable, or both. Picking the same model (especially defaulting to the best available model) for every step is the most common mistake I see, and it is almost always a bad choice.

Five Steps, Five Different Choices

My prospecting pipeline has five steps. Each one has a different cognitive demand and therefore has different requirements for the speed and capability of the model being used.

The first step is the raw search. A search API returns candidate accounts against the query. AI is not required for this step; there is no need to pay an AI to do what a search engine already does well.

The second step is a snippet filter. The search returns hundreds of results, each with a brief snippet describing the website. The task is a fast binary classification: does this snippet plausibly match the customer profile I’m after? Cost matters because the volume is high, but capability matters less, because the downstream step will catch most of the noise. A small (free) model running locally on commodity hardware is the right tool for this kind of throwaway pass.

The third step is website qualification. Surviving candidates get their websites read in depth. The model has to traverse multiple pages, follow links to discover information about the prospect, and reason about whether what it finds maps to the exact customer profile I’m seeking. This is genuinely beyond a small local model, but a mid-low tier public LLM gets the job done at a very reasonable cost per site.

The fourth step is contact enrichment. Multiple email addresses, several social profiles, and one or more phone numbers typically come back from enrichment services for a given target. Picking the right primary contact involves pattern matching with edge cases. The system has to make determinations like “this LinkedIn URL is the right John Smith; that one is not”, or “this address is the work account; that one is personal.” Mid-low tier public models handle this kind of work well, and spending top-tier money here does not buy better answers.

The fifth and last step is email personalization. The pipeline writes a short personalized opening for each qualified prospect, drawing on what the earlier steps produced. This is where capability matters most, because the output is read by a human, and a weaker model gives itself away through awkward phrasing, generic praise, or subtle factual inaccuracies. This is the one step where I pay up for a mid-high tier model to get results I’m satisfied with.
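One way to write the five choices above down as configuration is sketched here in Python. The step names, tier labels, and the tier_for helper are illustrative assumptions, not the actual configuration format of AIP or any other platform.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    # Tier labels follow the wording of the post; they are not product names.
    NONE = "no model (search API only)"
    LOCAL_SMALL = "small local model"
    MID_LOW = "mid-low tier public model"
    MID_HIGH = "mid-high tier public model"


@dataclass(frozen=True)
class Step:
    name: str    # hypothetical identifier for the step
    tier: Tier   # model tier assigned to the step
    reason: str  # why that tier is enough


PIPELINE = [
    Step("raw_search", Tier.NONE, "a search engine already does this well"),
    Step("snippet_filter", Tier.LOCAL_SMALL, "high volume; the next step catches noise"),
    Step("website_qualification", Tier.MID_LOW, "multi-page reading and reasoning"),
    Step("contact_enrichment", Tier.MID_LOW, "pattern matching with edge cases"),
    Step("email_personalization", Tier.MID_HIGH, "the output is read by a human"),
]


def tier_for(step_name: str) -> Tier:
    """Look up the model tier assigned to a named step."""
    for step in PIPELINE:
        if step.name == step_name:
            return step.tier
    raise KeyError(f"unknown step: {step_name}")
```

Keeping the reason next to the tier makes the choice reviewable: when a step's error rates change, the justification is sitting right beside the setting that needs revisiting.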

The Trade-offs Worth Understanding

Smaller models are cheaper and faster, but they produce more false positives and more false negatives. The trade-off is real.

What matters is that the tolerance for each error type differs sharply by step. A false negative at step two (a good prospect filtered out) is usually acceptable because the candidate volume is large. A false positive at step two (a bad prospect survives) is also acceptable, because step three catches most of them. The same errors at step five (email personalization) are an entirely different matter. A false positive at step five goes out the door with my name attached, and the cost is measured in deliverability damage, reply quality, and goodwill, not in compute costs.

The asymmetry that matters most is this: paying more at the right step buys you results; paying more at the wrong step just spends money. Most of the over-spend I see in production AI pipelines comes from defaulting to a top-tier model at every step because it is simpler to configure.
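A back-of-the-envelope comparison shows how the over-spend adds up. Every number in this Python sketch is a made-up placeholder: the call volumes, per-call prices, and tier names are assumptions for illustration, not measurements from my pipeline or any vendor's price list.

```python
# Hypothetical per-call prices in dollars. Placeholders, not real pricing.
PRICE_PER_CALL = {"local": 0.0, "mid_low": 0.002, "mid_high": 0.02, "top": 0.05}

# Hypothetical number of model calls per run for each AI-assisted step.
CALLS_PER_STEP = {
    "snippet_filter": 500,
    "website_qualification": 100,
    "contact_enrichment": 40,
    "email_personalization": 40,
}

# Per-step assignment along the lines described in this post.
TIERED = {
    "snippet_filter": "local",
    "website_qualification": "mid_low",
    "contact_enrichment": "mid_low",
    "email_personalization": "mid_high",
}

# The "simpler to configure" default: one top-tier model everywhere.
TOP_EVERYWHERE = {step: "top" for step in CALLS_PER_STEP}


def run_cost(assignment: dict) -> float:
    """Total cost of one run: calls at each step times that step's price."""
    return sum(
        CALLS_PER_STEP[step] * PRICE_PER_CALL[assignment[step]]
        for step in CALLS_PER_STEP
    )


if __name__ == "__main__":
    print(f"tiered:         ${run_cost(TIERED):.2f}")
    print(f"top everywhere: ${run_cost(TOP_EVERYWHERE):.2f}")
```

The structure matters more than the placeholder figures: the high-volume step dominates the bill when it runs on an expensive model, and it is also the step where capability matters least.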

If you can’t live with the false-positive and false-negative rates at a given step, you have two honest choices. You can either upgrade the model at that step, or you can add a downstream verification step that catches the errors before they compound. The second option is often the cheaper one and typically works just as well.
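The effect of adding a verification step can be estimated with simple arithmetic. The Python sketch below assumes each stage makes its errors independently of the others, which is a simplification, and the rates in the example are hypothetical.

```python
def chain(stages):
    """Combine stages given as (recall, false_positive_rate) pairs.

    recall: fraction of good candidates a stage lets through.
    false_positive_rate: fraction of bad candidates a stage lets through.
    Assumes the stages err independently, so the rates multiply.
    """
    recall, fp_rate = 1.0, 1.0
    for stage_recall, stage_fp_rate in stages:
        recall *= stage_recall
        fp_rate *= stage_fp_rate
    return recall, fp_rate


if __name__ == "__main__":
    cheap_filter = (0.90, 0.30)  # hypothetical: keeps 90% of good, 30% of bad
    verifier = (0.98, 0.10)      # hypothetical: keeps 98% of good, 10% of bad
    recall, fp_rate = chain([cheap_filter, verifier])
    print(f"recall {recall:.3f}, false-positive rate {fp_rate:.3f}")
```

Under these assumptions a verifier cuts false positives multiplicatively but can only lower recall, never raise it. A downstream check cannot bring back a good prospect that an earlier step already discarded, so if false negatives are the problem, upgrading the model at that step is the option that helps.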

Web Search Is Its Own Trade-off

Even before the model selection question, the search layer constrains what the rest of the pipeline can see. Ranking quality matters, and prospects who fit your customer profile but have weak SEO will not surface in the top results. Incomplete information shows up at the snippet level, and it isn’t unusual for the qualifying signal to be missing from the snippet itself (which is one reason the snippet filter has to be fast and cheap). Anti-bot blocking is increasingly common: a growing number of websites refuse to render for headless browsers, which means the website qualification step returns nothing usable for those accounts. None of these are AI problems, but they decide how much of your candidate pool ever makes it into the pipeline. When using web search APIs, you must test whether the result set is good enough for your purposes. If it is not, there is no point in proceeding.
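One simple way to run that test is to hand-pick a few companies you already know fit the profile and check how many of them the search actually returns. The Python sketch below assumes result URLs include a scheme (such as https://); the function names and the example domains are placeholders.

```python
from urllib.parse import urlparse


def domain(url: str) -> str:
    """Return the lowercase host of a URL, without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def coverage(known_good_domains, result_urls) -> float:
    """Fraction of known-good domains that appear anywhere in the results."""
    if not known_good_domains:
        return 0.0
    found = {domain(url) for url in result_urls}
    hits = [d for d in known_good_domains if d in found]
    return len(hits) / len(known_good_domains)


if __name__ == "__main__":
    # Placeholder data: a hand-picked list and one query's result URLs.
    known_good = ["example.com", "example.org", "example.net"]
    results = [
        "https://www.example.com/about",
        "https://example.net/",
        "https://unrelated.example/",
    ]
    print(f"coverage: {coverage(known_good, results):.0%}")
```

If known-good prospects rarely show up, the fix is in the queries or the search provider, not in the models further down the pipeline.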

Closing

The transferable point is the same across help desk triage, invoice intake, resume screening, and prospecting. Anywhere you have judgement-heavy work with enough structure to pipeline it, the model-selection logic applies. The pipeline shape changes from one use case to another but the principles do not.

Pick the cheapest model that can do each job well enough at that step and only pay more where false positives are expensive. Do not pay more where the next step will catch the noise anyway.