In my last post, I walked through the practical considerations for running an AI pilot: how to scope it, what to measure, and how to define success and failure before the work begins. I ended with a promise: this week, I’d walk through a concrete scenario using an actual business problem, from problem definition through pilot design.

So here it is. This is a real project, not a sanitized case study. The business is mine, the problem is mine, and the decisions (both good and bad) are mine. I’m sharing it because the best way to show how a framework works is to use it.

The Context

I built ResortSteward, an operations-first web application for cottage resorts. It unifies bookings, payments, automated guest communications, task management, scheduling, staff coordination, and analytics for resorts with cottages and rentable resources.

Like any small software company, I need to find customers. And finding customers means prospecting: identifying resorts that might be a fit, researching them, and reaching out. This post is about that process, not about the product. I’m using my own experience because it’s real, and because the problem is one that almost any business can relate to: how do I find and connect with potential customers more efficiently?

Phase 1 — Strategic Framing: What’s the Actual Problem?

The framework I outlined in Post 14 starts here: define the problem in business terms before thinking about solutions. It sounds obvious, but in practice most people skip this step and jump straight to “how can AI help?”

The Manual Process

My prospecting process looked like this. I would run web searches for vacation regions across Ontario, focusing on typical vacation areas like the near north. Experience had shown me I needed to search by specific geographic area because a broad search like “rental cottages in Ontario” returns thousands of results, most of them useless for my purposes.

From those search results, I would try to identify resorts that fit my target market. My ideal customer is a single-property resort with five or more cottages that is either not using a booking system at all or relying on something homegrown. No directory or database exists for this niche; I must review each listing to find them one at a time.

The filtering was the hard part. Every search returned a mix of AirBnB and VRBO listings for individual properties, Expedia and Booking.com pages for large hotels, broker sites listing multiple properties, review blogs, and sponsored results that had nothing to do with what I was looking for. For each result that survived that initial scan, I had to visit the actual website and make a judgment call: does this resort match what I’m looking for? Do they already use a professional booking platform? Are they large enough?

Once I found a prospect that seemed like a fit, I would cross-reference against my “customer relationship management system”, which, honestly, is just a spreadsheet. If they hadn’t been contacted before, I would record their website, some details about the property, and their contact information. Then I would craft a personalized cold email, positioned to set up a follow-up phone call a week later.

It worked. But as you might imagine, it was mind-numbing and it was not fast.

The First Problem Statement — and Why It Wasn’t Good Enough

My initial reaction was straightforward: “This takes too long.” That’s a legitimate frustration, but it’s not a problem statement you can act on. It’s too vague. There are at least two different ways to improve the situation, and they lead to different solutions.

The first lever is efficiency: reduce the time I spend on each candidate. The second is effectiveness: improve the response rate so that fewer candidates need to be contacted to generate the same number of interested leads. And of course, the third option is some combination of both.

Putting Numbers on It

I tracked my time across the manual process and found that I was spending roughly 15 minutes per candidate by the time I sent them an email. That includes searching, filtering out the noise, visiting websites, validating the fit, checking my spreadsheet, recording details, and writing a personalized message.

My response rate after the initial email was around 5%. For cold outreach to businesses that have never heard of you, that’s not unusual. But the math is worth doing: at 15 minutes per candidate and a 5% response rate, I need to process about 20 candidates to generate one interested lead. That’s roughly five hours of work for a single expression of interest.

Five hours. Ouch.
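The back-of-envelope math is simple enough to sketch; the numbers are the ones from my time tracking above:

```python
# Lead economics for the manual prospecting process.
minutes_per_candidate = 15
response_rate = 0.05  # 5% of cold emails generate an interested reply

candidates_per_lead = 1 / response_rate                    # 20 candidates per lead
hours_per_lead = candidates_per_lead * minutes_per_candidate / 60

print(f"{candidates_per_lead:.0f} candidates, {hours_per_lead:.1f} hours per lead")
# → 20 candidates, 5.0 hours per lead
```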

The Refined Problem Statement

With those numbers in hand, I could define the problem properly. Cold email is a method known for low response rates, and I wasn’t trying to reinvent it. What I needed was to dramatically reduce the time investment per lead while maintaining enough volume to compensate if response rates stayed flat or even dropped.

The problem statement became: reduce the time to generate a qualified lead via cold email from five hours to thirty minutes. Response rate may decrease as long as increased volume compensates. Per-lead cost must stay below $100, with a pilot target of $10 to $25 per lead.

That’s what Phase 1 is supposed to produce: a specific, measurable target with explicit trade-offs and a cost constraint. Not “let’s use AI to fix prospecting.”

Decision Gate 1: Is this problem strategically important right now? Yes. As a solo founder, every hour I spend on manual prospecting is an hour not spent on product development or serving existing customers. But more fundamentally, lead generation is the constraint on revenue growth. Without a more efficient way to find prospects, the business can’t grow beyond what my manual effort supports. This problem passes the gate.

Phase 2 — Feasibility and Readiness: Can This Actually Be Done?

With the problem defined, the next step is to assess whether a solution is realistic given the data, the tools, and the operational context. This is where many AI initiatives should be stopped, and where stopping (or rethinking the problem definition) is often the right decision.

Mapping the Workflow

The manual process breaks down into four distinct steps: search for resorts by region, filter out the obvious non-fits, qualify the remaining candidates by visiting their websites, and personalize and send outreach emails. The bottleneck sits in the middle. The searching itself is fast; so is sending an email once you know what to write. It’s the filtering and qualifying that consume most of that 15 minutes per candidate – wading through irrelevant results and making judgment calls about each website.

Assessing Readiness

A few realities were worth acknowledging upfront.

The data I was working with (web search results) is unstructured, noisy, and inconsistent. There is no clean database of cottage resorts in Ontario. The information I needed (number of cottages, whether they use a booking system) isn’t in a spreadsheet somewhere; it’s scattered across individual websites with varying levels of detail and design quality.

My CRM is a spreadsheet. That’s a known gap, and in an ideal world I would address it. But this initiative was about prospecting efficiency, not about overhauling my entire sales operation. I noted the gap and moved on. Not every problem needs to be solved at once.

The Critical Feasibility Question

The hardest part of this process – the part that determines whether automation is even possible – is the qualification step. Can AI reliably visit a resort’s website and determine whether it’s a single-property operation with five or more cottages that isn’t already using a professional booking platform?

This is not keyword matching. It requires understanding the layout of a website, interpreting what kind of business it represents, and making a judgment call based on incomplete information. Some sites make it obvious; others don’t.

Testing the Hard Part First

Before building anything (before evaluating platforms, before designing a pipeline) I tested this one capability in isolation. I took a sample of resort websites, including some that clearly matched my criteria and some that clearly didn’t, and ran them through a general-purpose AI model with a straightforward qualification prompt.

Could it reliably distinguish a 12-cottage family resort using a contact form from a 200-room hotel listed on Booking.com? Could it tell the difference between an individual vacation rental on AirBnB and a multi-cottage resort operation?

The answer was yes, with caveats. The initial results were promising enough to justify building a pipeline around it, but the prompts needed careful refinement and repeated testing before the accuracy was acceptable.

The point here is the principle: if the hardest part of your proposed solution doesn’t work, nothing you build around it matters. Test that part first. It’s cheap, it’s fast, and it either gives you confidence to proceed or saves you from building something on a foundation that doesn’t hold.

Decision Gate 2: Are we operationally ready to attempt this? The feasibility test confirmed the core capability worked. The data situation was messy but that was inherent to the problem, not a gap I could fix first. The CRM was a known weakness but not a blocker for this initiative. No major readiness gaps stood in the way. Proceed.

Phase 3 — Solution Architecture: What Are the Options?

With feasibility confirmed, the next question is how to build it. The framework from Post 14 calls this the Solution Architecture phase, and the guiding principle from Post 11 still applies: what is the lowest-complexity solution that achieves the outcome?

Evaluating Tools

I evaluated several platforms against six criteria: security (where does my data go and who can see it), cost (both upfront and per-use), maintainability (can I update and modify the workflows without rebuilding from scratch), auditability (can I trace what the system did and why), error logging and recovery (what happens when something fails), and performance (can it handle the volume I need in a reasonable time).

The tools I considered included n8n, an open-source automation platform with strong workflow-building capabilities but limited support for the kind of AI-driven web browsing that my qualification step required. From a cost perspective, n8n would carry the same AI model and web search costs as any other platform; those are external services you pay for regardless of which tool orchestrates them.

I looked at Clay, a platform purpose-built for sales prospecting and lead enrichment, but its enrichment data doesn’t cover the niche signals I needed (you can’t look up “does this resort have five or more cottages and no booking system” in a database). That information only exists on the resort’s own website. Clay’s per-seat fees also added up quickly for what amounted to a one-person operation.

I also considered Claude Cowork, which is excellent for interactive research and analysis but isn’t designed for this type of unattended batch processing: while you can schedule jobs, there’s no pipeline orchestration, poor error handling, and creating audit logs is messy.

I ended up using my own platform, the Architected Intelligence Platform. It met my requirements across all six criteria, particularly around auditability and error recovery. But I want to be candid: for most businesses facing a similar problem, an established tool like n8n or Clay might well be the right starting point. My choice was driven by specific requirements and by the fact that I had the capability to build it. The important thing isn’t which tool you choose; it’s that you evaluate your options against criteria that matter for your situation.

Designing the Solution

The automation I designed has four jobs, and this is where an important principle shows up: not everything needs AI.

Job 1: Raw Search. Run automated web searches across target vacation regions in Ontario. This is pure automation with no AI involved. A script can run the same searches I was doing manually, just much faster, with repeatability, and without fatigue.

Job 2: High-Level Filter. Take the search results from Job 1 and remove the obvious non-fits: broker sites like VRBO, Expedia, and AirBnB; sites listing multiple properties; review and blog sites; hotels; and other results that clearly don’t match the target market. Again, no AI required. This is pattern matching and rules-based filtering. If the URL contains “airbnb.com” or “booking.com,” discard it. If the page title mentions “top 10 cottages,” it’s a listicle, not a resort. If the domain has listings for multiple properties at different locations, it’s a broker.
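To show how little machinery Job 2 actually needs, here’s a sketch of a rules-based filter. The domains and patterns are illustrative examples, not my production rule set:

```python
import re
from urllib.parse import urlparse

# Illustrative blocklists -- a real rule set would be longer and tuned over time.
BROKER_DOMAINS = {"airbnb.com", "vrbo.com", "expedia.com", "booking.com"}
LISTICLE_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in (r"\btop\s+\d+\b", r"\bbest\b.*\bcottages\b")
]

def passes_filter(url: str, page_title: str) -> bool:
    """Return True if a search result survives the high-level filter."""
    host = urlparse(url).netloc.lower().removeprefix("www.")
    if host in BROKER_DOMAINS:
        return False  # broker/OTA listing, not a resort's own site
    if any(p.search(page_title) for p in LISTICLE_PATTERNS):
        return False  # review blog or "top 10" listicle
    return True

print(passes_filter("https://www.airbnb.com/rooms/123", "Lakeside Cabin"))        # False
print(passes_filter("https://pinegrove-resort.ca", "Pine Grove Cottage Resort"))  # True
```

Cheap, deterministic checks like these are exactly why no AI is needed at this stage.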

Job 3: Detailed Review. This is where AI enters the process. For each site that survives the first two filters, the system visits the actual website and passes its content to an AI model. The model evaluates whether the site represents a resort that matches my target criteria: single property, five or more cottages, and no professional booking system visible.

This is also where costs start to accrue. Every site the AI reviews costs something – the model charges per token processed, and a typical resort website contains a meaningful amount of text and navigation to analyze. That cost is manageable for individual sites, but it adds up at volume, which is why the first two jobs exist: they reduce the number of sites that ever reach this step.
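To make the shape of Job 3 concrete, here’s a sketch of the qualification step. The prompt wording, the JSON schema, and the `call_model` function are all placeholders (whatever model API you use would slot in there), not my actual implementation:

```python
import json

# Hypothetical prompt -- the real one went through several rounds of tuning.
QUALIFICATION_PROMPT = """\
You are qualifying sales prospects. Given the text of a resort website,
answer in JSON with keys: single_property (bool), cottage_count (int or null),
has_booking_platform (bool), reasoning (str).

Website content:
{content}
"""

def qualify(site_text: str, call_model) -> dict:
    """Ask a model whether a site matches the target criteria.

    `call_model` is a placeholder for whatever function sends a prompt
    to your chosen model and returns its text response.
    """
    raw = call_model(QUALIFICATION_PROMPT.format(content=site_text))
    result = json.loads(raw)
    # Only accept a prospect when the model committed to all three criteria.
    result["qualified"] = bool(
        result.get("single_property")
        and (result.get("cottage_count") or 0) >= 5
        and not result.get("has_booking_platform")
    )
    return result
```

Deriving the final yes/no in code rather than trusting a single model-reported verdict is one small way to keep the decision auditable.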

Job 4: Draft Emails. For prospects that pass the AI qualification, again use AI to generate a personalized outreach email based on what the system learned from their website. The personalization isn’t generic; it draws on specific details like the name of the resort, the region it’s in, what their current booking process appears to be, and what operational challenges they might be facing.

The lesson here maps directly to Post 11’s decision framework: use the lowest-complexity solution for each part of the problem. Jobs 1 and 2 don’t need AI, so they don’t use it. AI is applied only where genuine judgment is required – to evaluate the website and personalize the outreach email. This keeps costs down and reduces the number of points where the system can fail in ways that are hard to debug.

Choosing a Model

For Jobs 3 and 4, I needed to select an AI model. This required experimentation, as I discussed in Post 15: you need to budget time to test different options against your specific use case.

What I found was that I didn’t need the most capable (and most expensive) model available. The qualification task (reading a website and determining whether it fits a set of criteria) is well-defined enough that a mid-tier model handled it effectively. Using a more powerful model would have increased costs without a meaningful improvement in accuracy for this particular task.

The details of model selection and cost-per-analysis will come in next week’s post alongside the pilot results.

Governance Considerations

Even for a solo operation, governance matters. The primary concern here was data handling: what information was I sending to an AI model, and was any of it sensitive?

In this case, the answer was straightforward. The system only processes publicly available website content, information that the resort has already published for anyone to see. There’s no private data, no customer information, and no credentials being shared. The AI model receives the same content that any visitor to the website would see. That said, I documented this explicitly rather than assuming it was obvious. If the use case had involved processing private communications or customer data, the governance requirements would have been meaningfully different, and that might have changed the tool selection or the model hosting decision entirely.

Decision Gate 3: Does the expected benefit exceed the cost and risk? The investment required was modest: just a day or two of work to build, test, and configure the four-job pipeline, plus the per-use AI and search costs during the pilot. Even if the pilot failed entirely, the downside was a week of my time and a small amount spent on API calls. On the other side, if it worked even moderately well, reducing my prospecting time from five hours to under an hour per lead, the return would be immediate and ongoing. The math was clear enough to proceed.

Phase 4 — Scoping the Pilot

With the solution designed, the final step before execution is scoping a pilot that will generate real evidence. As I covered in Post 15, a pilot needs to be large enough to encounter real-world variability but small enough to run quickly and cheaply.

Scope and Timeline

I chose 10 geographic regions across Ontario as my pilot scope. This was broad enough to test the system against different types of search results, different regional resort characteristics, and different levels of web presence quality. It was narrow enough to complete in a reasonable timeframe and review the results manually.

The timeline was three weeks: one week to get the system built, tested and running, one week to run the pilot and evaluate results, and a possible third week if the results indicated that tweaks were needed and a second run was warranted.

Ownership and Monitoring

In a larger organization, you would assign an executive sponsor and a small team. In my case, the owner was me, which simplified decision-making but also meant there was no one else to catch my blind spots.

The monitoring plan was straightforward: manual review of every stage of the pipeline. I would review the raw search results coming out of Job 1, the filtered results from Job 2, the AI qualification decisions from Job 3, and the draft emails from Job 4. At this scale, I could afford to inspect every output rather than sampling. The goal was not just to measure accuracy but to understand where and why the system was getting things wrong, so I could tune it.

Success Criteria

Following the approach I outlined in Post 15, I defined success criteria before the pilot began:

  • Time per qualified lead: target of 30 minutes or less, down from the current five hours.
  • Cost per qualified lead: target of $10 to $25.
  • AI qualification accuracy: assessed by manually reviewing every prospect the system identified as matching my criteria. How many actually fit?
  • Email quality: are the personalized emails specific enough and professional enough to maintain or improve my connection rate?

Kill Criteria

If the cost per qualified lead exceeds $100 (the upper bound of what I felt was reasonable), the initiative stops.

Rethink Criteria

Not every problem triggers a full stop. I defined graduated thresholds for the most critical metric, AI qualification accuracy:

If manual review shows more than 10% of the AI-identified prospects don’t actually match my criteria, the prompts and filters need tuning. If the mismatch can be corrected, proceed. If not, the filtering is still valuable, but the email personalization step needs rethinking; you can’t personalize outreach to a prospect that isn’t actually a prospect.

If mismatches exceed 25%, there’s a more fundamental issue with the AI’s ability to qualify prospects from website content. Attempt to tune, but recognize that the system may only be useful as a filtering tool rather than an end-to-end pipeline.

Above 50%, the AI qualification approach itself is not viable for this use case. Stop and redirect the effort.
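Expressed as code, the graduated thresholds amount to a small decision function (the labels are mine; the percentages are the ones above):

```python
def accuracy_decision(mismatch_rate: float) -> str:
    """Map the AI qualification mismatch rate to a pilot decision.

    mismatch_rate: fraction of AI-identified prospects that manual
    review showed did NOT actually match the target criteria.
    """
    if mismatch_rate > 0.50:
        return "stop"     # qualification approach not viable for this use case
    if mismatch_rate > 0.25:
        return "rethink"  # may only be useful as a filtering tool
    if mismatch_rate > 0.10:
        return "tune"     # prompts and filters need adjustment
    return "proceed"

print(accuracy_decision(0.07))  # → proceed
print(accuracy_decision(0.30))  # → rethink
```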

These thresholds are specific to my situation, but the principle is universal: define in advance what “needs adjustment,” what “needs rethinking,” and what “needs stopping.” Without that discipline, struggling pilots have a way of continuing indefinitely.

Decision Gate 4: Is this delivering measurable business impact? This gate doesn’t get answered at the design stage – it gets answered by the pilot results. That’s next week.

What Comes Next

The pilot is designed. The four jobs are built. The success criteria and kill criteria are defined.

Next week, I’ll share what actually happened when I ran it: the results, the filtering challenges, the prompt tuning that was required, the cost analysis, and the honest verdict measured against the criteria I just laid out.

Wrap Up

Whether you’re automating prospecting, processing invoices, or triaging support tickets, the path from “I have a problem” to “I have a testable pilot” follows the same structure. Define the problem in measurable terms. Test the hardest part first. Choose the simplest tool that meets your requirements. And scope a pilot with criteria that will tell you (honestly!) whether to proceed, adjust, or stop.

This post is part of a series on the current state of AI, focused on how it can be applied in practical ways to deliver measurable improvements in productivity, cost savings, and response times. If you’d like to explore more, all previous posts are available under Insights; please read them and reach out with any questions or comments you have.