Four months ago, I wrote about my personal AI stack, covering the tools I used every day along with what I’d evaluated and what I was watching. Some of those choices have held up, and some haven’t.

The more interesting story is that the stack itself has changed shape. In January, I described a set of chat-style tools I used as productivity aids. Today, most of what I use sits closer to infrastructure (model APIs, automation platforms, agentic workflows) with chat interfaces sitting on top rather than at the centre.

There’s an honest theme that runs throughout this update. I’ve stopped trying to use the best tool in every category and instead started valuing consistency and integration across a smaller family of apps. That sometimes means I’m a little behind the state of the art on a specific axis, but I believe the tradeoff is worth it. A predictable, integrated stack does more for my actual productivity than any individual best-in-class tool ever did.

I don’t think this change in my behaviour is unique; I believe it mirrors a broader pattern worth naming for business leaders. The interesting question stops being “which AI tool should we adopt” and starts being “what AI capabilities do we need to build into our business, and what is the right technology stack to support that?”

LLMs

In January, the LLM section was all about chatbots. Today it’s about models, with chat being just one of several ways I access them.

I run some models locally (using Ollama on a 32 GB MacBook) for tasks where the data shouldn’t leave the laptop. The constraint rules out the largest models, but there’s plenty of useful ground: Llama 3.1 8B, Qwen 2.5 14B, Mistral Nemo 12B, and Gemma 2 9B all fit comfortably on reasonably capable consumer hardware.
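As a sketch of how local access works: Ollama serves an HTTP API on localhost (port 11434 by default), so once a model is pulled (e.g. `ollama pull llama3.1:8b`), any script can call it with nothing but the standard library. The model name below is just one of the models mentioned above; adjust to whatever you have pulled.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_payload(model: str, prompt: str) -> dict:
    # Minimal request body for Ollama's /api/generate endpoint.
    # stream=False asks for one complete JSON response instead of chunks.
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    # Requires a running local Ollama server with the model already pulled.
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(model, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Because everything stays on localhost, nothing in the prompt or the response ever leaves the laptop, which is the whole point of this tier of the stack.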

For paid access, I’m currently using two API providers, Groq and Haimaker. I use two because their pricing and model catalogs differ, and my apps are designed to be plug-and-play across providers, which lets me chase pricing aggressively rather than locking into any one vendor. Models I use regularly include Qwen 3, Llama 3.3, GPT-OSS, and Whisper (for audio transcription). The biggest cost lever in any production AI app turned out to be model selection per task, not provider choice or prompt design. Picking the “best in class” model for every task is usually an expensive mistake.
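The plug-and-play design can be as simple as a routing table that pins each task to a provider and model, with a cheap default for anything unrouted. The provider names and model identifiers below are illustrative assumptions, not my actual catalog:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    provider: str  # which API provider to call
    model: str     # provider-specific model identifier

# Per-task routing: the model is chosen for the task, not the provider.
# These entries are examples only, not a real price list.
ROUTES = {
    "summarize":  Route("groq", "llama-3.3-70b-versatile"),
    "classify":   Route("haimaker", "qwen-3-32b"),
    "transcribe": Route("groq", "whisper-large-v3"),
}

def route_for(task: str) -> Route:
    # Falling back to a cheap default keeps new tasks from silently
    # landing on an expensive flagship model.
    return ROUTES.get(task, Route("groq", "llama-3.1-8b-instant"))
```

The useful property is that repricing is a one-line change to the table, so chasing a better deal never touches application code.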

Coding

The three serious AI coding tools (Codex, Cursor, Claude Code) keep trading the lead in features and capabilities. None of them is permanently ahead, and any of the three is good enough to standardize on. This is the first place the consistency theme started paying off. I standardized on Claude Code and built up a library of skills and tooling around it. Switching costs are real once you’ve invested in conventions, prompts, and integrations; chasing the leader every quarter would have cost more in lost tooling than the productivity gap was ever worth.

An honest update on cost and reliability: I’m on the highest-priced Claude plan, and even there I run into capacity issues, including service degradation when servers hit load and occasional downtime. The supply-side constraints are real, and even though providers are all adding capacity, this may get worse in the short term.

A note for non-developers: AI coding assistants are useful well beyond writing code. Anything you’d do at a command line (server administration, scripting, file operations across thousands of items) is much faster with a coding assistant in the loop.

Images and Design

Nano Banana Pro (now v2) is still my main image generator; the quality-per-prompt has been good enough that I haven’t needed to look elsewhere for general work.

A new addition to my stack is Claude Design, a research preview from Anthropic Labs that launched in mid-April. It turns conversational prompts into structured visual work: slide decks, one-pagers, wireframes, interactive prototypes, marketing collateral, social posts, and landing pages. Where Nano Banana is good for one-off illustrations, Claude Design fills the gap when I need design output that holds together across multiple pieces. It can also import an existing design system as a starting point, which is where the “design tool” framing earns the name. One caveat: it is a research preview, so collaboration tools are basic and the design system import is hit and miss.

Supporting API Tools

This is a new category since January, added because the apps I’m currently developing need it. AI-integrated apps need data: I use Brave Search for jobs that need to search the web, Jina.AI for reading websites cleanly, and Apify for contact enrichment and platform-specific extraction tools. I’ve found that the most expensive part of building an agentic workflow often isn’t the AI model itself; it’s this connective tissue for getting data in and out that adds significant cost.
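To make the “reading websites cleanly” piece concrete: Jina.AI’s reader service converts a page into LLM-friendly markdown when you prefix the target URL with its host. A minimal sketch, using only the standard library:

```python
import urllib.request

JINA_READER = "https://r.jina.ai/"

def reader_url(url: str) -> str:
    # Jina's reader takes the full target URL appended to its own host,
    # e.g. https://r.jina.ai/https://example.com
    return JINA_READER + url

def read_clean(url: str) -> str:
    # Returns the page as markdown suitable for feeding to a model.
    # Basic use works without a key, but rate limits apply.
    with urllib.request.urlopen(reader_url(url)) as resp:
        return resp.read().decode("utf-8")
```

The output drops navigation, scripts, and boilerplate, which matters twice over: the model sees cleaner context, and you pay for far fewer tokens.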

Automation and Agentic Workflows

I have an OpenClaw instance running on an isolated Mac because of the autonomy concerns I covered earlier in this series: running an autonomous agent on a machine with access to your everyday credentials carries real risks.

The largest single addition to my stack since January is Claude Cowork. In plain terms, Cowork is a desktop AI agent that goes beyond chat to do work directly on your computer. You describe an outcome; it plans the steps, then reads, edits, and creates files in folders you’ve given it access to, hops between applications, and synthesizes information across sources. Before it acts on anything substantial, it shows you the plan and waits for approval. That plan-and-approval gate is what makes the autonomy usable rather than reckless.

I’m using Claude Cowork to manage all of the major projects I’m working on, and to run scheduled jobs including an inbox-and-calendar scan that drafts follow-up emails (drafts only; sending stays mine to authorize), a receipt monitor for accounting, and a billable-hours reconciliation against client work. I’m also using Dispatch, a feature that pairs the Claude mobile app to the desktop session so I can interact with Claude Cowork while away from my computer.

I’ve consolidated heavily on Anthropic’s tools (Claude Code, Cowork, Claude Design, the Claude browser plug-in) and the integration between them is a real part of why this stack works. I should note that the lesson isn’t “use Anthropic’s stack.” The lesson is that picking a coherent family and committing to it pays off in ways that running best-in-class on every axis doesn’t. The right family for someone else might be Microsoft, Google, or a different combination entirely.

Knowledge Organization

NotebookLM is still doing the per-project knowledge organization job as noted in my January post, and doing it well. I have one notebook per major project with all related documents loaded into it.

I am watching Cowork to see if its Projects feature will absorb this work, because the gap between a structured document corpus and a project context is narrowing quickly. NotebookLM still manages my project documents better today, but I fully expect to consolidate, probably sooner rather than later.

What’s Quietly Faded

Chatbots as a primary category have lost momentum for me. I still use them, but they’re a much smaller share of how I interact with AI than they were in January. The model has become the thing; the chat interface is one of several ways I reach it.

I was using the Comet browser for website automation tasks, but the work I was doing there has been replaced by the Claude browser plug-in for in-browser tasks and Cowork’s desktop automation for anything that crosses applications.

What Surprised Me

Local models on consumer hardware are good enough for a real subset of work — “the data has to leave my laptop” is no longer a default assumption. Cowork’s release pace has been the bigger story than any individual feature; it’s the first AI product I’ve used where the answer to “can it do X?” changes meaningfully from one month to the next, in the direction of yes.

The biggest mindset shift, though, was giving up on having the best tool in every category. I expected to feel like I was settling. In practice, the integrated stack saves me more time than any individual best-in-class tool ever did, and the predictability is its own form of productivity gain.

Closing

The stack has shifted from a list of products to an architecture with models at the bottom, supporting APIs above, automation and agentic platforms above that, and chat interfaces as one of several ways to drive the whole thing. I believe that shape will keep evolving, but this layered structure feels like it has staying power.

The other shift is harder to picture but more practically useful: Choose consistency over best-in-class. Models and tooling have improved rapidly over the last four months, and that pace isn’t slowing down. Chasing the “new best model” has a real cost (switching tooling, rebuilding integrations, learning new conventions) that is hard to quantify but is absolutely real.

For business leaders deciding what AI stack to invest in, I suggest that the model layer is becoming a commodity, the supporting infrastructure is consolidating, and the automation layer is where the differentiation lives. Select your own AI stack based on the automation layer that works for your use case, then stick with it long enough to let it mature before chasing the next thing.

Wrap Up

This post is part of a series on the current state of AI, focused on how it can be applied in practical ways to deliver measurable improvements in productivity, cost savings, and response times. If you’d like to explore more, all previous posts are available under Insights; please read them and reach out with any questions or comments you have.