AI & Technology

How I Built, Broke, and Rebuilt My Own AI Three Times

Most AI projects fail because they optimise for speed first. I built mine for accuracy. Here's what I learned from three architectures.

Marcus Hahnheuser · 23 Mar 2026 · 6 min read

I Built Three Versions of My AI Clone. The Third One Works - But It's Slower.

I was re-explaining myself to Claude for the third time that morning.

Same voice guidelines. Same strategic frameworks. Same context about how I think through M&A deals versus how I talk through parenting decisions.

Every new chat session started from zero.

I wanted an AI that remembered how I think - how I break down business strategy differently from how I talk through fatherhood. For me, that meant M&A frameworks versus parenting advice, but the pattern applies to anyone with distinct professional and personal contexts.

So I started building.

Two weeks later, the third version finally works. It's also 40% slower than the first one.

And I'm completely fine with that.

V1: Stuff Everything In

The first version had one core assumption: give the model enough context and it'll figure it out.

The system prompt loaded a full 45KB voice profile, a 6KB Clifton Strengths document, and the top 5 retrieved chunks from a single RAG query - all in one pass. Model routing was handled by a shouldEscalate() function that scanned for keywords like "strategy" or "analyse" and switched between Haiku and Sonnet accordingly.
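As a rough illustration, keyword routing of this kind might look like the sketch below. Only the shouldEscalate() name and the two example keywords come from the description above; the model tiers, the pickModel() helper, and the matching logic are assumptions.

```typescript
// Hypothetical model tiers; the post only names the Haiku and Sonnet families.
type ModelTier = "haiku" | "sonnet";

// Assumed keyword list. The post mentions "strategy" and "analyse";
// the rest of the real list is not given.
const ESCALATION_KEYWORDS: readonly string[] = ["strategy", "analyse"];

// Returns true when the message contains any escalation keyword.
function shouldEscalate(message: string): boolean {
  const text = message.toLowerCase();
  return ESCALATION_KEYWORDS.some((keyword) => text.includes(keyword));
}

// Picks the cheaper model unless a keyword asks for the stronger one.
function pickModel(message: string): ModelTier {
  return shouldEscalate(message) ? "sonnet" : "haiku";
}

console.log(pickModel("Help me analyse this acquisition")); // sonnet
console.log(pickModel("What should I cook tonight?")); // haiku
```

Substring matching like this is brittle: in this sketch, a message that spells it "analyze" would not escalate at all.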

It worked. It sounded like me.

But as I added more content and learnings to the knowledge base, it started to break down in a specific way: if it got the facts right, it would drop the voice. If it nailed the voice, it missed the detail.

The first time it dropped my voice mid-response, I felt that same re-explaining exhaustion I was trying to escape.

The context window was being asked to do too much at once - reconcile identity, personality, strength profiles, and retrieved content in a single pass. That's where the dilution started.

V2: Tune the Parameters

Rather than restructure, I tried to push the existing architecture further. Over a focused evening, I made a series of targeted fixes:

  • Bumped retrieval from 5 chunks to 10
  • Combined the last 3 user messages into a single search query so follow-ups retained topic context
  • Filtered the voice profile document down to the sections actually relevant to chat
  • Forced learnings to always surface at least 15 chunks, inserted before blog content
  • Dropped Haiku entirely - it couldn't follow complex voice and retrieval instructions reliably
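The second fix in the list, folding recent user messages into one search query, can be sketched as follows. The ChatMessage shape and the buildSearchQuery() name are hypothetical; the post only says that the last 3 user messages were combined.

```typescript
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Joins the most recent user messages (oldest first) into one search query.
function buildSearchQuery(messages: ChatMessage[], lastN = 3): string {
  return messages
    .filter((m) => m.role === "user")
    .slice(-lastN)
    .map((m) => m.content.trim())
    .filter((text) => text.length > 0)
    .join(" ");
}

// Illustrative conversation, not taken from the post.
const chatHistory: ChatMessage[] = [
  { role: "user", content: "How do I value a small SaaS business?" },
  { role: "assistant", content: "Start with recurring revenue..." },
  { role: "user", content: "What about churn?" },
];

console.log(buildSearchQuery(chatHistory));
// How do I value a small SaaS business? What about churn?
```

On its own, "What about churn?" carries no topic. Joined with the earlier question, the query keeps the valuation context.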

Each parameter tweak felt like progress until it didn't.

Each of these was a genuine improvement. But together, they revealed the ceiling.

It was 11pm on a Tuesday. I'd just put my daughter to bed and asked the system: "How do I handle her fear of starting school?"

The response opened with: "Approach this transition like a strategic acquisition - assess stakeholder concerns, map integration risks, establish clear success metrics for the first 90 days."

I stared at the screen.

It had merged my professional frameworks with my parenting question. It wasn't a hallucination in the traditional sense - it was a context collision.

One prompt, one model, one pass - it couldn't hold fundamentally different mental domains without bleeding them into each other.

The Math.max(topK, 15) fix for learnings, for example, solved one problem while creating another: learnings started crowding out blog chunks entirely. The pattern from V2 is worth naming: each fix solved the symptom and left the root cause in place.
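One way this crowding could arise is sketched below. It assumes a fixed cap on how many chunks fit in the prompt, which the post does not state, so CONTEXT_BUDGET and assembleContext() are hypothetical. Only Math.max(topK, 15) and the learnings-before-blogs ordering come from the description above.

```typescript
type Source = "learning" | "blog";

interface Chunk {
  source: Source;
  text: string;
}

// Hypothetical cap on chunks that fit in the prompt; not stated in the post.
const CONTEXT_BUDGET = 15;

function assembleContext(learnings: Chunk[], blogs: Chunk[], topK: number): Chunk[] {
  // V2 rule from the post: learnings always get at least 15 slots.
  const learningSlots = Math.max(topK, 15);
  const ordered = [...learnings.slice(0, learningSlots), ...blogs.slice(0, topK)];
  // Learnings come first, so any truncation lands on the blog chunks.
  return ordered.slice(0, CONTEXT_BUDGET);
}

// Generates placeholder chunks for the demonstration.
const makeChunks = (source: Source, n: number): Chunk[] =>
  Array.from({ length: n }, (_, i) => ({ source, text: `${source} ${i + 1}` }));

const assembled = assembleContext(makeChunks("learning", 20), makeChunks("blog", 20), 10);
const blogCount = assembled.filter((c) => c.source === "blog").length;
console.log(assembled.length, blogCount); // 15 0
```

Under these assumptions the floor of 15 fills the whole budget before a single blog chunk is considered.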

The root cause was architectural.

V3: Separate the Concerns

The insight was simple: everything that was colliding was colliding because it all lived in the same place. The solution was decomposition.

Each responsibility that was previously jammed into one system prompt became its own node:

| Node | Role |
|---|---|
| Router (Haiku) | Classify intent, select Clifton lens, generate per-namespace queries |
| Search | Run decomposed queries - blogs, learnings, frameworks, experiences get separate targeted searches |
| Reranker (BGE) | Fix exact-term retrieval that cosine similarity misses |
| Reasoning (Sonnet) | Apply the Clifton lens to produce structured analysis |
| Voice synthesis (Sonnet, cached) | Convert structured analysis into my voice |
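The node layout can be sketched as a chain of injected functions. This is a minimal outline rather than the actual implementation: the namespace and persona names come from the post, while the Nodes interface, the answer() function, and the stubs are hypothetical.

```typescript
type Namespace = "blogs" | "learnings" | "frameworks" | "experiences";
type Lens = "tech lead" | "strategist" | "father";

interface RouterOutput {
  intent: string;
  lens: Lens;
  queries: { namespace: Namespace; query: string }[];
}

interface Chunk {
  namespace: Namespace;
  text: string;
}

// Each node is injected, so it can be an LLM call, a vector search, or a stub.
interface Nodes {
  route(question: string): Promise<RouterOutput>;
  search(namespace: Namespace, query: string): Promise<Chunk[]>;
  rerank(question: string, chunks: Chunk[]): Promise<Chunk[]>;
  reason(question: string, lens: Lens, chunks: Chunk[]): Promise<string>;
  voice(analysis: string): Promise<string>;
}

// Runs the nodes one after another; each step only sees what it needs.
async function answer(question: string, nodes: Nodes): Promise<string> {
  const plan = await nodes.route(question);
  const candidates: Chunk[] = [];
  for (const { namespace, query } of plan.queries) {
    candidates.push(...(await nodes.search(namespace, query)));
  }
  const best = await nodes.rerank(question, candidates);
  const analysis = await nodes.reason(question, plan.lens, best);
  return nodes.voice(analysis);
}

// Stub nodes so the flow can be exercised without any model or index.
const stubNodes: Nodes = {
  route: async (q) => ({
    intent: "parenting",
    lens: "father",
    queries: [{ namespace: "experiences", query: q }],
  }),
  search: async (namespace, query) => [{ namespace, text: `match for "${query}"` }],
  rerank: async (_q, chunks) => chunks,
  reason: async (_q, lens, chunks) => `[${lens}] ${chunks.length} chunk(s)`,
  voice: async (analysis) => `In my voice: ${analysis}`,
};

answer("How do I handle her fear of starting school?", stubNodes).then(console.log);
// In my voice: [father] 1 chunk(s)
```

Because the voice node only receives the finished analysis, it never sees raw retrieved chunks. That is the separation the table describes.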

A few architectural decisions made a meaningful difference. The Clifton Strengths profile went from passive text in a prompt to an active reasoning frame - each persona (tech lead, strategist, father) maps to a specific thinking pattern that structures how the answer is built, not just how it sounds.

The voice profile moved to a cached system block, cutting roughly 80% of its token cost after the first request.
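A cached system block could look like the request body below. This assumes Anthropic's prompt caching on the Messages API, which the mention of Sonnet suggests but the post does not confirm. The buildVoiceRequest() helper, the max_tokens value, and the prompt wording are hypothetical, and the model ID is left as a caller-supplied placeholder.

```typescript
interface SystemBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

interface MessagesRequest {
  model: string;
  max_tokens: number;
  system: SystemBlock[];
  messages: { role: "user" | "assistant"; content: string }[];
}

// Builds a request body whose large, stable prefix is marked as cacheable.
function buildVoiceRequest(voiceProfile: string, analysis: string, model: string): MessagesRequest {
  return {
    model,
    max_tokens: 1024, // assumed value
    system: [
      // The voice profile is identical on every request, so it can be cached.
      { type: "text", text: voiceProfile, cache_control: { type: "ephemeral" } },
    ],
    messages: [
      // Per-request content stays outside the cached block.
      { role: "user", content: `Rewrite this analysis in my voice:\n\n${analysis}` },
    ],
  };
}

const request = buildVoiceRequest("<voice profile text>", "<structured analysis>", "<sonnet-model-id>");
console.log(JSON.stringify(request.system[0]?.cache_control)); // {"type":"ephemeral"}
```

Caching only pays off if the cached prefix is identical between requests, which is one reason to keep anything question-specific in the messages array and out of the system block.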

The namespace flooding problem from V2 was resolved by giving each source its own balanced retrieval: blogs get 6 chunks, learnings 7, voice 3. The reranker then selects the best 8 across all of them against the original question.

One strong learning can still outrank several mediocre blog chunks - but it can't flood the context by default.
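The quota-then-rerank step can be sketched like this. The quotas (6, 7, 3) and the final cut of 8 come from the post. The Candidate shape, selectContext(), and the rerankScore callback standing in for the BGE reranker are assumptions, and the demo simply reuses similarity as a fake rerank score.

```typescript
interface Candidate {
  namespace: string;
  text: string;
  similarity: number; // first-stage retrieval score
}

// Per-namespace quotas and final cut, as described in the post.
const QUOTAS: { namespace: string; quota: number }[] = [
  { namespace: "blogs", quota: 6 },
  { namespace: "learnings", quota: 7 },
  { namespace: "voice", quota: 3 },
];
const FINAL_K = 8;

// Caps each namespace at its quota, then keeps the best FINAL_K by rerank score.
function selectContext(
  candidates: Candidate[],
  rerankScore: (c: Candidate) => number,
): Candidate[] {
  const capped: Candidate[] = [];
  for (const { namespace, quota } of QUOTAS) {
    const top = candidates
      .filter((c) => c.namespace === namespace)
      .sort((a, b) => b.similarity - a.similarity)
      .slice(0, quota);
    capped.push(...top);
  }
  return capped
    .map((c) => ({ c, score: rerankScore(c) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, FINAL_K)
    .map(({ c }) => c);
}

// Demo: 20 strong learnings compete with 10 weaker blog chunks.
const demo: Candidate[] = [];
for (let i = 0; i < 20; i++) {
  demo.push({ namespace: "learnings", text: `learning ${i}`, similarity: 0.9 - i * 0.01 });
}
for (let i = 0; i < 10; i++) {
  demo.push({ namespace: "blogs", text: `blog ${i}`, similarity: 0.5 - i * 0.01 });
}

const picked = selectContext(demo, (c) => c.similarity);
const learningCount = picked.filter((c) => c.namespace === "learnings").length;
console.log(learningCount, picked.length - learningCount); // 7 1
```

Even with every learning outscoring every blog chunk, the quota leaves room for at least one blog chunk in the final eight.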

The result: significantly better context separation and accuracy.

The trade-off: speed.

More nodes means more sequential calls. V3 is smarter and slower, and I know exactly what that means for V4.

What This Means If You're Building AI Systems

The pattern across all three versions is a familiar one in software: start with the simplest thing that could work, observe where it fails, patch the symptoms, and eventually hit the point where the patches reveal the architecture needs to change.

The trick is recognising that point without waiting too long - and being willing to restructure when tuning can no longer fix what's actually broken.

If your AI project is getting worse as you add more context, you don't have a tuning problem. You have an architecture problem.

Here's the decision test I now use: if I can describe the failure mode in one sentence ("it gets the facts right but drops the voice"), and the fix requires changing three or more parameters to compensate, the architecture is wrong.

V4 is already clear to me: it's a speed problem. And I'll solve that the same way I've solved everything else - by shipping, watching what breaks, and iterating from there.

The Real Lesson

Most AI implementations fail because teams optimise for the wrong thing first. They chase speed, or they chase the latest model, or they chase the demo that impresses stakeholders.

What actually matters is this: does it solve the problem it was built to solve, consistently, in production?

V1 was fast but inconsistent. V2 was faster but hit a ceiling I couldn't tune around. V3 is slower - but it works.

And that's the only metric that matters.

If you're building an AI system right now and you're stuck choosing between speed and accuracy, ask yourself this: which failure mode costs you more - a response that takes 8 seconds instead of 3, or a response that sounds right but misses the detail your user actually needed?

Tags: AI, System Architecture, Product Development, RAG, LLMs

Marcus Hahnheuser

Delivery leader, entrepreneur, and dad based in Brisbane. Writing about what I'm learning across digital delivery, AI, business acquisition, and trying to be present while building for the future.
