How I Built, Broke, and Rebuilt My Own AI Three Times
Most AI projects fail because they optimise for speed first. I built mine for accuracy. Here's what I learned from three architectures.
I was re-explaining myself to Claude for the third time that morning.
Same voice guidelines. Same strategic frameworks. Same context about how I think through M&A deals versus how I talk through parenting decisions.
Every new chat session started from zero.
I wanted an AI that remembered how I think - the way I break down business strategy differently than I talk through fatherhood conversations. For me, that meant M&A frameworks versus parenting advice, but the pattern applies to anyone with distinct professional and personal contexts.
So I started building.
Two weeks later, the third version finally works. It's also 40% slower than the first one.
And I'm completely fine with that.
V1: Stuff Everything In
The first version had one core assumption: give the model enough context and it'll figure it out.
The system prompt loaded a full 45KB voice profile, a 6KB Clifton Strengths document, and the top 5 retrieved chunks from a single RAG query - all in one pass. Model routing was handled by a shouldEscalate() function that scanned for keywords like "strategy" or "analyse" and switched between Haiku and Sonnet accordingly.
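The routing logic was roughly this shape. This is a hypothetical reconstruction, not the actual code: the real keyword list, model identifiers, and `pickModel` helper are assumptions for illustration.

```typescript
// Hypothetical sketch of the V1 keyword router. The real keyword list
// and model names aren't shown in the post — these are placeholders.
const ESCALATION_KEYWORDS = ["strategy", "analyse", "framework", "m&a"];

type Model = "claude-haiku" | "claude-sonnet";

function shouldEscalate(message: string): boolean {
  // Naive substring scan: any escalation keyword bumps the request
  // to the stronger (and slower) model.
  const lower = message.toLowerCase();
  return ESCALATION_KEYWORDS.some((kw) => lower.includes(kw));
}

function pickModel(message: string): Model {
  return shouldEscalate(message) ? "claude-sonnet" : "claude-haiku";
}
```

The appeal of this approach is that it's one function with zero dependencies; the weakness is that keyword matching can't tell "what's our pricing strategy" apart from "the strategy in this board game", which is part of why Haiku routing was dropped in V2.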
It worked. It sounded like me.
But as I added more content and learnings to the knowledge base, it started to break down in a specific way: if it got the facts right, it would drop the voice. If it nailed the voice, it missed the detail.
The first time it dropped my voice mid-response, I felt that same re-explaining exhaustion I was trying to escape.
The context window was being asked to do too much at once - reconcile identity, personality, strength profiles, and retrieved content in a single pass. That's where the dilution started.
V2: Tune the Parameters
Rather than restructure, I tried to push the existing architecture further. Over a focused evening, I made a series of targeted fixes:
- Bumped retrieval from 5 chunks to 10
- Combined the last 3 user messages into a single search query so follow-ups retained topic context
- Filtered the voice profile document down to the sections actually relevant to chat
- Forced learnings to always surface at least 15 chunks, inserted before blog content
- Dropped Haiku entirely - it couldn't follow complex voice and retrieval instructions reliably
Each tweak felt like progress until it didn't. Every one of these was a genuine improvement on its own; together, they revealed the ceiling of the architecture.
Marcus Hahnheuser
Delivery leader, entrepreneur, and dad based in Brisbane. Writing about what I'm learning across digital delivery, AI, business acquisition, and trying to be present while building for the future.