How I Built, Broke, and Rebuilt My Own AI Three Times
Most AI projects fail because they optimise for speed first. I built mine for accuracy. Here's what I learned from three architectures.
I Built Three Versions of My AI Clone. The Third One Works - But It's Slower.
I was re-explaining myself to Claude for the third time that morning.
Same voice guidelines. Same strategic frameworks. Same context about how I think through M&A deals versus how I talk through parenting decisions.
Every new chat session started from zero.
I wanted an AI that remembered how I think - the way I break down business strategy differently than I talk through fatherhood conversations. For me, that meant M&A frameworks versus parenting advice, but the pattern applies to anyone with distinct professional and personal contexts.
So I started building.
Two weeks later, the third version finally works. It's also 40% slower than the first one.
And I'm completely fine with that.
V1: Stuff Everything In
The first version had one core assumption: give the model enough context and it'll figure it out.
The system prompt loaded a full 45KB voice profile, a 6KB Clifton Strengths document, and the top 5 retrieved chunks from a single RAG query - all in one pass. Model routing was handled by a shouldEscalate() function that scanned for keywords like "strategy" or "analyse" and switched between Haiku and Sonnet accordingly.
It worked. It sounded like me.
But as I added more content and learnings to the knowledge base, it started to break down in a specific way: if it got the facts right, it would drop the voice. If it nailed the voice, it missed the detail.
The first time it dropped my voice mid-response, I felt that same re-explaining exhaustion I was trying to escape.
The context window was being asked to do too much at once - reconcile identity, personality, strength profiles, and retrieved content in a single pass. That's where the dilution started.
V2: Tune the Parameters
Rather than restructure, I tried to push the existing architecture further. Over a focused evening, I made a series of targeted fixes:
- Bumped retrieval from 5 chunks to 10
- Combined the last 3 user messages into a single search query so follow-ups retained topic context
- Filtered the voice profile document down to the sections actually relevant to chat
- Forced learnings to always surface at least 15 chunks, inserted before blog content
- Dropped Haiku entirely - it couldn't follow complex voice and retrieval instructions reliably
Each tweak felt like progress, and each was a genuine improvement on its own. But together, they revealed the ceiling.

Marcus Hahnheuser
Entrepreneur, Investor & Strategist based in Brisbane, Australia. Building businesses, scaling through M&A, and sharing insights on leadership, AI, and life.