Opus 4.8 Plans, Gemini 3.5 Executes: I Sit in the Middle

Susiloharjo

For the last six weeks I have been running my project work through a two-agent loop, and it has changed how I think about AI assistants. Opus 4.8 plans. Gemini 3.5 executes. I sit between them as the human in the loop, and the work gets faster and cleaner than any single-agent setup I have run before.

This is what the flow looks like, what each model is actually good at, and where the loop breaks when I push it too hard.

RAG Retrieval Is Filtering, Not Search.

Susiloharjo

I have been building RAG pipelines for two years. The mental model I started with was wrong, and reading Angela Shi’s article “Retrieval Is Filtering, Not Search” on Towards Data Science this week made the fix click.

The standard framing of RAG retrieval is “find the passages most similar to the query.” That framing is misleading because it treats retrieval as a Google-style search across unstructured text. In practice it is a filtering problem on structured tables, so the closer mental model is a SQL query.
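To make the SQL analogy concrete, here is a minimal sketch of filter-first retrieval. It is my own illustration, not code from the article: the `Chunk` fields, the `retrieve` function, and the toy two-dimensional embeddings are all hypothetical. The metadata conditions play the role of a `WHERE` clause, and similarity only orders the rows that survive it.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Chunk:
    # Hypothetical schema: each passage is a row with structured columns.
    text: str
    source: str
    year: int
    embedding: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(chunks, query_embedding, *, source, min_year, top_k=2):
    # WHERE source = ? AND year >= ?
    candidates = [c for c in chunks if c.source == source and c.year >= min_year]
    # ORDER BY similarity DESC LIMIT top_k
    ranked = sorted(
        candidates,
        key=lambda c: cosine(c.embedding, query_embedding),
        reverse=True,
    )
    return ranked[:top_k]


# Toy data with made-up embeddings, for illustration only.
CHUNKS = [
    Chunk("Refund policy, 2024 edition", "policy", 2024, [0.9, 0.1]),
    Chunk("Refund policy, 2019 edition", "policy", 2019, [0.95, 0.05]),
    Chunk("Blog post about refunds", "blog", 2024, [0.92, 0.08]),
    Chunk("Shipping policy, 2024 edition", "policy", 2024, [0.1, 0.9]),
]

results = retrieve(CHUNKS, [1.0, 0.0], source="policy", min_year=2023)
for chunk in results:
    print(chunk.text)
```

In this toy data the 2019 policy and the blog post are both closer to the query vector than the 2024 policy, yet neither is returned, because the filter removes them before similarity is ever computed.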

This is the article that should have existed when I started. Here is what I learned, and what I am changing in my own RAG pipelines because of it.

AI Wrote 80% in 10 Minutes. The Last 20% Took 6 Hours.

I shipped a feature on a Tuesday that took 11 minutes end-to-end. The agent generated the happy path, ran the tests, opened the PR. I clicked merge. Done before lunch.

The same agent shipped a feature on a Friday that took me 6 more hours after the agent finished. The happy path looked identical. The difference was the last 20%.

That gap is what this post is about.

Claude Code vs Cursor 2026: The Honest Comparison

Susiloharjo

SpaceX is reportedly buying Cursor for $60 billion. Anthropic is shipping Claude Code updates every two weeks. Every developer I know is asking the same question: which one should I actually use?

I spent the last 90 days shipping production code with both. Not toy projects. Not benchmarks. Real features, in a real codebase, with real deadlines. Here’s what each one is actually good at — and where they both fail you.

I’m not going to give you a feature table. You’re smart enough to read the docs yourself. What I am going to do is tell you what happened when I made each tool do real work.

My AI Coding Agent Kept Breaking — What I Changed

Susiloharjo

Six weeks ago, my AI coding agent was producing garbage. Not bad code — garbage. Functions that compiled but did nothing. Tests that passed for the wrong reasons. Refactors that introduced three bugs while fixing one.

I spent two days debugging the agent. Then I spent a week rebuilding it. Then I realized the problem wasn’t the agent.

The problem was me.

This is the story of what I changed. Not the agent — me.
