I Let AI Run My Blog for a Month: What Broke and Worked

Thirty days ago, I handed my blog publishing schedule to an AI agent and walked away. Two sites, daily deadlines, zero human intervention during publish time — or so I thought.

The reality was messier. And more instructive.

I run two blogs: susiloharjo.web.id (English, tech/developer focus) and teknologinow.com (Indonesian, gadget/IoT). Both get daily posts. Both used to cost me an hour every morning — research, write, convert Markdown to HTML, find a featured image, publish, set SEO meta, verify.

That hour felt productive until I realized it was purely mechanical. The writing had personality, sure, but everything around it was a repeatable sequence waiting to be automated.

So I built it.

The Architecture — Deceptively Simple

The pipeline has three layers:

Vault scanner: a cron job checks for unpublished drafts. If none exist, it switches to research mode.
3-layer topic selection: picks topics using content mix compliance, audience fit from GSC data, and a diversity tie-breaker to avoid three consecutive listicles.
Zero-click publish: the featured image is generated, Markdown is converted to HTML, the WordPress post is created, Rank Math meta is set, and the vault entry is marked published.
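To make the first layer concrete, here is a minimal sketch of the scanner's mode switch. It assumes the vault is a folder of Markdown files and that a published draft carries a `status: published` line; both the layout and the `pick_mode` name are illustrative guesses, not the actual implementation.

```python
from pathlib import Path

# Assumed marker; the real vault may track publish state differently.
PUBLISHED_MARKER = "status: published"


def pick_mode(vault_dir: Path) -> tuple[str, list[Path]]:
    """Return ("publish", drafts) if unpublished drafts exist, else ("research", [])."""
    drafts = sorted(
        path
        for path in vault_dir.glob("*.md")
        if PUBLISHED_MARKER not in path.read_text(encoding="utf-8")
    )
    if drafts:
        return "publish", drafts
    return "research", []
```

A cron entry would call `pick_mode` once per run and branch on the returned mode.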

What Actually Broke (And Fast)

Week 1 was perfect. Seven posts, zero errors. I felt like a genius.

Week 2: duplicate topic. The agent drafted two teknologinow.com (TN) posts that overlapped with pipeline posts from the same day. Root cause: my vault scanner and the cron pipeline didn’t share a topic registry. Fix: added a mandatory GET /posts?search= dedup check before every single publish. It now aborts the entire run if a title match of even 50 characters exists.
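A dedup gate along these lines could look like the sketch below. It reads the "50-character title match" rule as "the first 50 normalized characters are identical", which is an assumption, and the helper names (`normalize`, `is_duplicate`, `fetch_titles`, `ensure_unique`) are hypothetical. The lookup uses the standard WordPress REST route `/wp-json/wp/v2/posts` with its `search` parameter.

```python
import html
import json
import re
import urllib.parse
import urllib.request

PREFIX_LEN = 50  # assumed interpretation of the 50-character rule


def normalize(title: str) -> str:
    """Decode HTML entities, lowercase, drop punctuation, collapse whitespace."""
    text = html.unescape(title).lower()
    text = re.sub(r"[^\w\s]", "", text)
    return re.sub(r"\s+", " ", text).strip()


def is_duplicate(candidate: str, existing_titles: list[str]) -> bool:
    """True if any existing title shares the candidate's normalized prefix."""
    key = normalize(candidate)[:PREFIX_LEN]
    return any(normalize(title)[:PREFIX_LEN] == key for title in existing_titles)


def fetch_titles(base_url: str, query: str) -> list[str]:
    """Search existing posts through the WordPress REST API."""
    params = urllib.parse.urlencode({"search": query, "per_page": 20})
    url = f"{base_url}/wp-json/wp/v2/posts?{params}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return [post["title"]["rendered"] for post in json.load(resp)]


def ensure_unique(candidate: str, existing_titles: list[str]) -> None:
    """Abort the whole run when a duplicate is found."""
    if is_duplicate(candidate, existing_titles):
        raise SystemExit(f"Aborting run: duplicate title {candidate!r}")
```

Usage would be `ensure_unique(title, fetch_titles(site_url, title))` right before the publish call, where `site_url` is whichever site the post is headed to.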

Week 3: Markdown leak. A published post showed raw # headers and bold markers because the Markdown-to-HTML conversion step was silently skipped. The QA checklist said “convert Markdown” but there was no assertion enforcing it. Fix: added an assertion that `html_body` contains the expected HTML markup as a hard gate. If conversion doesn’t happen, nothing publishes.
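As an illustration of such a gate, the sketch below checks `html_body` in both directions: some block-level HTML tag must be present, and no raw Markdown heading or bold marker may remain. The specific patterns and the `assert_converted` name are assumptions chosen for the example, not the exact assertion used in the pipeline.

```python
import re

# Patterns are illustrative; tune them to the Markdown features actually used.
RAW_MARKDOWN = re.compile(r"^#{1,6} |\*\*[^*\n]+\*\*", re.MULTILINE)
HTML_TAG = re.compile(r"<(p|h[1-6]|ul|ol|pre|blockquote)\b", re.IGNORECASE)


def assert_converted(html_body: str) -> None:
    """Hard gate: fail loudly if the body still looks like Markdown."""
    assert HTML_TAG.search(html_body), "no block-level HTML tag found"
    assert not RAW_MARKDOWN.search(html_body), "raw Markdown markers found"
```

Two caveats: Python drops `assert` statements when run with `-O`, so a production gate may prefer raising an explicit exception, and a code sample containing lines that start with `# ` would trip the heading pattern.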

Week 3 also: featured image collision. I auto-post featured images to Instagram, and same-day duplicate images on two posts looked terrible. Fix: check `featured_media` across all of today’s posts before upload, and generate a new unique image if a duplicate is detected. The image generation script now also runs a vision model to score quality and rejects anything below a threshold.
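The collision check itself can be small. This sketch works on the list of post objects the WordPress REST API returns (each has an `id` and a `featured_media` attachment ID, with 0 meaning no featured image). How today's posts are fetched is left out, and the function names are hypothetical.

```python
from collections import defaultdict


def collisions(todays_posts: list[dict]) -> dict[int, list[int]]:
    """Map each featured_media ID used more than once to the post IDs using it."""
    by_media = defaultdict(list)
    for post in todays_posts:
        media_id = post.get("featured_media", 0)
        if media_id:  # WordPress reports 0 when no featured image is set
            by_media[media_id].append(post["id"])
    return {m: ids for m, ids in by_media.items() if len(ids) > 1}


def needs_new_image(candidate_media_id: int, todays_posts: list[dict]) -> bool:
    """True if the candidate image is already attached to one of today's posts."""
    return any(p.get("featured_media") == candidate_media_id for p in todays_posts)
```

One way to get today's posts is the posts endpoint's `after` parameter, which filters by publish date.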

Week 4: meta description creep. Several posts had descriptions of 156 to 190 characters. Descriptions longer than about 155 characters tend to get truncated in search results. Fix: moved the length check right before the POST request, so the whole publish stops if the meta description is too long.
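A length gate like that is only a few lines. The sketch below uses the 155-character limit mentioned above. The function name, and the choice to raise rather than silently trim, are assumptions for illustration.

```python
MAX_META_LEN = 155  # the limit the post mentions


def check_meta_description(text: str, limit: int = MAX_META_LEN) -> str:
    """Return the cleaned description, or raise so the publish run stops."""
    cleaned = " ".join(text.split())  # collapse stray whitespace and newlines
    if not cleaned:
        raise ValueError("meta description is empty")
    if len(cleaned) > limit:
        raise ValueError(f"meta description is {len(cleaned)} chars; limit is {limit}")
    return cleaned
```

Raising instead of truncating keeps a half-cut sentence from ever reaching the search snippet. The trade-off is that the run stops and the description has to be regenerated.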

Week 4 also: the silent crash. The Ollama API returned 429 (rate limited). The agent still got a 200 from WP, but the post body was empty. Fix: added post-publish verification that checks the published post actually has content. If it is empty, the pipeline retries with a different provider.
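The verify-then-fall-back loop can be sketched with plain callables, which keeps the logic independent of any particular LLM or WordPress client. Everything here is hypothetical: the `generators`, `publish`, and `fetch_rendered` parameters stand in for the real provider calls and REST requests, `publish` is assumed to update the same post on a retry, and the 50-word floor is an arbitrary threshold.

```python
import re
from typing import Callable

MIN_WORDS = 50  # arbitrary floor for "this post has a real body"


def visible_word_count(html: str) -> int:
    """Count words after stripping tags, so '<p></p>' counts as empty."""
    return len(re.sub(r"<[^>]+>", " ", html).split())


def publish_verified(
    generators: list[Callable[[], str]],
    publish: Callable[[str], int],
    fetch_rendered: Callable[[int], str],
    min_words: int = MIN_WORDS,
) -> int:
    """Try each provider in order until the published post has real content."""
    for generate in generators:
        try:
            body = generate()
        except Exception:
            continue  # e.g. the provider answered 429; move to the next one
        post_id = publish(body)  # assumed to update the same post on retry
        if visible_word_count(fetch_rendered(post_id)) >= min_words:
            return post_id
    raise RuntimeError("every provider produced an empty or too-short post")
```

The important part is that the check reads the post back after publishing rather than trusting the 200 response.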

The Debugging Ritual That Saved Me

Every failure above was caught not by the agent — agents don’t know they hallucinated a good publish — but by my Telegram report. Every morning the pipeline sends me a brief: titles published, word counts, and any warnings. When something looked off, I’d catch it within minutes.
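For reference, a morning brief like this can be built and sent in a few lines. The message layout and function names below are made up for the example. The `sendMessage` method with `chat_id` and `text` is part of the Telegram Bot API, and the token and chat ID are placeholders you would supply.

```python
import json
import urllib.request


def build_report(posts: list[dict], warnings: list[str]) -> str:
    """Format titles, word counts, and warnings into one plain-text message."""
    lines = ["Daily publish report"]
    for post in posts:
        lines.append(f"- {post['title']} ({post['words']} words)")
    lines.append("Warnings:" if warnings else "Warnings: none")
    lines.extend(f"! {warning}" for warning in warnings)
    return "\n".join(lines)


def send_report(token: str, chat_id: str, text: str) -> dict:
    """Send the report through the Telegram Bot API sendMessage method."""
    request = urllib.request.Request(
        f"https://api.telegram.org/bot{token}/sendMessage",
        data=json.dumps({"chat_id": chat_id, "text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as resp:
        return json.load(resp)
```

Keeping `build_report` separate from the network call makes the message format easy to test without a bot token.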

This feedback loop is the secret ingredient. Automation without monitoring is just faster chaos. The next evolution will be persistent agent memory via MCP so the agent learns from every failure without me writing a rule.

What Actually Worked

The 3-layer topic selection surprised me. During the 30 days, 14 posts were “research mode” picks — meaning I never touched them until they were live. GSC data from the following week showed these posts averaged 20% higher CTR than my manually chosen ones.

Why? The framework forces diversity — if your last two posts were listicles, the tie-breaker picks an op-ed or comparison piece. That rotation keeps readers engaged. My previous manual habit was to write the same format for three days straight when I found one that worked.
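The rotation rule is easy to express in code. In this sketch each candidate topic carries a `format` label and a `score` that is assumed to already combine the first two layers (content mix and audience fit). The field names and the "block a format after two in a row" rule are an interpretation of the description above, not the actual selection code.

```python
from typing import Optional


def blocked_format(recent_formats: list[str]) -> Optional[str]:
    """Return the format to avoid if the last two posts used the same one."""
    if len(recent_formats) >= 2 and recent_formats[-1] == recent_formats[-2]:
        return recent_formats[-1]
    return None


def pick_topic(candidates: list[dict], recent_formats: list[str]) -> dict:
    """Pick the best-scoring candidate, skipping a format used twice in a row."""
    blocked = blocked_format(recent_formats)
    allowed = [c for c in candidates if c["format"] != blocked]
    # Fall back to the full list if every candidate shares the blocked format.
    return max(allowed or candidates, key=lambda c: c["score"])
```

With `recent_formats = ["listicle", "listicle"]`, a lower-scoring op-ed wins over a higher-scoring listicle, so a third listicle in a row never goes out.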

The cron pipeline also never missed a deadline. Not once. Even when I was traveling, sleeping, or neck-deep in debugging a production issue, the posts went out. Reliability was the entire point, and aside from the quality bugs in weeks 2 to 4, it delivered.

The Hardest Lesson

Building an autonomous pipeline is 20% writing the happy path and 80% discovering failure modes you didn’t predict.

The agent doesn’t know it’s about to publish a duplicate. It doesn’t see the Markdown leak. It can’t feel embarrassed about a bad meta description. Every guardrail, every assertion, every verification step has to be explicitly coded — and you won’t know which ones you need until something actually breaks.

That said, after 30 days and 40+ published posts across two sites, I’d never go back to manual. The pipeline now has 9 verification stages, 6 assertions, and 3 fallback paths. It’s overengineered for a blog. That’s exactly what makes it reliable.

This experiment also reshaped how I think about writing about tech — less exhaustive, more transformative.

What’s Next

I’m experimenting with letting the agent also monitor post-performance and suggest topic pivots based on real GSC data. Essentially closing the loop: publish → measure → adjust → publish better.

But that’s next month’s experiment. For now, I’m happy that my morning hour is back, and my blogs still read like they weren’t written by a robot.

They were — they just had good supervision.
