I Set 2 Playwright Goals at 4 AM and Both Beat Fajr

14 June 2026 | 13 June 2026 by susiloharjo

Browser window with Python Playwright code on a desk in the dim light of pre-dawn, a closed Quran beside the keyboard


This Saturday morning I woke up at 4 AM with the kind of half-baked idea that only survives in a brain that hasn’t fully booted. I wanted to see if I could give an LLM a real browser and a goal written in two sentences, then walk away. Not a toy. Not a sandbox. LinkedIn. Tokopedia. The real web, with logins and JavaScript and the kind of DOM that makes scrapers cry.

By the time I finished praying Fajr and closed my Quran, both jobs were done. One returned 23 LinkedIn profiles as structured JSON. The other returned 14 G-Shock listings under Rp 1.000.000 from Jakarta sellers offering instant payment. I had not touched the keyboard.

This is the prompt pattern that made it work — and the three things I almost got wrong.

Why a Simple If-Else Can Beat an LLM

12 June 2026 by susiloharjo

Code logic: simple if-else flow chart contrasting with LLM API endpoint


TL;DR: When you can describe the inputs and the expected outputs in advance, you don’t need a model — you need a function. Here’s the principle, the proof, and the one case where the principle breaks.

A teammate burned $47 of API credits last quarter on a “smart” classifier. The job: sort incoming support emails into four buckets (billing, technical, account, other) and route them to the right Slack channel. The model nailed it about 91% of the time. The remaining 9% it was confidently, hilariously wrong — sending a billing dispute to the technical channel, an outage report to “other.”

I replaced it with a 40-line Python script using if statements and a handful of keyword checks. It runs in 12 milliseconds per email, costs $0, and gets the same 91% right. The difference is that the 9% it gets wrong are predictably wrong, so we know to watch them. The classifier used to hallucinate categories that didn’t exist. The script never invents a fifth bucket.

That’s not an edge case. That’s the principle: when the parameters are already known, deterministic code is the right answer. The question is why this works, and when it stops working.
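To make the idea concrete, here is a minimal sketch of what a keyword-based router along these lines might look like. It is not the 40-line script from the story: the keyword lists, the `RULES` table, and the `classify` function name are all hypothetical, chosen only to illustrate the four buckets mentioned above.

```python
# Hypothetical keyword rules: the real script's keywords are not shown in the post.
# Order matters: the first bucket with a matching keyword wins.
RULES = [
    ("billing", ("invoice", "refund", "charge", "payment")),
    ("technical", ("error", "outage", "crash", "bug")),
    ("account", ("password", "login", "username", "2fa")),
]


def classify(email_text: str) -> str:
    """Return one of four fixed buckets: billing, technical, account, other."""
    text = email_text.lower()
    for bucket, keywords in RULES:
        # Plain substring checks: cheap, deterministic, easy to audit.
        if any(word in text for word in keywords):
            return bucket
    # Anything that matches no rule falls through to the catch-all bucket.
    return "other"
```

Because the function can only return a bucket named in `RULES` or the literal `"other"`, a fifth category is impossible by construction. The misses are also easy to reason about: an email like "The payment page shows an error" lands in billing simply because billing is checked first, which is the kind of predictable mistake you can watch for.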

Anthropic Shipped Two New Models. They’re the Same Model.

12 June 2026 | 11 June 2026 by susiloharjo

TL;DR: Anthropic launched Claude Fable 5 (general availability) and Claude Mythos 5 (restricted, cyber defenders only) on June 9, 2026. Same underlying model. Different safety posture. $10/M input tokens and $50/M output — less than half the price of Mythos Preview. Part 1 of 2: specs and use cases. Part 2 will be the review after I have run something real on it.

I opened Anthropic’s newsroom on Tuesday and saw two new flagship models. Then I read the second paragraph and realized they were the same model.

That is the structural fact nobody is writing about. The release notes, the benchmark table, the customer quotes — all of that is downstream of one decision Anthropic made: a single frontier model, packaged twice. Once for everyone. Once for the people with a clear reason to be dangerous.

The packaging is the product. Let me walk you through what’s actually shipping, what the safeguards mean in practice, and what I want to test once the access thaws.

Design Thinking Is 80% Theater. Here’s the 20% That Works.

14 June 2026 | 11 June 2026 by susiloharjo

Last quarter I ran a design thinking sprint on an AI agent project. Three weeks in, all I’d produced was a wall of Post-it notes, two empathy maps, and a definition statement nobody on the engineering team could repeat. The agent itself had not moved one line of code forward.

Then I threw out 80% of the framework and kept the 20% that actually shipped the project.

Design thinking, stripped of consultant-speak, is a debugging loop for the gap between “what we think the user needs” and “what the user actually needs.” Most of what gets taught in corporate workshops is theater. The 20% that matters is something engineers have been doing for decades under a different name. They called it “writing tests against user behavior” or “asking the customer before shipping.”

This post is the 20%.

When AI Agents Eat Your Server: Taming Rogue Processes

6 June 2026 by susiloharjo

Linux server monitoring CPU and memory usage

I was debugging a CI pipeline when Nginx stopped responding. htop showed the culprit: a Python script spinning at 99.8% CPU, and another process that had swallowed 6GB of RAM before the kernel gave up. The script was an experimental AI agent stuck in a loop …

Sometimes You Still Need a Human on the Other End

6 June 2026 | 3 June 2026 by susiloharjo

I spent last week migrating our payment gateway from Xxxxxt to Dxxu. Not because of pricing. Not because of features. Because when I needed help moving from sandbox to production, nobody on their end could give it. Here’s what happened. I had everything running fine in the sandbox environment: webhooks, callbacks, settlement flows, all …