Prompts Are Code Now: My Claude Opus 4.8 Playbook

June 2, 2026 · 6 min read

I’ve been writing prompts for LLMs almost daily for the past year — Hermes agents, Claude Code sessions, quick experiments, the works. And for most of that time, I treated prompts like magic incantations. More adjectives, more pleading, more “please be amazing.”

It worked sometimes. But not reliably.

Then I came across Lina Beliūnas’s breakdown of the Claude Opus 4.8 prompting playbook, and something clicked: prompts aren’t wishes. They’re code.

You don’t write a function that sometimes returns the right answer and call it a day. You structure it, constrain its inputs, define its output contract, and set the right resource allocation. Same thing with prompts. Here’s what that looks like in practice.

The Effort Level Changed Everything

This single variable made the biggest difference in my output quality, and I’d never touched it.

Opus 4.8 has five effort levels: low, medium, high, xhigh, and max. The default, high, works fine for simple tasks such as a quick refactor or a one-shot question. But the moment I bumped complex sessions to xhigh, the depth of reasoning changed noticeably.

Here’s the practical rule I now use: if I see shallow reasoning on a complex task, I raise the effort level before changing the prompt. Nine times out of ten, that’s the fix. Not more words. Not a better persona. Just more compute allocated to thinking.
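To make that order of operations concrete, here is a minimal Python sketch of the escalation rule. The five level names come from the list above. `next_effort`, `plan_retry`, and the policy of rewriting the prompt only once effort is already at max are my own hypothetical framing, not part of any Claude or Hermes API, and nothing here calls a model.

```python
# Hypothetical helpers: the escalation rule from this post, not a Claude or Hermes API.
EFFORT_LEVELS = ["low", "medium", "high", "xhigh", "max"]  # ascending order
DEFAULT_EFFORT = "high"


def next_effort(current: str) -> str:
    """Return the next level up, or the same level if already at the top."""
    index = EFFORT_LEVELS.index(current)  # raises ValueError on an unknown level
    return EFFORT_LEVELS[min(index + 1, len(EFFORT_LEVELS) - 1)]


def plan_retry(current: str, reasoning_looked_shallow: bool) -> tuple[str, bool]:
    """Return (effort_for_next_attempt, should_rewrite_prompt).

    Assumed policy: raise effort first; only rewrite the prompt once effort is maxed.
    """
    if not reasoning_looked_shallow:
        return current, False
    if current != EFFORT_LEVELS[-1]:
        return next_effort(current), False
    return current, True


print(plan_retry(DEFAULT_EFFORT, reasoning_looked_shallow=True))
```

Deciding whether the reasoning looked shallow is still a human call; the code only pins down the order: effort first, prompt second.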

For Hermes users like me: you can set effort in your provider config. For Claude Code: enable ultracode mode during focused engineering sessions; it triggers Dynamic Workflows automatically, without you having to ask for them.

XML Tags Are Claude’s Superpower

This sounds silly until you try it. Claude was specifically trained to recognize XML tags as structural markers. When you wrap background in <context> tags and the task in <instructions> tags, Claude treats them as separate parts of the prompt instead of guessing where the background ends and the request begins.

My prompts used to be a wall of text. Now they look like this:

<context>
I'm debugging a PostgreSQL query that's 40x slower in staging than in dev.
Same schema, same data volume.
</context>

<instructions>
Analyze the three most likely causes. For each,
explain how to verify and fix.
</instructions>

<constraints>
- No hedge words. Give me your best diagnosis.
- Assume I'm running Postgres 15 on Ubuntu.
- Max 300 words.
</constraints>

The difference is immediate. Claude stops mixing up background info with actual tasks. It treats <constraints> as guardrails, not suggestions.
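Because the structure is so regular, I find it easier to assemble these prompts in code than by hand. Here is a minimal Python sketch, assuming all you want is a plain string you can paste or send. `build_prompt` is a hypothetical helper, and the tag names are simply the ones used in the prompt above.

```python
def build_prompt(context: str, instructions: str, constraints: list[str]) -> str:
    """Hypothetical helper: wrap each part of the prompt in its own XML tag."""

    def section(tag: str, body: str) -> str:
        # Bodies are inserted verbatim; only surrounding whitespace is trimmed.
        return f"<{tag}>\n{body.strip()}\n</{tag}>"

    bullet_list = "\n".join(f"- {item}" for item in constraints)
    return "\n\n".join(
        [
            section("context", context),
            section("instructions", instructions),
            section("constraints", bullet_list),
        ]
    )


# Rebuilds the PostgreSQL debugging prompt shown above.
prompt = build_prompt(
    context=(
        "I'm debugging a PostgreSQL query that's 40x slower in staging than in dev.\n"
        "Same schema, same data volume."
    ),
    instructions="Analyze the three most likely causes. For each,\nexplain how to verify and fix.",
    constraints=[
        "No hedge words. Give me your best diagnosis.",
        "Assume I'm running Postgres 15 on Ubuntu.",
        "Max 300 words.",
    ],
)
print(prompt)
```

The bodies go in verbatim. If pasted material could itself contain a closing tag like </context>, escape it first (for example with `xml.sax.saxutils.escape`).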

Persistent agent memory via MCP takes the same principle further: your agent remembers past preferences and patterns across sessions and builds on that experience instead of starting fresh every time.

Show, Don’t Describe

Here’s something I noticed long before reading the playbook, but now I understand why it works: showing Claude two examples of what I want beats a paragraph of description.

Want a concise response? Don’t say “be concise.” Show a one-paragraph example of the exact tone and length you want. Claude pattern-matches against examples far more reliably than it follows abstract instructions.

I now keep a small library of example outputs in my Hermes skills. When I start a new task, I paste one or two relevant examples first, then my actual query. The consistency improvement is dramatic.
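Here is a sketch of that “examples first, then the query” layout in Python. The in-memory `EXAMPLE_LIBRARY`, its sample entries, and `with_examples` are hypothetical stand-ins; I’m not assuming anything about how Hermes skills actually store files. Wrapping each example in <example> tags applies the same XML idea from the previous section.

```python
# Hypothetical example library: task name -> saved example outputs.
# An in-memory stand-in; it assumes nothing about how Hermes skills store files.
EXAMPLE_LIBRARY: dict[str, list[str]] = {
    "commit_message": [
        "Fix off-by-one in pagination cursor\n\n"
        "The last page was skipped when the total was a multiple of the page size.",
        "Add retry with backoff to webhook sender",
    ],
    "standup": [
        "Yesterday: shipped the cache fix. Today: profiling the import job. Blockers: none.",
    ],
}


def with_examples(task: str, query: str, max_examples: int = 2) -> str:
    """Put up to `max_examples` saved examples (each in <example> tags) before the query."""
    examples = EXAMPLE_LIBRARY.get(task, [])[:max_examples]
    parts = [f"<example>\n{text}\n</example>" for text in examples]
    parts.append(query)  # the actual request always comes last
    return "\n\n".join(parts)


print(with_examples("standup", "Write today's standup from these notes: fixed flaky test, starting the migration."))
```

Keeping the query last mirrors the manual habit: the examples set the pattern, then the request arrives.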

Dynamic Workflows: The New Paradigm

This is the biggest shift in the Opus 4.8 era, and it’s Claude Code-specific.

Dynamic Workflows let Claude write its own orchestration scripts, spinning up tens or hundreds of parallel subagents instead of making tool calls one at a time. Because the control flow lives in code rather than in the conversation, Claude is far less likely to drift or lose track halfway through a large refactor.

The flagship example: Jarred Sumner (creator of Bun) used Dynamic Workflows to rewrite the entire Bun runtime from Zig to Rust: 750,000 lines of code in eleven days, with a 99.8% test pass rate. Hundreds of agents worked in parallel, each file was reviewed by two adversarial judges, and a fix loop drove the build until everything passed.
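I can’t show what Claude Code’s generated orchestration scripts look like internally, so treat this as a generic Python asyncio sketch of the pattern: fan out one subagent per file under a concurrency cap, have two judges review each result, and feed their objections back into a retry. `port_file` and `judge` are hypothetical stubs standing in for model calls, the toy rejection rule exists only so the loop has something to do, and I’ve simplified the build-level fix loop into a per-file retry capped at `max_attempts`.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileResult:
    path: str
    attempts: int
    approved: bool


async def port_file(path: str, feedback: Optional[str]) -> str:
    """Hypothetical subagent stub; a real workflow would call a model here."""
    await asyncio.sleep(0)
    return f"ported({path})" + (" +fixed" if feedback else "")


async def judge(name: str, path: str, output: str) -> Optional[str]:
    """Hypothetical judge stub: None means approve, a string is the objection."""
    await asyncio.sleep(0)
    # Toy rule so the fix loop has work to do: first drafts of .zig files fail review.
    if path.endswith(".zig") and "+fixed" not in output:
        return f"{name}: {path} needs another pass"
    return None


async def process_file(path: str, slots: asyncio.Semaphore, max_attempts: int = 3) -> FileResult:
    async with slots:  # cap how many subagents run at the same time
        feedback: Optional[str] = None
        for attempt in range(1, max_attempts + 1):
            output = await port_file(path, feedback)
            # Two independent judges review the same output concurrently.
            verdicts = await asyncio.gather(
                judge("judge-a", path, output),
                judge("judge-b", path, output),
            )
            objections = [v for v in verdicts if v is not None]
            if not objections:
                return FileResult(path, attempt, approved=True)
            feedback = "; ".join(objections)  # fed into the next attempt
        return FileResult(path, max_attempts, approved=False)


async def run_workflow(paths: list[str], concurrency: int = 8) -> list[FileResult]:
    slots = asyncio.Semaphore(concurrency)
    return list(await asyncio.gather(*(process_file(p, slots) for p in paths)))


results = asyncio.run(run_workflow(["src/lexer.zig", "src/main.rs", "src/parser.zig"]))
for result in results:
    print(result)
```

The design choice that matters is that the loop, the judges, and the retry limit live in ordinary code, so nothing depends on the model remembering where it is halfway through the run.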

I’ve started using this pattern for codebase-wide changes. The workflow that used to take me three manual sessions over two days now runs in a single session with parallel agents.

Cost warning: workflows burn tokens fast. I ask Claude to estimate token usage before running a large workflow, and I start with a 10% sample before committing to the full codebase.
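The sample-first habit is easy to script. Here is a minimal Python sketch, assuming your sources are already loaded into a dict mapping path to text. It picks a deterministic ~10% of files by hashing their paths, so reruns reuse the same sample, and extrapolates an input-token budget. The 4-characters-per-token ratio is a rough heuristic, not Claude’s tokenizer, and `passes_per_file=3` (one rewrite plus two reviews) is an assumption; use your provider’s token counter for real numbers.

```python
import hashlib


def rough_tokens(text: str) -> int:
    # Rough heuristic (~4 characters per token); not Claude's actual tokenizer.
    return max(1, len(text) // 4)


def in_sample(path: str, fraction: float = 0.10) -> bool:
    """Deterministically keep about `fraction` of paths, so reruns reuse the same sample."""
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:6], "big") / 2**48  # exact float in [0, 1)
    return bucket < fraction


def estimate_budget(files: dict[str, str], fraction: float = 0.10, passes_per_file: int = 3) -> dict[str, int]:
    """Estimate input tokens for the sample run and for the full codebase.

    passes_per_file=3 is an assumption (one rewrite plus two reviews); output tokens are ignored.
    """
    sample = {path: text for path, text in files.items() if in_sample(path, fraction)}
    full_pass = sum(rough_tokens(text) for text in files.values())
    sample_pass = sum(rough_tokens(text) for text in sample.values())
    return {
        "files_total": len(files),
        "files_in_sample": len(sample),
        "sample_tokens": sample_pass * passes_per_file,
        "full_tokens": full_pass * passes_per_file,
    }


# Hypothetical codebase: 200 small Rust files.
codebase = {f"src/module_{i}.rs": "fn main() {}\n" * 50 for i in range(200)}
report = estimate_budget(codebase)
print(report)
```

This gives an independent number to compare against Claude’s own estimate. If the sample run’s real usage lands well above it, that’s the cue to re-plan before pointing the workflow at the whole codebase.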

What I Actually Use Day-to-Day

Here’s my condensed playbook for daily work:

Start with high effort. Only bump to xhigh if the reasoning feels shallow.
Structure every prompt with <context>, <instructions>, and <constraints> tags.
Show 1-2 example outputs when tone or format matters.
Enable ultracode for coding sessions, disable it for quick asks.
Start small, scale up. Run workflows on a sample first.

That’s it. Five rules. They’ve cut my re-prompt rate by at least half.

The Takeaway

The biggest lie in the AI tools space is that prompting is an art. It’s not. It’s engineering. Define your inputs, set your parameters, structure your context, validate your outputs. The same discipline that makes your Python code reliable makes your prompts reliable, and it carries over to how you structure agent skills for your AI dev tools.

Opus 4.8 rewards structure. Give it that, and it genuinely delivers some of the best output I’ve seen from any model.

Discover more from Susiloharjo
