Post 4: Building the Agent Team — Supervisor, Coder, Reviewer, QC
One agent is an assistant. A team of agents is a department.
In Post 2 and 3, we set up Hermes (the orchestrator) and Claude Code (the coder). Now we architect the full team — four specialized agents that work together like a real dev team, with a human-in-the-loop making the final call.
Why Multiple Agents?
Here’s the problem with one-agent-does-everything: context pollution. When Claude reads 20 files to understand a bug, writes a fix, reviews its own code, and runs tests — all in one session — its context window fills up. By step 4, it’s operating on summaries of summaries. Quality degrades.
Cat Wu from the Claude Code team said it plainly: “The model performs best if you treat it like an engineer you’re delegating to, not a pair programmer you’re guiding line by line.”
The solution: separate concerns. Each agent has one job, one context, one set of tools.
The Team Architecture
The flow:
1. Supervisor (Hermes) detects new GitHub issue, reads it, and decides: is this a code bug, feature request, or user question?
2. If code bug, Coder (Claude Code) investigates, writes failing test, fixes, creates PR
3. Reviewer (Claude subagent, separate session) reviews the PR with zero implementation bias
4. QC/Tester runs the full test suite, checks edge cases, verifies nothing breaks
5. Human (you) gets a clean PR with review notes and test results, then approve or request changes
You only touch this at step 5. Steps 1-4 are fully autonomous.
Agent 1: Supervisor (Hermes Agent)
The supervisor doesn’t write code. It manages the queue. Hermes has native cron, Telegram integration, and delegation. The supervisor is a lightweight decision-maker, not a heavy code-writer.
Agent 2: Coder (Claude Code)
This is the workhorse. It gets a ticket, investigates, fixes, creates PR. Uses Claude Sonnet with –allowedTools Read,Edit,Bash, –max-turns 15, –max-budget-usd 1.00 for routine bugs.
Agent 3: Reviewer (Claude Subagent)
The reviewer runs in a separate context with zero implementation bias. It doesn’t know what the Coder was thinking. It just sees the diff. Uses read-only tools to avoid bias toward defending its own edits.
Agent 4: QC/Tester (Claude Code)
The final gate before human review. Runs the full test suite, checks edge cases. Uses Claude Haiku since testing is cheaper than fixing.
Human-in-the-Loop: Where You Step In
You only get involved at the final stage. You see a Telegram notification with PR link, cost, coder turns, review verdict, and QC status. Your options: Merge, Request changes, or Manual review.
Cost Per Ticket Breakdown
| Agent | Model | Avg cost |
|——-|——-|———-|
| Supervisor | DeepSeek V4 Pro | ~$0.01 |
| Coder | Claude Sonnet | ~$0.25 |
| Reviewer | Claude Sonnet | ~$0.10 |
| QC/Tester | Claude Haiku | ~$0.03 |
| Total | | ~$0.39/ticket |
A junior developer costs ~$25/hour. This costs ~$0.39 per ticket and works 24/7.
What We Just Built
- Supervisor that classifies and routes tickets
- Coder that reproduces, fixes, and creates PRs
- Reviewer that audits with zero implementation bias
- QC that runs full test suite + edge cases
- Human-in-the-loop for final approval
- Escalation rules for critical paths
Next post: [Post 5] Autonomous Cron Pipeline — wiring it all together. The cron job that makes this run while you sleep.
Series: “AI Junior Developer” — Part 4 of 6
Related: Post 3: Setting Up Claude Code — The Brain Behind the Agent.
Related: Alibaba’s AI Just Coded for 35 Hours Straight Without Human Help.
Discover more from Susiloharjo
Subscribe to get the latest posts sent to your email.