I shipped a PR last month that broke staging for 20 minutes. A teammate spotted the cause in 30 seconds: a missing await on an async call that my linter didn’t catch.
That stung. But it also made me rethink my review pipeline.
Instead of adding more manual checks, I added automated ones that run before any human reads my diff. Here are the three automated review steps I now run on every PR, in order.
1. Ruff — 40 Rules in 30 Milliseconds
I used to run flake8 + isort + black as three separate steps. Ruff replaced all three: ruff check covers the flake8 and isort rules, and ruff format covers black.
# Install once
pip install ruff
# Run on changed Python files only (skips deleted files and non-.py paths)
git diff --name-only --diff-filter=d origin/main -- '*.py' | xargs -r ruff check --fix
What I actually use in CI:
ruff check --select ALL --ignore E501,D203 \
--target-version py311 \
--line-length 100
Ruff catches unsorted imports, unused variables, and violations of about 40 other rules in under 50ms on a 200-line file. It doesn’t understand the code, but it catches the mechanical mistakes that waste human review time.
Ruff flagged 12 unused imports across my last 5 PRs. None were blockers, but removing them saved whoever reviewed those PRs about 15 seconds each. Not big. But 12 little things add up to a cleaner diff.
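To make those mechanical catches concrete, here is a small sketch of the kind of file Ruff complains about, followed by the cleaned-up version. The function name is made up for illustration; the rule codes in the comments (F401 for an unused import, F841 for an unused local variable) are Ruff's pyflakes-derived codes.

```python
# Before (shown as comments so this file stays clean):
#
#   import os                      # F401: `os` imported but unused
#   import json
#
#   def load_config(raw: str) -> dict:
#       parsed = json.loads(raw)
#       debug_copy = dict(parsed)  # F841: assigned but never used
#       return parsed

# After: same behavior, nothing left for a reviewer to nitpick.
import json


def load_config(raw: str) -> dict:
    """Parse a JSON config string into a dict (hypothetical example function)."""
    return json.loads(raw)
```

Neither finding changes what the function returns, which is why this class of problem is better handled by a tool than by a person.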
2. Claude Code Diff Review — Catches What Linters Miss
Ruff is syntactic. Claude Code is semantic.
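A minimal sketch of what "semantic" means here, using a missing-await bug like the one from the intro. `save_record` and the in-memory `store` are hypothetical stand-ins, not the actual code from the PR. Both handlers are valid Python, which is why a style-focused check has little to say about the first one.

```python
import asyncio


async def save_record(record: dict, store: list) -> None:
    """Pretend to do async I/O, then persist the record."""
    await asyncio.sleep(0)
    store.append(record)


async def handler_buggy(store: list) -> None:
    # Missing await: this only creates a coroutine object and drops it.
    # Nothing is saved; Python reports "never awaited" as a RuntimeWarning
    # at runtime, which is easy to miss in logs.
    save_record({"id": 1}, store)


async def handler_fixed(store: list) -> None:
    # Awaiting the coroutine actually runs it.
    await save_record({"id": 1}, store)


def run_demo() -> tuple[int, int]:
    """Return how many records each handler actually saved."""
    buggy_store: list = []
    fixed_store: list = []
    asyncio.run(handler_buggy(buggy_store))
    asyncio.run(handler_fixed(fixed_store))
    return len(buggy_store), len(fixed_store)
```

Calling `run_demo()` shows the difference: the buggy handler finishes without error and saves nothing, while the fixed one saves the record.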
I pipe my diff directly into a Claude Code prompt through Hermes Agent:
# I have this in ~/.hermes/skills/dev-review/run.sh
# claude -p is Claude Code's non-interactive print mode; it reads the diff from stdin
git diff origin/main | claude -p \
"Review this diff. Flag: 1) Logic errors 2) Missing error handling 3) Performance issues"
This caught the missing await. It also flagged:
– A SQL query running inside a loop that should have been batched
– A file path built with + string concatenation instead of pathlib
– Two exception handlers that silently swallowed specific errors
Output looks like this:
[LOGIC] Line 42: db.update() called without try/except around the I/O call.
If this fails mid-batch, you lose the cursor position.
Suggested fix: wrap in try/except and log the batch number.
[PERF] Lines 88-95: SQL SELECT inside for loop.
This is N+1. Move the query above the loop.
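For readers who have not hit an N+1 query before, here is a rough sketch of the before and after, using an in-memory SQLite database so it runs anywhere. The `users` table, its rows, and the function names are invented for illustration and are not taken from the flagged PR.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build a tiny in-memory database to query against."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO users (id, name) VALUES (?, ?)",
        [(1, "ana"), (2, "budi"), (3, "citra")],
    )
    return conn


def names_n_plus_one(conn: sqlite3.Connection, user_ids: list[int]) -> dict[int, str]:
    # One SELECT per id: N round trips for N ids.
    names = {}
    for user_id in user_ids:
        row = conn.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        if row is not None:
            names[user_id] = row[0]
    return names


def names_batched(conn: sqlite3.Connection, user_ids: list[int]) -> dict[int, str]:
    # One SELECT for all ids, using a parameterized IN (...) clause.
    if not user_ids:
        return {}
    placeholders = ", ".join("?" for _ in user_ids)
    query = f"SELECT id, name FROM users WHERE id IN ({placeholders})"
    return dict(conn.execute(query, list(user_ids)).fetchall())
```

Both functions return the same mapping; the batched version just gets there with a single query. For very large id lists the IN clause would need chunking, since SQLite caps the number of bound parameters.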
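Each run takes about 8-12 seconds and costs ~$0.03 in API calls. If it prevents one problem that would block 2 developers for 15 minutes, that is 30 developer-minutes saved for three cents. A 300x return only requires that time to be worth about $9, or roughly $18 per developer-hour.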
3. Pre-Commit Hook That Runs Both + a Sanity Test
I wired both tools into a pre-commit hook. It checks three things in under 20 seconds:
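# .pre-commit-config.yaml
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.11.0
    hooks:
      - id: ruff
        args: [--fix, --exit-non-zero-on-fix]
  - repo: local
    hooks:
      - id: ai-diff-review
        name: Claude Code diff review
        # pre-commit passes no diff to local hooks, so pipe the staged diff in explicitly
        entry: bash -c 'git diff --cached | claude -p "Review this staged diff"'
        language: system
        pass_filenames: false
        verbose: true  # print the review even when the hook exits 0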
Plus a sanity test step that runs pytest -q on the tests for the affected module.
If any of these fail, the commit doesn’t go through. The PR is already clean before I push.
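One detail worth spelling out: a pre-commit hook only blocks the commit when its command exits non-zero, and a review that merely prints findings does not do that on its own. Below is a possible sketch of a small gate that parses the `[LOGIC]` and `[PERF]` tags from the output format shown earlier and turns them into an exit code. The function names and the choice of which tags block a commit are assumptions, and the tag format depends on what the prompt asks the model to produce.

```python
import re

# Tags assumed to start a finding line, e.g. "[LOGIC] Line 42: ..."
FINDING_RE = re.compile(r"^\[(?P<tag>[A-Z]+)\]\s*(?P<text>.+)$")


def parse_findings(review_text: str) -> list[tuple[str, str]]:
    """Extract (tag, message) pairs from tagged review output."""
    findings = []
    for line in review_text.splitlines():
        match = FINDING_RE.match(line.strip())
        if match:
            findings.append((match.group("tag"), match.group("text")))
    return findings


def exit_code_for(
    review_text: str,
    blocking_tags: frozenset[str] = frozenset({"LOGIC"}),
) -> int:
    """Return 1 if any finding carries a blocking tag, else 0."""
    findings = parse_findings(review_text)
    return int(any(tag in blocking_tags for tag, _ in findings))
```

A wrapper script could read the review text from stdin and finish with `sys.exit(exit_code_for(sys.stdin.read()))`, so that logic findings stop the commit while performance notes are only displayed.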
What This Saved Me
Over 3 months with this pipeline:
– About 40% fewer review cycles per PR (from 2.1 to 1.3 rounds on average)
– Zero “missing error handling” comments from teammates
– About 4 hours of collective team review time recovered
Not every AI tool needs to ship product features. Sometimes the best use of LLMs is just making sure you don’t ship code that breaks.