Most “responsible AI” content reads like it was written by a policy team that has never deployed an agent to production. The checklists are long. The principles are abstract. And none of them tell you what to do when your agent starts hallucinating customer data at 3 AM and the on-call engineer is asleep.
I have been building AI agents for about a year now. Not research. Not demos. Actual agents that touch real data, make real decisions, and occasionally break things in ways I did not anticipate. Here is what responsible AI looks like from the builder’s side — not the policy side.
—
The moment I realized “be responsible” is not a feature flag
Last month I deployed an agent that reads ERP tickets and suggests fixes. Simple enough. It scans the ticket body, matches against known patterns, and proposes a patch. The first week was fine. Then it suggested dropping a foreign key constraint to “fix” a migration error.
The suggestion was technically correct — dropping the constraint would have resolved the immediate error. It was also catastrophic. That constraint protected referential integrity across three modules. The agent did not know that because I had not told it.
This is the core of responsible AI for builders: the agent will do exactly what you ask, with exactly the context you give it, and nothing more. The responsibility gap is not in the model. It is in the context you chose not to provide.
—
Three things I now do on every agent deployment
After that near-miss, I added three hard gates that run before any agent touches production data. They are not fancy. They are not academic. They are the minimum.
1. The blast radius test. Before the agent writes to any system, I ask: if this agent goes rogue and does the worst possible thing within its permissions, what breaks? If the answer is “customer data” or “production database,” the permissions are too wide. I now scope every agent to a read-only view plus a write-only staging table. The agent never touches the real tables directly. A human reviews the staging output and applies it.
I learned this one the hard way. An early version of my ERP agent had UPDATE on the inventory table. It corrected a stock count from 47 to 52, which was right. Then it corrected another from 3 to 0, which was wrong: those 3 units were in a different warehouse the agent did not know about. The blast radius was the entire inventory system. Now the agent writes to staging.inventory_adjustments and I apply the ones that check out. It is slower, but since the switch no wrong adjustment has reached the real inventory table.
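Part of the blast radius test can be made mechanical: list every table the agent can write to outside staging. A minimal sketch, assuming the agent's grants can be exported as (privilege, "schema.table") pairs; the `blast_radius` helper, the grant format, and the `staging` schema name are hypothetical illustrations, not a specific tool's output.

```python
# Privileges that let an agent change data (hypothetical subset; adjust per database).
WRITE_PRIVILEGES = {"INSERT", "UPDATE", "DELETE", "TRUNCATE"}

def blast_radius(grants, staging_schema="staging"):
    """Return tables the agent could modify outside the staging schema.

    `grants` is an iterable of (privilege, "schema.table") pairs, an assumed
    export format from the database's permission catalog.
    """
    exposed = set()
    for privilege, table in grants:
        schema = table.split(".", 1)[0]
        if privilege.upper() in WRITE_PRIVILEGES and schema != staging_schema:
            exposed.add(table)
    return sorted(exposed)

# The early ERP agent: UPDATE directly on the real inventory table.
early = [("SELECT", "public.inventory"), ("UPDATE", "public.inventory")]
# The current ERP agent: read the real table, write only to staging.
current = [("SELECT", "public.inventory"),
           ("INSERT", "staging.inventory_adjustments")]

print(blast_radius(early))    # ['public.inventory']
print(blast_radius(current))  # []
```

A non-empty result is the signal that the permissions are too wide for an agent that runs unattended.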
2. The context audit. Every agent prompt has a system message. I now review that system message with one question: what does the agent NOT know that a human in this role WOULD know? The foreign key constraint was one of those things. Business rules, regulatory requirements, unwritten team conventions — if it is not in the system message, the agent does not know it. I keep a running list of “things the agent should know but currently does not” and add them to the prompt every sprint.
A concrete example: my HR agent processes leave requests. A human HR person knows that “family emergency” leave gets priority over “vacation” leave, even if vacation was submitted first. The agent did not know that until I wrote it into the system message. For two weeks it was approving vacations and queueing emergencies. Nobody complained because the team was small and people just talked to each other. But at 50 employees, that unwritten rule becomes a liability. Write it down.
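One way to keep the running list honest is to store it next to the agent code and generate the system message from it. A minimal sketch; `CONTEXT_RULES`, `build_system_message`, and `missing_rules` are hypothetical names, and the rule text paraphrases the HR example above.

```python
# Hypothetical running list of rules a human in this role would know.
# Add an entry every time the context audit finds a gap.
CONTEXT_RULES = [
    "Family emergency leave has priority over vacation leave, "
    "even if the vacation request was submitted first.",
]

def build_system_message(base, rules):
    """Append every known business rule to the base system message."""
    lines = [base.strip(), "", "Business rules you must follow:"]
    lines += [f"- {rule}" for rule in rules]
    return "\n".join(lines)

def missing_rules(system_message, rules):
    """Return the rules from the running list that the prompt does not contain."""
    return [rule for rule in rules if rule not in system_message]

base = "You process leave requests for the HR team."
prompt = build_system_message(base, CONTEXT_RULES)
print(prompt)
print(missing_rules(base, CONTEXT_RULES))    # the bare prompt lacks the rule
print(missing_rules(prompt, CONTEXT_RULES))  # []
```

Running `missing_rules` in a test means a hand-edited prompt that silently drops a rule fails the build instead of failing in production.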
3. The 3 AM test. I ask myself: if this agent runs at 3 AM and produces output that nobody reviews until 9 AM, what is the worst that could happen? If the answer involves money, customer trust, or legal exposure, the agent needs a human-in-the-loop gate. Not a “we will review it eventually” gate. A hard gate that blocks the action until a human explicitly approves it.
The 3 AM test is not about the agent’s accuracy. It is about the cost of being wrong when nobody is watching. An agent that sends a wrong Slack message at 3 AM is embarrassing. An agent that sends a wrong invoice at 3 AM is a customer support ticket, a refund, and an apology email. An agent that drops a database constraint at 3 AM is a weekend of recovery. The gate should match the worst-case cost.
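A hard gate means the action does not run until someone approves it, and which actions need one depends on their worst-case cost. A minimal in-memory sketch, assuming each action is tagged with hypothetical cost labels; `ApprovalGate`, `Action`, and the label names are illustrative. A real deployment would persist pending actions and notify a reviewer.

```python
from dataclasses import dataclass, field

# Worst-case costs that, per the 3 AM test, require a hard human gate.
HIGH_COST = {"money", "customer_trust", "legal"}

@dataclass
class Action:
    name: str
    worst_case: set  # hypothetical labels for what a wrong result would cost

@dataclass
class ApprovalGate:
    pending: list = field(default_factory=list)
    executed: list = field(default_factory=list)

    def submit(self, action):
        """Run low-cost actions now; hold high-cost ones for a human."""
        if action.worst_case & HIGH_COST:
            self.pending.append(action)
            return "blocked: awaiting approval"
        self.executed.append(action.name)
        return "executed"

    def approve(self, name):
        """A human explicitly approves one held action, which then runs."""
        for action in self.pending:
            if action.name == name:
                self.pending.remove(action)
                self.executed.append(action.name)
                return True
        return False

gate = ApprovalGate()
print(gate.submit(Action("post_slack_reminder", {"embarrassment"})))  # executed
print(gate.submit(Action("send_invoice", {"money", "customer_trust"})))  # blocked: awaiting approval
print(gate.approve("send_invoice"))  # True
```

The invoice sits in `pending` from 3 AM until the reviewer approves it at 9 AM; nothing in the code path lets it run on its own.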
These three gates cost me about 30 minutes per deployment. They have saved me from at least two incidents I know about and probably several I do not.
—
Tools I actually use to audit agents
The three gates are the process. The tools are what make the process fast enough to actually do it. Here is what I run on every agent before it touches production.
Promptfoo for prompt testing. Before an agent goes live, I run Promptfoo against its system message with 20-30 test cases. Each test case is a user input plus an expected output constraint — not the exact output, but a rule the output must follow. “Must not mention competitor names.” “Must not suggest deleting data.” “Must include a confidence score.” Promptfoo runs all test cases in parallel and flags violations. It takes 90 seconds to run 30 test cases. It catches things I would miss in manual review.
```shell
promptfoo eval --prompts system-message.txt --tests test-cases.yaml --output results.json
```
The test cases file is YAML. I keep it in the same repo as the agent code. When the system message changes, the tests run again. This is the context audit made mechanical.
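The idea behind these test cases (rules the output must follow, not exact outputs) can be shown without Promptfoo itself. A minimal Python sketch, assuming each rule is a predicate over the output text; the rule set, the competitor names, and the "Confidence: 0.8" format are hypothetical, and this is not Promptfoo's configuration format or API.

```python
import re

# Hypothetical competitor names; substitute your own list.
COMPETITORS = ["acme erp", "globex"]

RULES = {
    "must not mention competitor names":
        lambda out: not any(name in out.lower() for name in COMPETITORS),
    "must not suggest deleting data":
        lambda out: not re.search(r"\b(delete|drop|truncate)\b", out, re.IGNORECASE),
    # Assumes the score is written like "Confidence: 0.8".
    "must include a confidence score":
        lambda out: re.search(r"confidence:\s*\d+(\.\d+)?", out, re.IGNORECASE) is not None,
}

def violations(output):
    """Return the names of every rule the output breaks."""
    return [name for name, check in RULES.items() if not check(output)]

good = "Re-run the migration after adding the missing index. Confidence: 0.8"
bad = "Drop the foreign key constraint to fix the error."
print(violations(good))  # []
print(violations(bad))   # ['must not suggest deleting data', 'must include a confidence score']
```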
LangFuse for tracing. Every agent call is traced through LangFuse. Input, output, latency, token count, and cost. The trace also captures the intermediate steps — tool calls, RAG retrievals, chain-of-thought reasoning. When something goes wrong, I do not guess what the agent did. I open the trace and see exactly which tool it called, what it retrieved, and what it decided.
The tracing also surfaces patterns I would not notice otherwise. Last month I found that my ERP agent was calling the inventory lookup tool twice per ticket — once to check stock, once to check warehouse location. The second call was redundant because the first call returned both fields. LangFuse showed me the duplicate. I removed it and saved 30% on token costs.
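Spotting that duplicate is a simple aggregation once traces are exported. A minimal sketch, assuming each exported observation is a dict with `trace_id`, `type`, and `name` keys; that shape is a hypothetical simplification, not LangFuse's SDK or export schema, and the tool name `inventory_lookup` is illustrative.

```python
from collections import Counter

def repeated_tool_calls(observations):
    """Return {(trace_id, tool_name): count} for tools called more than once in a trace."""
    counts = Counter(
        (obs["trace_id"], obs["name"])
        for obs in observations
        if obs["type"] == "tool"
    )
    return {key: n for key, n in counts.items() if n > 1}

# Hypothetical exported observations for two tickets.
observations = [
    {"trace_id": "ticket-101", "type": "tool", "name": "inventory_lookup"},
    {"trace_id": "ticket-101", "type": "generation", "name": "propose_fix"},
    {"trace_id": "ticket-101", "type": "tool", "name": "inventory_lookup"},
    {"trace_id": "ticket-102", "type": "tool", "name": "inventory_lookup"},
]
print(repeated_tool_calls(observations))  # {('ticket-101', 'inventory_lookup'): 2}
```

A repeated call is not always a bug, but each one is worth checking against what the first call already returned.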
LiteLLM as a proxy with guardrails. I run all agent LLM calls through LiteLLM. It is a proxy that sits between my agent and the model provider. It does three things: cost tracking across providers, rate limiting so the agent cannot burn through a budget in one night, and guardrail checks on both input and output.
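The budget part of that proxy can be pictured as a spend cap checked before each call. A minimal sketch of the idea, not LiteLLM's implementation or configuration; the `DailyBudget` class, the dollar limit, and the per-call cost estimate are hypothetical.

```python
from datetime import date

class BudgetExceeded(Exception):
    pass

class DailyBudget:
    """Refuse LLM calls once today's estimated spend would pass the limit."""

    def __init__(self, limit_usd):
        self.limit_usd = limit_usd
        self.day = date.today()
        self.spent_usd = 0.0

    def charge(self, cost_usd, today=None):
        today = today or date.today()
        if today != self.day:  # new day: reset the counter
            self.day, self.spent_usd = today, 0.0
        if self.spent_usd + cost_usd > self.limit_usd:
            raise BudgetExceeded(
                f"call would bring spend to ${self.spent_usd + cost_usd:.2f}, "
                f"over the ${self.limit_usd:.2f} daily limit"
            )
        self.spent_usd += cost_usd

# A runaway overnight loop stops at the cap instead of running until morning.
budget = DailyBudget(limit_usd=5.00)
for _ in range(100):
    try:
        budget.charge(0.12)  # hypothetical estimated cost of one call
    except BudgetExceeded as err:
        print(err)
        break
```

Checking before the call, rather than totalling afterwards, is what turns cost tracking into a limit.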
The guardrail I use most: PII detection. LiteLLM scans every response for patterns that look like email addresses, phone numbers, or API keys before the response reaches the agent. If it finds one, it redacts it and logs the incident. This is not a “check it in code review” guard. It is a runtime guard that runs on every single response.
```yaml
guardrails:
  - type: pii
    action: redact
  - type: prompt_injection
    action: block
  - type: token_limit
    max_tokens: 4000
```
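To show what “redact and log” means at runtime, here is a regex-based sketch. It is a hypothetical stand-in, not LiteLLM's guardrail code: the patterns are deliberately simple, cover only email addresses, phone-like numbers, and `sk-`-style API keys (an assumed key format), and will miss cases a dedicated PII detector catches.

```python
import logging
import re

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pii_guard")

# Simplified, illustrative patterns; real PII detection needs more than regexes.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "API_KEY": re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"),
}

def redact(text):
    """Replace each PII match with a placeholder and log what was found."""
    for label, pattern in PII_PATTERNS.items():
        text, count = pattern.subn(f"[{label}]", text)
        if count:
            log.info("redacted %d %s match(es)", count, label)
    return text

print(redact("Contact jane.doe@example.com or +1 555 010 2030, key sk-abcdefghijklmnop1234"))
# Contact [EMAIL] or [PHONE], key [API_KEY]
```

The log line is the incident record: it says what kind of data was caught without writing the data itself back into the logs.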
Staging tables in PostgreSQL. This is not a SaaS tool. It is a database pattern. Every agent that writes data writes to a staging schema, not the production schema. The staging tables have the same columns as production, plus an approved flag, but no foreign key constraints to production tables. A human reviews the staging rows and promotes the approved ones with a simple INSERT INTO production (col1, col2, …) SELECT col1, col2, … FROM staging WHERE approved = true. The columns are listed explicitly because SELECT * would also pick up the staging-only approved flag.
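A minimal, runnable sketch of the staging pattern, using Python's built-in sqlite3 as a stand-in for PostgreSQL. The table names echo the ERP example, but the columns, the `approved` and `promoted` flags, and the mark-as-promoted step are assumptions for illustration. Marking rows as promoted keeps a second run from applying the same change twice.

```python
import sqlite3

# sqlite3 stands in for PostgreSQL so the sketch runs anywhere (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Production table: only a human-triggered promotion writes here.
    CREATE TABLE inventory_adjustments (
        sku TEXT NOT NULL,
        warehouse TEXT NOT NULL,
        new_count INTEGER NOT NULL
    );
    -- Staging copy: same columns, no constraints pointing at production,
    -- plus review flags that must never be copied to production.
    CREATE TABLE staging_inventory_adjustments (
        id INTEGER PRIMARY KEY,
        sku TEXT NOT NULL,
        warehouse TEXT NOT NULL,
        new_count INTEGER NOT NULL,
        approved INTEGER NOT NULL DEFAULT 0,
        promoted INTEGER NOT NULL DEFAULT 0
    );
""")

# The agent may only write to staging.
conn.executemany(
    "INSERT INTO staging_inventory_adjustments (sku, warehouse, new_count) VALUES (?, ?, ?)",
    [("SKU-1", "WH-A", 52), ("SKU-2", "WH-A", 0)],
)

# A human approves only the first suggestion; the wrong 3 -> 0 stays in staging.
conn.execute("UPDATE staging_inventory_adjustments SET approved = 1 WHERE id = 1")

def promote(conn):
    """Copy approved, not-yet-promoted rows to production in one transaction."""
    with conn:
        conn.execute("""
            INSERT INTO inventory_adjustments (sku, warehouse, new_count)
            SELECT sku, warehouse, new_count
            FROM staging_inventory_adjustments
            WHERE approved = 1 AND promoted = 0
        """)
        conn.execute("UPDATE staging_inventory_adjustments SET promoted = 1 "
                     "WHERE approved = 1 AND promoted = 0")

promote(conn)
promote(conn)  # running twice does not duplicate rows
print(conn.execute("SELECT * FROM inventory_adjustments").fetchall())
# [('SKU-1', 'WH-A', 52)]
```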
The pattern is 20 lines of SQL and a cron job that sends me a Slack message when there are unapproved rows older than 2 hours. It is the cheapest, most reliable guard I have. No dependency on a vendor. No API key to expire. Just PostgreSQL doing what PostgreSQL does.
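The alerting half can be sketched as a query plus a message. This version again uses sqlite3 in place of PostgreSQL, assumes a hypothetical `created_at` timestamp column stored as ISO 8601 text in UTC, and only builds the alert text; posting it to Slack and scheduling it with cron are left out.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(hours=2)

def stale_unapproved(conn, now):
    """Count staging rows still unapproved after MAX_AGE."""
    cutoff = (now - MAX_AGE).isoformat()
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM staging_inventory_adjustments "
        "WHERE approved = 0 AND created_at < ?",  # hypothetical created_at column
        (cutoff,),
    ).fetchone()
    return count

def alert_text(count):
    """Return the reminder text, or None when nothing is waiting."""
    if count == 0:
        return None
    return f"{count} staged agent change(s) have waited over 2 hours for review."

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging_inventory_adjustments ("
             "id INTEGER PRIMARY KEY, approved INTEGER NOT NULL DEFAULT 0, "
             "created_at TEXT NOT NULL)")
now = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
conn.executemany(
    "INSERT INTO staging_inventory_adjustments (created_at) VALUES (?)",
    [((now - timedelta(hours=3)).isoformat(),),     # stale
     ((now - timedelta(minutes=30)).isoformat(),)],  # still fresh
)
print(alert_text(stale_unapproved(conn, now)))
```

Comparing ISO 8601 strings works here only because every timestamp uses the same format and time zone; in PostgreSQL a `timestamptz` column and `now() - interval '2 hours'` would be the natural choice.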
The stack together. Promptfoo tests the prompts before deploy. LangFuse traces every call in production. LiteLLM rate-limits and redacts PII at runtime. Staging tables prevent direct production writes. Four tools, all open-source or free-tier, and they cover the three gates from different angles.
I did not adopt all of these at once. I added Promptfoo after the foreign key incident. I added LangFuse after a debugging session that took 4 hours because I had no trace. I added LiteLLM after an agent burned $40 in API calls overnight. Each tool was a response to a specific failure. Together they form a safety net that catches most failures before they reach a customer.
—
The part nobody talks about: responsibility costs speed
Here is the uncomfortable truth. Every responsible-AI gate slows you down. The blast radius test means you cannot just give the agent db_owner and let it figure things out. The context audit means you spend 20 minutes writing system prompts instead of 2. The 3 AM test means you build approval workflows instead of fire-and-forget agents.
In a team that measures velocity by tickets closed, these gates look like waste. My agent could close 12 tickets a day if I let it run unrestricted. With the gates, it closes 4 — and a human closes the other 8 after reviewing the agent’s suggestions.
The math looks worse on a sprint board. But the math looks different when you count incidents. Unrestricted agents create incidents. Incidents cost hours of debugging, customer apologies, and sometimes data recovery. My gated agent has had zero incidents in three months. The math works out.
I am not saying every agent needs all three gates. A blog-post summarizer with no write access needs none of them. A Slack bot that posts standup reminders needs maybe one. But the moment your agent touches a database, a queue, or a customer-facing API, the gates stop being optional. The rule is: the closer the agent gets to production data, the more gates you need. Permissions are the multiplier.
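The rule of thumb in that paragraph can be written as a small decision function. A sketch, assuming an agent's access is described by a few hypothetical boolean flags; the cut-offs mirror the examples above (no write access: no gates; a low-stakes posting bot: one; databases, queues, or customer-facing APIs: all three). Which single gate a bot like the standup reminder gets is an assumption here, not something the rule specifies.

```python
from dataclasses import dataclass

ALL_GATES = ["blast radius test", "context audit", "3 AM test"]

@dataclass
class AgentAccess:
    # Hypothetical flags describing what an agent can touch.
    writes_anything: bool = False
    touches_database: bool = False
    touches_queue: bool = False
    customer_facing_api: bool = False

def required_gates(access):
    """More production reach means more gates; permissions are the multiplier."""
    if access.touches_database or access.touches_queue or access.customer_facing_api:
        return list(ALL_GATES)
    if access.writes_anything:
        return ["3 AM test"]  # assumed choice for low-stakes writers like a Slack bot
    return []

print(required_gates(AgentAccess()))                      # []
print(required_gates(AgentAccess(writes_anything=True)))  # ['3 AM test']
print(required_gates(AgentAccess(writes_anything=True, touches_database=True)))
```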
—
What the policy frameworks get right (and wrong)
I read the EU AI Act summary. I read Google’s responsible AI principles. I read Anthropic’s safety documentation. The frameworks are good at one thing: categorizing risk. High-risk vs. limited-risk vs. minimal-risk. That taxonomy is useful for deciding how many gates to apply.
What the frameworks miss is the operational reality. A “limited-risk” agent that hallucinates a wrong invoice number is not limited risk to the customer who pays the wrong amount. A “minimal-risk” chatbot that leaks a PII field because the prompt was too long is not minimal risk to the person whose data just left the building.
I am not saying ignore the frameworks. I am saying the frameworks are the floor, not the ceiling. The EU AI Act tells you which agents need a conformity assessment. It does not tell you that your inventory agent needs a staging table. That part is on you.
The frameworks give you categories. The builder gives you context. Both are necessary. Neither is sufficient alone.
—
The takeaway
Responsible AI is not a compliance checkbox. It is a set of habits you build into your deployment workflow. Blast radius. Context audit. 3 AM test. Three questions that take 30 minutes and prevent the kind of incidents that take 30 hours.
I did not start with these gates. I started like most builders — excited about what the agent could do, shipping fast, fixing bugs when they appeared. The foreign key incident changed that. The inventory incident reinforced it. Now the gates are muscle memory. I run them on every deployment without thinking.
The agents are getting smarter every quarter. The responsibility gap is not in the models — it is in the context we give them and the permissions we grant them. Close that gap and the agent becomes a tool. Leave it open and the agent becomes a liability.
If you are deploying agents to production, start with the blast radius test. It is the cheapest insurance you will ever buy.
Read more: AI Agent Security Architecture: Operation PowerOFF Lessons and Your AI Agent Is Bleeding Money — Here’s How to Stop It.
Discover more from Susiloharjo