Human-in-the-Loop Security for AI Agents
By Kvlar Team
AI agent security has a spectrum problem. On one end: let the agent do anything. On the other: block everything that isn't explicitly allowed. Most real deployments need something in between.
That's where human-in-the-loop approval comes in.
The three-decision model
Kvlar's policy engine produces one of three decisions for every tool call:
- Allow — the action is safe, forward it immediately
- Deny — the action is dangerous, block it unconditionally
- Require approval — the action might be fine, but a human should decide
The third decision is the interesting one. It acknowledges that some actions are context-dependent. A `DELETE FROM users WHERE id = 5` might be perfectly reasonable during a debugging session but catastrophic in production. The policy can't always know — but a human reviewer can.
Designing approval rules
Good approval rules target the ambiguous middle ground. Here's a practical pattern for a Postgres MCP server:
```yaml
rules:
  # Unambiguously dangerous — always block
  - id: deny-destructive
    match_on:
      resources: ["query"]
      parameters:
        sql: "(?i)\\b(DROP|TRUNCATE|ALTER)\\b"
    effect:
      type: deny
      reason: "Destructive DDL is never allowed"

  # Unambiguously safe — always allow
  - id: allow-reads
    match_on:
      resources: ["query"]
      parameters:
        sql: "(?i)^\\s*SELECT\\b"
    effect:
      type: allow

  # Context-dependent — route to a human
  - id: approve-mutations
    match_on:
      resources: ["query"]
      parameters:
        sql: "(?i)\\b(INSERT|UPDATE|DELETE)\\b"
    effect:
      type: require_approval
      reason: "Data mutations require human review"
```
The key insight: deny what you can, allow what you must, and route everything else to a human. This maximizes safety without making the agent useless.
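To make the routing concrete, here is a minimal Python sketch of how rules like these partition SQL into the three decisions. It assumes first-match-wins evaluation and a fail-closed default; Kvlar's actual engine semantics may differ, and the `evaluate` function is purely illustrative.

```python
import re

# Hypothetical sketch mirroring the three rules above. This only
# illustrates how the regex patterns split SQL into three decisions;
# it is not Kvlar's real evaluation logic.
RULES = [
    ("deny-destructive", r"(?i)\b(DROP|TRUNCATE|ALTER)\b", "deny"),
    ("allow-reads", r"(?i)^\s*SELECT\b", "allow"),
    ("approve-mutations", r"(?i)\b(INSERT|UPDATE|DELETE)\b", "require_approval"),
]

def evaluate(sql: str) -> str:
    """Return the decision of the first rule whose pattern matches."""
    for rule_id, pattern, decision in RULES:
        if re.search(pattern, sql):
            return decision
    return "deny"  # assumed default: fail closed when nothing matches

print(evaluate("SELECT * FROM users"))             # allow
print(evaluate("DROP TABLE users"))                # deny
print(evaluate("DELETE FROM users WHERE id = 5"))  # require_approval
```

Note that rule order matters under this assumption: the destructive-DDL check runs first, so a statement like `ALTER TABLE` is denied even though it would not match the other rules.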
Webhook integration
Kvlar v0.3.0 ships webhook-based approval routing. When a require_approval decision fires, the proxy can POST the approval request to your endpoint. Your system reviews it (via a Slack bot, email workflow, or custom UI) and responds with an approved or denied decision.
The webhook receives:
```json
{
  "tool_name": "query",
  "arguments": {"sql": "DELETE FROM users WHERE id = 5"},
  "matched_rule": "approve-mutations",
  "request_id": "req_abc123",
  "timestamp": "2026-03-05T10:30:00Z"
}
```
And responds:
```json
{"decision": "approved"}
```
Or:
```json
{"decision": "denied", "reason": "Not during peak hours"}
```
When to use approval vs. deny
A common mistake is making everything require_approval. This defeats the purpose — if every action needs a human click, the agent becomes a very expensive CLI with an extra step.
Reserve approval for actions that:
- Have real consequences but aren't categorically dangerous (data mutations, sending messages, modifying config)
- Depend on context that the policy can't evaluate (time of day, business state, user intent)
- Are infrequent enough that human review is feasible (not 100 queries per minute)
Everything else should be a hard allow or hard deny. The goal is a small, meaningful set of approval checkpoints — not a firehose of prompts.
The audit trail
Every approval decision gets recorded in Kvlar's audit log — who approved what, when, and why (or why not). This creates a compliance-ready record of human oversight for sensitive agent operations.
As AI agents take on more autonomous work, the ability to prove that humans reviewed critical decisions becomes a regulatory requirement, not just a nice-to-have.
Getting started
Approval policies work with Kvlar v0.3.0. Start with a policy that denies dangerous operations and routes mutations to approval:
```shell
cargo install kvlar-cli
kvlar init --template postgres
kvlar wrap
```
Then customize the generated policy to match your team's risk tolerance. The documentation has the full policy reference.