Human-in-the-Loop Security for AI Agents
By Kvlar Team
AI agent security has a spectrum problem. On one end: let the agent do anything. On the other: block everything that isn't explicitly allowed. Most real deployments need something in between.
That's where human-in-the-loop approval comes in.
The three-decision model
Kvlar's policy engine produces one of three decisions for every tool call:
- Allow — the action is safe, forward it immediately
- Deny — the action is dangerous, block it unconditionally
- Require approval — the action might be fine, but a human should decide
The third decision is the interesting one. It acknowledges that some actions are context-dependent. A `DELETE FROM users WHERE id = 5` might be perfectly reasonable during a debugging session but catastrophic in production. The policy can't always know — but a human reviewer can.
Designing approval rules
Good approval rules target the ambiguous middle ground. Here's a practical pattern for a Postgres MCP server:
```yaml
rules:
  # Unambiguously dangerous — always block
  - id: deny-destructive
    match_on:
      resources: ["query"]
      parameters:
        sql: "(?i)\\b(DROP|TRUNCATE|ALTER)\\b"
    effect:
      type: deny
      reason: "Destructive DDL is never allowed"

  # Unambiguously safe — always allow
  - id: allow-reads
    match_on:
      resources: ["query"]
      parameters:
        sql: "(?i)^\\s*SELECT\\b"
    effect:
      type: allow

  # Context-dependent — route to a human
  - id: approve-mutations
    match_on:
      resources: ["query"]
      parameters:
        sql: "(?i)\\b(INSERT|UPDATE|DELETE)\\b"
    effect:
      type: require_approval
      reason: "Data mutations require human review"
```
The key insight: deny what you can, allow what you must, and route everything else to a human. This maximizes safety without making the agent useless.
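To make the routing concrete, here is a minimal Python sketch of how rules like these partition SQL into the three decisions. It assumes first-match-wins evaluation and a fail-closed default; Kvlar's actual engine semantics may differ, and the `evaluate` function is purely illustrative.

```python
import re

# Hypothetical sketch mirroring the three rules above. This only
# illustrates how the regex patterns split SQL into three decisions;
# it is not Kvlar's real evaluation logic.
RULES = [
    ("deny-destructive", r"(?i)\b(DROP|TRUNCATE|ALTER)\b", "deny"),
    ("allow-reads", r"(?i)^\s*SELECT\b", "allow"),
    ("approve-mutations", r"(?i)\b(INSERT|UPDATE|DELETE)\b", "require_approval"),
]

def evaluate(sql: str) -> str:
    """Return the decision of the first rule whose pattern matches."""
    for rule_id, pattern, decision in RULES:
        if re.search(pattern, sql):
            return decision
    return "deny"  # assumed default: fail closed when nothing matches

print(evaluate("SELECT * FROM users"))             # allow
print(evaluate("DROP TABLE users"))                # deny
print(evaluate("DELETE FROM users WHERE id = 5"))  # require_approval
```

Note that rule order matters under this assumption: the destructive-DDL check runs first, so a statement like `ALTER TABLE` is denied even though it would not match the other rules.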
Webhook integration
Kvlar v0.3.0 ships webhook-based approval routing. When a require_approval decision fires, the proxy can POST the approval request to your endpoint. Your system reviews it (via a Slack bot, email workflow, or custom UI) and responds with an approved or denied decision.
The webhook receives:
```json
{
  "tool_name": "query",
  "arguments": {"sql": "DELETE FROM users WHERE id = 5"},
  "matched_rule": "approve-mutations",
  "request_id": "req_abc123",
  "timestamp": "2026-03-05T10:30:00Z"
}
```
And responds:
```json
{"decision": "approved"}
```
Or:
```json
{"decision": "denied", "reason": "Not during peak hours"}
```
When to use approval vs. deny
A common mistake is making everything require_approval. This defeats the purpose — if every action needs a human click, the agent becomes a very expensive CLI with an extra step.
Reserve approval for actions that:
- Have real consequences but aren't categorically dangerous (data mutations, sending messages, modifying config)
- Depend on context that the policy can't evaluate (time of day, business state, user intent)
- Are infrequent enough that human review is feasible (not 100 queries per minute)
Everything else should be a hard allow or hard deny. The goal is a small, meaningful set of approval checkpoints — not a firehose of prompts.
The audit trail
Every approval decision gets recorded in Kvlar's audit log — who approved what, when, and why (or why not). This creates a compliance-ready record of human oversight for sensitive agent operations.
As AI agents take on more autonomous work, the ability to prove that humans reviewed critical decisions becomes a regulatory requirement, not just a nice-to-have.
Getting started
Approval policies work with Kvlar v0.3.0. Start with a policy that denies dangerous operations and routes mutations to approval:
```shell
cargo install kvlar-cli
kvlar init --template postgres
kvlar wrap
```
Then customize the generated policy to match your team's risk tolerance. The documentation has the full policy reference.