Gateway Middleware

Every request that passes through the gateway runs through a fixed middleware pipeline. Understanding the pipeline order helps you reason about policy evaluation timing, performance overhead, and how to tune each stage.

Pipeline order

Incoming request


1. Rate limiter        ← rejects if over rateLimit req/min


2. PII detector        ← scans request content for PII


3. Injection scorer    ← scores prompt injection risk (0–1)


4. Policy evaluator    ← evaluates request-phase policies
      │                   (block returns 400, redact modifies content)

5. Provider proxy      ← forwards to LLM provider


6. Response PII        ← scans response content for PII


7. Policy evaluator    ← evaluates response-phase policies


8. Audit logger        ← writes trace record


Response to your app

Rate limiter

The rate limiter is always active. It tracks requests per IP address per minute:

gateway:
  middleware:
    rateLimit: 60  # 60 requests/min per IP (default)

Requests over the limit receive a 429 Too Many Requests response. Set rateLimit: 0 to disable rate limiting (not recommended in production).

PII detection

PII detection scans the request body for common personally identifiable information patterns:

gateway:
  middleware:
    pii: true  # default

When PII is detected, the gateway:

  1. Tags the trace with pii_detected: true
  2. Evaluates any pii_detected policies you have configured (block, redact, or warn)

PII types detected: email addresses, phone numbers, credit card numbers, social security numbers, IP addresses, and passport numbers.

💡

PII detection alone does not block or redact anything. You need a policy with conditionType: pii_detected and an appropriate action to take effect. Apply the foundational template pack for sensible defaults.

Injection scoring

The injection scorer assigns a risk score between 0 and 1 to each request, where 1 is highest risk:

gateway:
  middleware:
    injection: true  # default

The score is computed using a pattern-matching classifier trained on common prompt injection techniques (role-play overrides, system prompt leaks, instruction injection).

The score is available to policies via conditionType: injection_score. The foundational policy pack blocks requests scoring ≥ 0.7.

Policy evaluator

Policies are evaluated in two phases:

Request phase — runs before the LLM call. A block action returns a 400 Bad Request to your application immediately; the LLM is never called. A redact action modifies the request content before forwarding.

Response phase — runs after the LLM responds. A redact action modifies the response content before it reaches your application.

Policy evaluation order within a phase follows creation order. A block action short-circuits subsequent policy checks.

Provider proxy

The proxy stage forwards the (possibly modified) request to the target provider and waits for a response. The provider is selected based on the routing rules described in Gateway Providers.

The timeoutMs field on each provider config controls how long the proxy waits before returning a 504 Gateway Timeout.

Audit logger

The audit logger writes a JSON record for every completed request:

{
  "traceId": "trace_xyz789",
  "timestamp": "2026-04-04T14:22:00.312Z",
  "agentName": "contract-summarizer",
  "provider": "openai",
  "model": "gpt-4o",
  "status": 200,
  "durationMs": 312,
  "inputTokens": 142,
  "outputTokens": 87,
  "costUsd": 0.0041,
  "piiDetected": false,
  "injectionScore": 0.02,
  "policiesTriggered": []
}

Configure the output destination with the audit setting:

gateway:
  middleware:
    audit: stdout   # stdout | file | off

Disabling middleware

You can disable individual middleware stages:

gateway:
  middleware:
    pii: false        # No PII scanning
    injection: false  # No injection scoring
    audit: off        # No audit log
    cache: false      # No response caching

Disabling injection scoring and PII detection removes automatic data for policy evaluation. Policies that rely on injection_score or pii_detected conditions will never fire if the corresponding middleware is off.

Response caching

When cache: true, the gateway caches LLM responses for identical requests:

gateway:
  middleware:
    cache: true

Cache hits return the cached response immediately, skipping the provider proxy stage. This reduces latency and cost for repeated identical prompts. Cache keys are based on the full request body (model, messages, parameters).