Injection Detection

Rivano scores every inbound request for prompt injection risk before forwarding it to an LLM provider. A score of 0.0 means no injection patterns detected; 1.0 means high-confidence injection attempt. Requests above the configured threshold are blocked with a 403 — no tokens are consumed.

How scoring works

The injection scorer analyzes the full messages array using a pattern library that covers:

  • Role confusion — Attempts to override the system prompt or claim a privileged identity ("Ignore previous instructions...", "You are now...")
  • Delimiter injection — Crafted sequences that exploit how prompts are assembled (</s>, [INST], ###, ---)
  • Data exfiltration patterns — Instructions to output secrets, credentials, or internal system state
  • Indirect injection — Payloads embedded in retrieved documents or tool call results that attempt to redirect agent behavior
  • Jailbreak patterns — Well-known jailbreak templates including DAN, AIM, and similar variants
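The categories above can be sketched as a small pattern table. The regexes below are simplified, illustrative stand-ins, not Rivano's actual pattern library:

```typescript
// Illustrative pattern table -- these regexes are simplified stand-ins,
// not Rivano's actual (much larger) pattern library.
const PATTERNS: Record<string, RegExp[]> = {
  role_confusion: [/ignore (all |any )?previous instructions/i, /you are now/i],
  delimiter_injection: [/<\/s>/, /\[INST\]/, /^###/m, /^---/m],
  data_exfiltration: [/reveal (your )?(system prompt|secrets|credentials)/i],
  jailbreak: [/\bDAN\b/, /\bAIM\b/],
};

// Return the categories that match anywhere in the message text.
function matchCategories(text: string): string[] {
  return Object.entries(PATTERNS)
    .filter(([, regexes]) => regexes.some((re) => re.test(text)))
    .map(([category]) => category);
}
```

A production scorer would also walk retrieved documents and tool results (the indirect-injection case), not just the user-visible message text.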

Each pattern match contributes to the score. The final score is a weighted composite normalized to [0.0, 1.0].
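One way to picture the weighted composite: sum per-category weights for every matched pattern class, then clamp into [0.0, 1.0]. The weights and the clamping step here are assumptions for illustration, not Rivano's actual formula:

```typescript
// Illustrative category weights -- assumed values, not Rivano's real ones.
const WEIGHTS: Record<string, number> = {
  role_confusion: 0.5,
  delimiter_injection: 0.3,
  data_exfiltration: 0.6,
  indirect_injection: 0.4,
  jailbreak: 0.7,
};

// Sum the weights of the matched categories, then clamp into [0.0, 1.0].
function compositeScore(matched: string[]): number {
  const raw = matched.reduce((sum, cat) => sum + (WEIGHTS[cat] ?? 0), 0);
  return Math.min(1.0, raw);
}
```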

Score interpretation

Score range   Risk level   Default behavior
0.0 – 0.3     Low          Pass through
0.3 – 0.6     Medium       Pass through (warn header added)
0.6 – 0.7     Elevated     Pass through (warn header added)
0.7 – 1.0     High         Blocked (default threshold)

The score for every request is recorded in the trace. You can view score distribution in the Observability tab.
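The default behaviors in the table reduce to a simple mapping. This sketch assumes the boundary values fall into the higher band, which the table leaves ambiguous:

```typescript
type Behavior = 'pass' | 'warn' | 'block';

// Map a score to the default behavior from the table above, assuming the
// default block threshold of 0.7. Boundary handling (>=) is an assumption.
function defaultBehavior(score: number, blockThreshold = 0.7): Behavior {
  if (score >= blockThreshold) return 'block';
  if (score >= 0.3) return 'warn'; // warn header added, request still forwarded
  return 'pass';
}
```

Raising the threshold (as in the tuning example below) widens the warn band without changing the pass band.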

Threshold tuning

The default block threshold is 0.7. You can tune this per-tenant by creating a policy with an injection_score condition:

# rivano.yaml — raise the block threshold for a low-sensitivity agent
policies:
  - name: injection-block
    phase: request
    condition:
      type: injection_score
      threshold: 0.85
    action: block

# Lower threshold for a high-security agent
  - name: injection-block-strict
    phase: request
    condition:
      type: injection_score
      threshold: 0.5
    action: block
💡 Tip: Start with the default threshold and observe the score distribution for your traffic in Observability → Traces. Filter by injection_score > 0 to see scored requests. Adjust the threshold once you understand your baseline false-positive rate.

Policy integration

Create injection policies via the SDK:

import Rivano from '@rivano/sdk';

const rivano = new Rivano({ apiKey: process.env.RIVANO_API_KEY! });

// Block high-confidence injection attempts
await rivano.policies.create({
  name: 'injection-block',
  phase: 'request',
  condition: {
    type: 'injection_score',
    threshold: 0.7,
  },
  action: 'block',
  enabled: true,
});

// Warn (add header) on medium-confidence attempts without blocking
await rivano.policies.create({
  name: 'injection-warn',
  phase: 'request',
  condition: {
    type: 'injection_score',
    threshold: 0.4,
  },
  action: 'warn',
  enabled: true,
});

Warn action behavior

When a policy action is warn, Rivano adds the header X-Rivano-Warning: injection_score_elevated to the response. The request still reaches the LLM provider. Your application can inspect this header and log or alert accordingly.
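An application-side check might look like this. The helper is hypothetical; it takes a plain header record for illustration, so adapt it to whatever HTTP client you use:

```typescript
// Check a response's headers for Rivano's warning header.
// Hypothetical helper -- takes a plain header record for illustration.
function hasInjectionWarning(headers: Record<string, string>): boolean {
  // Header names are case-insensitive in HTTP, so normalize before comparing.
  const entry = Object.entries(headers).find(
    ([name]) => name.toLowerCase() === 'x-rivano-warning',
  );
  return entry?.[1] === 'injection_score_elevated';
}
```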

Blocked request response

When a request is blocked by injection detection, Rivano returns:

HTTP/1.1 403 Forbidden
Content-Type: application/json
X-Rivano-Policy: injection-block

{
  "error": "Request blocked by policy",
  "policy": "injection-block",
  "details": {
    "injection_score": 0.84,
    "threshold": 0.7
  }
}

The score and threshold are included in the response body so your application can surface a meaningful error to the user.
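For example, a handler for the body shown above might format a user-facing message like this. The interface mirrors the documented JSON; the message wording is just an example:

```typescript
// Shape of the 403 body shown above.
interface BlockedResponse {
  error: string;
  policy: string;
  details: { injection_score: number; threshold: number };
}

// Turn the blocked-request body into a user-facing message.
// The wording is an example; adapt it to your application.
function blockedMessage(body: BlockedResponse): string {
  const { injection_score, threshold } = body.details;
  return (
    `Request blocked by policy "${body.policy}" ` +
    `(injection score ${injection_score} >= threshold ${threshold}).`
  );
}
```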