Sensitive Info Guardrail

Automatically detect and handle sensitive information in API requests

The Sensitive Info Guardrail lets you automatically detect and handle sensitive information — such as email addresses, phone numbers, credit card numbers, and names — before requests reach the model provider. You can choose to redact (replace with a placeholder) or block (reject the request entirely) when sensitive data is detected.

This feature is part of Guardrails and can be configured alongside budget limits, model restrictions, and other guardrail settings.

How It Works

When a Sensitive Info Guardrail is active, every API request is scanned before it is forwarded to the model provider:

  1. Detection — The request content is checked against your configured patterns and presets.
  2. Action — If a match is found, the configured action is applied:
    • Redact: The matched text is replaced with a labeled placeholder (e.g., [EMAIL], [PHONE], [REDACTED]) and the modified request is forwarded to the provider.
    • Block: The entire request is rejected with an HTTP 403 Forbidden error.
  3. Forwarding — If no sensitive info is detected (or all matches were redacted), the request proceeds to the model provider as normal.

Sensitive info detection runs on the input (prompt) side of requests. It scans message content, tool call arguments, and prompt strings. It does not scan model responses.
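The scan-then-act flow above can be sketched in a few lines of Python. This is an illustrative model only, not OpenRouter's implementation: the two regexes are simplified stand-ins for the real presets, `RequestBlocked` is a hypothetical exception standing in for the HTTP 403 response, and checking Block filters before applying redactions is an assumption about ordering.

```python
import re

# Simplified, hypothetical patterns for illustration; the real preset
# patterns are not documented here.
FILTERS = [
    {"label": "[EMAIL]", "pattern": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "action": "redact"},
    {"label": "[SSN]", "pattern": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "action": "block"},
]


class RequestBlocked(Exception):
    """Hypothetical stand-in for the HTTP 403 returned on a Block match."""


def apply_filters(text: str) -> str:
    # Assumed ordering: any Block match rejects the whole request first.
    for f in FILTERS:
        if f["action"] == "block" and f["pattern"].search(text):
            raise RequestBlocked(f["label"])
    # Otherwise every Redact match is replaced with its labeled placeholder.
    for f in FILTERS:
        if f["action"] == "redact":
            text = f["pattern"].sub(f["label"], text)
    return text


print(apply_filters("Contact user@example.com today"))
```

Text with no matches passes through unchanged, which corresponds to step 3 (Forwarding).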

Detection Methods

OpenRouter uses two complementary detection methods:

Regex-Based Detection

Most built-in presets and all custom patterns use regular expression matching. This is fast, deterministic, and adds negligible latency to requests.

Regex-based presets include:

  • Email addresses
  • Phone numbers
  • Social Security numbers (SSNs)
  • Credit card numbers
  • IP addresses

NLP-Based Detection

Some types of sensitive information — like person names and physical addresses — cannot be reliably detected with simple patterns. For these, OpenRouter uses NLP-powered entity recognition (via Presidio), which analyzes text contextually.

NLP-based presets include:

  • Person names
  • Physical addresses / locations

NLP-based detection adds latency to requests proportional to the size of the input text. The “Person Name” and “Address” presets are marked with an “Adds latency” label in the dashboard to indicate this.

Built-In Presets

The following presets are available out of the box. Each can be individually enabled and configured with either the Redact or Block action.

| Preset | Detection Method | Redaction Label | Example Matches |
|---|---|---|---|
| Email address | Regex | [EMAIL] | user@example.com, name+tag@domain.co |
| Phone number | Regex | [PHONE] | 914-309-4996, 914.309.4996, 9143094996 |
| Social Security number | Regex | [SSN] | 123-45-6789 |
| Credit card number | Regex | [CREDIT_CARD] | 4265 5256 0839 8752, 4265-5256-0839-8752 |
| IP address | Regex | [IP_ADDRESS] | 192.168.0.1, 10.0.0.1 |
| Person name | NLP | [PERSON_NAME] | John Smith, Dr. Sarah Johnson, Maria Garcia-Lopez |
| Address | NLP | [ADDRESS] | 123 Main Street, Springfield, London, United Kingdom |

NLP Preset Limitations

NLP-based detection is contextual and probabilistic. Keep the following in mind:

Person Name:

  • May not catch names without surrounding context
  • Uncommon or non-Western names may be missed
  • Single-word names (e.g., “Cher”) are harder to detect

Address:

  • Partial addresses without city/state may be missed
  • Ambiguous location names (e.g., “Paris” as a name vs. a city) depend on context
  • Non-standard or abbreviated formats may not be detected

Custom Patterns

In addition to built-in presets, you can define your own custom regex patterns to detect domain-specific sensitive information. Each custom pattern requires a regular expression and an action (Redact or Block); a label is optional.

When a custom pattern matches with the Redact action, the matched text is replaced with [REDACTED]. When set to Block, the entire request is rejected.

Example Custom Patterns

| Use Case | Pattern | Action |
|---|---|---|
| Internal project codes | PROJ-\d{4,6} | Redact |
| AWS access keys | AKIA[0-9A-Z]{16} | Block |
| Internal URLs | https?://internal\.company\.com\S* | Redact |
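Before adding patterns like these to a guardrail, it can help to try them locally against sample text. The sketch below uses Python's `re` module as a stand-in for the JavaScript regex engine the service validates against; these three patterns behave the same in both, but that is not guaranteed for every pattern. The `matches` helper and the sample string are hypothetical.

```python
import re

# The three example patterns from the table above.
CUSTOM_PATTERNS = {
    "Internal project codes": r"PROJ-\d{4,6}",
    "AWS access keys": r"AKIA[0-9A-Z]{16}",
    "Internal URLs": r"https?://internal\.company\.com\S*",
}


def matches(text: str) -> dict:
    """Return, per use case, the substrings its pattern would match in `text`."""
    return {name: re.findall(p, text) for name, p in CUSTOM_PATTERNS.items()}


sample = "See PROJ-12345 at https://internal.company.com/wiki using AKIAABCDEFGHIJKLMNOP"
for name, found in matches(sample).items():
    print(name, found)
```

Note that these patterns are unanchored: `PROJ-\d{4,6}` also matches the first six digits of a longer code such as `PROJ-1234567`. Add `\b` boundaries if that matters for your data.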

Pattern Safety

Patterns are validated for:

  1. Syntax: Must be a valid JavaScript regular expression.
  2. Safety: Must not be vulnerable to catastrophic backtracking (ReDoS). Patterns with nested quantifiers such as (a+)+, or with overlapping alternatives under a quantifier such as (a|a)*, are rejected.

Invalid or unsafe patterns are rejected at creation time with a descriptive error message.
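To pre-screen patterns before submitting them, a rough local check along the same two lines might look like the following. This is a heuristic sketch, not the validator OpenRouter runs: `validate_pattern` is a hypothetical helper, Python's `re.compile` stands in for JavaScript syntax checking, and the two safety checks only catch simple shapes such as (a+)+ and (a|a)*. It can also produce false positives, for example on an escaped `\+` inside a quantified group.

```python
import re

# Heuristic: a quantified group whose body itself contains + or *, e.g. (a+)+
NESTED_QUANTIFIER = re.compile(r"\([^()]*[+*][^()]*\)[+*]")
# Heuristic: a quantified, non-nested group of alternatives, e.g. (a|a)*
QUANTIFIED_ALTERNATION = re.compile(r"\(([^()]*\|[^()]*)\)[+*]")


def validate_pattern(pattern: str) -> list:
    """Return a list of problems found; an empty list means none were detected."""
    try:
        re.compile(pattern)  # stand-in for JavaScript syntax validation
    except re.error as exc:
        return [f"syntax: {exc}"]

    errors = []
    if NESTED_QUANTIFIER.search(pattern):
        errors.append("safety: nested quantifier")
    for m in QUANTIFIED_ALTERNATION.finditer(pattern):
        branches = m.group(1).split("|")
        # Identical branches are the simplest case of overlapping alternatives.
        if len(branches) != len(set(branches)):
            errors.append("safety: duplicate alternatives under a quantifier")
    return errors


for candidate in [r"PROJ-\d{4,6}", "(a+)+", "(a|a)*", "(unclosed"]:
    print(candidate, validate_pattern(candidate))
```

A pattern that passes this check can still be rejected by the service, so treat it as a quick first filter only.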

Configuring Sensitive Info Guardrails

Via the Dashboard

  1. Navigate to your workspace’s Privacy & Guardrails page, or go to Settings > Privacy.
  2. Create a new guardrail or edit an existing one.
  3. Expand the Sensitive Info section.
  4. Enable the desired built-in presets and/or add custom patterns.
  5. For each preset or pattern, choose the action: Redact or Block.
  6. Save the guardrail.

You can use the Enable all / Disable all buttons to quickly toggle all built-in presets.

Via the API

Sensitive info filters are configured as part of the guardrail object using the content_filter_builtins and content_filters fields.

Built-in presets use the content_filter_builtins field:

{
  "name": "PII Protection",
  "content_filter_builtins": [
    { "slug": "email", "action": "redact" },
    { "slug": "phone", "action": "redact" },
    { "slug": "ssn", "action": "block" },
    { "slug": "credit-card", "action": "block" },
    { "slug": "ip-address", "action": "redact" },
    { "slug": "person-name", "action": "redact" },
    { "slug": "address", "action": "redact" }
  ]
}

Available slugs: email, phone, ssn, credit-card, ip-address, person-name, address.

Custom patterns use the content_filters field:

{
  "name": "Custom Filters",
  "content_filters": [
    { "pattern": "AKIA[0-9A-Z]{16}", "action": "block", "label": "AWS Key" },
    { "pattern": "PROJ-\\d{4,6}", "action": "redact" }
  ]
}

Each custom filter supports an optional label field for descriptive error messages when blocking.
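If you generate guardrail objects programmatically, a small builder can catch typos in slugs and actions before the request is sent. The sketch below only assembles the JSON body using the field names shown above; `build_guardrail` is a hypothetical helper, combining both fields in one guardrail object is an assumption, and the endpoint and authentication are deliberately left out (see the Guardrails API reference for those).

```python
import json
from typing import Optional

# Slugs and actions as listed in this section.
BUILTIN_SLUGS = {"email", "phone", "ssn", "credit-card", "ip-address", "person-name", "address"}
ACTIONS = {"redact", "block"}


def build_guardrail(name: str,
                    builtins: Optional[dict] = None,
                    custom: Optional[list] = None) -> dict:
    """Assemble a guardrail body; `builtins` maps slug -> action."""
    body: dict = {"name": name}
    if builtins:
        entries = []
        for slug, action in builtins.items():
            if slug not in BUILTIN_SLUGS:
                raise ValueError(f"unknown preset slug: {slug}")
            if action not in ACTIONS:
                raise ValueError(f"unknown action: {action}")
            entries.append({"slug": slug, "action": action})
        body["content_filter_builtins"] = entries
    if custom:
        filters = []
        for item in custom:
            if item["action"] not in ACTIONS:
                raise ValueError(f"unknown action: {item['action']}")
            entry = {"pattern": item["pattern"], "action": item["action"]}
            if "label" in item:  # label is optional
                entry["label"] = item["label"]
            filters.append(entry)
        body["content_filters"] = filters
    return body


payload = build_guardrail(
    "PII Protection",
    builtins={"email": "redact", "ssn": "block"},
    custom=[{"pattern": r"PROJ-\d{4,6}", "action": "redact"}],
)
print(json.dumps(payload, indent=2))
```

Serializing with `json.dumps` also handles the backslash escaping seen in the example above: the regex `PROJ-\d{4,6}` is emitted as `"PROJ-\\d{4,6}"`.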

See the Guardrails API reference for full endpoint documentation.

How Sensitive Info Interacts with Other Guardrails

Sensitive info filters follow the same guardrail hierarchy as other guardrail settings. When multiple guardrails apply to a request:

  • Content filters are unioned — If a member guardrail has an email filter and an API key guardrail has a phone filter, both filters apply.
  • Block wins over redact — If the same entity type appears in multiple guardrails with different actions, the stricter action (block) takes precedence.
  • Custom and built-in filters combine — Filters from all applicable guardrails (default, member, and API key level) are merged together.
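The merge rules above can be modeled as a union keyed by filter, with Block taking precedence. This is a minimal sketch of the described behavior, not the service's code: `merge_filters` is a hypothetical helper, and identifying "the same entity type" by preset slug (or by pattern string for custom filters) is an assumption.

```python
def merge_filters(*guardrails: list) -> dict:
    """Union filters from several guardrails; block wins over redact."""
    merged: dict = {}
    for filters in guardrails:
        for f in filters:
            # Assumed identity: built-ins by slug, custom filters by pattern.
            key = f.get("slug") or f["pattern"]
            # Once a filter is set to block, a later redact cannot downgrade it.
            if merged.get(key) != "block":
                merged[key] = f["action"]
    return merged


member = [{"slug": "email", "action": "redact"}]
api_key = [
    {"slug": "phone", "action": "redact"},
    {"slug": "email", "action": "block"},
]
print(merge_filters(member, api_key))
```

Here the email filter ends up as block because the API key guardrail is stricter, while the phone filter is added alongside it.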

Error Responses

When a request is blocked by a content filter, the API returns:

{
  "error": {
    "code": 403,
    "message": "Request blocked by content filter: [LABEL]"
  }
}

The [LABEL] in the error message depends on what triggered the block:

  • For built-in presets: the preset label (e.g., Email address, Social Security number)
  • For custom patterns with a label field: the custom label
  • For custom patterns without a label: [BLOCKED]
  • For NLP-detected entities: the entity type (e.g., Blocked PII detected: PERSON)
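On the client side, you may want to distinguish a content-filter block from other 403 responses and surface the label to the user. The sketch below parses an already-decoded response body with the shape shown above; `blocked_label` is a hypothetical helper, and matching on the message prefix is an assumption that could break if the wording changes.

```python
from typing import Optional

PREFIX = "Request blocked by content filter: "


def blocked_label(status_code: int, body: dict) -> Optional[str]:
    """Return the filter label if this response is a content-filter block, else None."""
    if status_code != 403:
        return None
    message = body.get("error", {}).get("message", "")
    if not message.startswith(PREFIX):
        return None  # some other kind of 403
    return message[len(PREFIX):]


response_body = {
    "error": {"code": 403, "message": "Request blocked by content filter: AWS Key"}
}
print(blocked_label(403, response_body))
```

Returning `None` for anything that is not recognized keeps the caller's existing error handling in charge of unrelated failures.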

Best Practices

  • Start with Redact — Use Redact as the default action when getting started. This lets requests proceed while protecting sensitive data, giving you time to evaluate detection accuracy before switching to Block.

  • Use built-in presets for common PII — The built-in presets are tuned for common formats and are the easiest way to get started. Add custom patterns for domain-specific data.

  • Be aware of NLP latency — The Person Name and Address presets use NLP-based detection, which adds latency proportional to input size. If latency is critical, consider using only regex-based presets.

  • Test before deploying — Use the Test Preview in the guardrail editor to verify your filters work as expected before saving and assigning the guardrail.

  • Combine with other guardrail settings — Sensitive info filters work alongside budget limits, model allowlists, provider restrictions, and ZDR enforcement. Use them together for comprehensive governance.

  • Use labels on custom block patterns — Adding a label to custom patterns that use the Block action provides clearer error messages to API consumers, making it easier to understand why a request was rejected.