# Jailbreak Detection
Detect and block prompt injection and jailbreak attempts before they reach your AI models.
## Overview
Jailbreak detection uses multiple classifiers to identify attempts to manipulate AI behavior, bypass safety guidelines, or extract sensitive information through adversarial prompts.
- **Prompt Injection**: Attempts to override system instructions
- **Role Playing Attacks**: "Pretend you are..." manipulation
- **Instruction Extraction**: Attempts to reveal system prompts
- **Encoding Bypass**: Base64, ROT13, or Unicode obfuscation
## Detecting Jailbreaks
Scan user input before sending it to your AI model:
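As a minimal illustration of what a scan step looks like, the sketch below flags input using naive regex patterns for the four attack categories listed above. The pattern list, function name, and verdict shape are all assumptions for this example; the actual product uses trained classifiers, not regexes.

```python
import re

# Hypothetical patterns, one per attack category above.
# Illustration only -- real detection uses ML classifiers.
JAILBREAK_PATTERNS = [
    (r"ignore (all )?(previous|prior) instructions", "prompt_injection"),
    (r"pretend you are", "role_playing"),
    (r"(reveal|show|print).*system prompt", "instruction_extraction"),
    (r"[A-Za-z0-9+/]{40,}={0,2}", "encoding_bypass"),  # long base64-like runs
]

def scan_input(text: str) -> dict:
    """Return a verdict: whether the input was flagged, and which categories hit."""
    lowered = text.lower()
    hits = [label for pattern, label in JAILBREAK_PATTERNS
            if re.search(pattern, lowered)]
    return {"flagged": bool(hits), "categories": hits}
```

In a real deployment the verdict would come back from the detection API with per-category confidence scores rather than booleans, but the shape of the check is the same: scan first, forward to the model only if nothing is flagged.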
## Detection Modes
Configure detection sensitivity based on your use case:
| Mode | Sensitivity | Use Case |
|---|---|---|
| `strict` | High | Financial, healthcare, high-security applications |
| `balanced` | Medium | General enterprise applications |
| `permissive` | Low | Creative applications, less sensitive contexts |
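One common way to implement the three modes is as confidence thresholds on the classifier's jailbreak score: stricter modes block at lower scores. The threshold values below are assumptions chosen for illustration, not the product's actual defaults.

```python
# Hypothetical score thresholds for the three modes in the table above.
MODE_THRESHOLDS = {
    "strict": 0.30,      # block anything even mildly suspicious
    "balanced": 0.60,    # block likely jailbreaks
    "permissive": 0.85,  # block only high-confidence jailbreaks
}

def should_block(score: float, mode: str = "balanced") -> bool:
    """Block when the classifier's jailbreak score meets the mode's threshold."""
    return score >= MODE_THRESHOLDS[mode]
```

The same score of 0.7 would be blocked under `strict` and `balanced` but allowed under `permissive`, which is why sensitivity should match the risk profile of the application.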
## Integration with Policy Engine
Automatically block jailbreaks using policy rules:
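To show the shape of that integration, here is a self-contained sketch of a policy rule that maps a jailbreak verdict to a block action. The `PolicyRule` structure, field names, and `evaluate` helper are hypothetical; consult the policy engine's own schema for the real rule format.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical rule shape -- the real policy engine's schema may differ.
@dataclass
class PolicyRule:
    name: str
    condition: Callable[[dict], bool]  # takes a scan verdict, returns True on match
    action: str                        # "block", "flag", or "allow"

def evaluate(rules: list, verdict: dict) -> str:
    """Return the action of the first rule whose condition matches the verdict."""
    for rule in rules:
        if rule.condition(verdict):
            return rule.action
    return "allow"  # default when no rule matches

rules = [
    PolicyRule(
        name="block-jailbreaks",
        condition=lambda v: v["flagged"],
        action="block",
    ),
]
```

With a rule like this in place, any input the detector flags is rejected before it reaches the model, with no per-request application code required.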
> **Security Note:** Always enable jailbreak detection for user-facing AI applications to prevent prompt injection attacks.