Feature
A 5-layer detection pipeline that scans and redacts 50+ types of personally identifiable information across 13 countries and 3 industries — before it reaches your AI models. From regex patterns to cross-cultural name detection, Tork catches what others miss.
Every input passes through a 5-layer pipeline. Each layer adds precision — from fast regex to dictionary-backed name detection. Layers are composable: enable regional profiles and industry patterns without changing your code.
29 compiled patterns for universal PII: SSN, credit cards, emails, phones, IP addresses, and more. Sub-microsecond per pattern with quick-check signal gating.
Country-specific patterns activated by region parameter. Emirates ID, Aadhaar, TFN, NHS numbers, CPF — each with locale-aware formatting rules.
Cross-cultural name detection using 8 grammatical slots: Title, Given, Middle, Patronymic, Matronymic, Family, Generational, Honorific. Catches names regex can't.
Reduces false positives by analyzing surrounding context. '10 Main Street' is an address, '10 main reasons' is not. Signal words and proximity scoring.
13,000+ first names and 8,000+ last names from 40+ cultures. Weighted by frequency to minimize false positives on common words that are also names.
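As a rough illustration of two of the ideas above, quick-check signal gating and context analysis, here is a minimal Python sketch. It is not Tork's implementation: the pattern table, the signal substrings, the address word list, and the function names `scan` and `looks_like_address` are all assumptions made for the example.

```python
import re

# Hypothetical pattern table: (label, cheap signal substring, compiled regex).
# The regexes are simplified placeholders, not Tork's production patterns.
PATTERNS = [
    ("EMAIL", "@", re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")),
    ("SSN", "-", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
]

# Hypothetical signal words that suggest a number belongs to a street address.
ADDRESS_SIGNALS = {"street", "st", "avenue", "ave", "road", "rd"}


def scan(text):
    """Return (label, matched_text) pairs, skipping patterns whose signal is absent."""
    hits = []
    for label, signal, regex in PATTERNS:
        if signal not in text:  # quick check: avoid running the regex at all
            continue
        hits.extend((label, m.group()) for m in regex.finditer(text))
    return hits


def looks_like_address(text, window=2):
    """True if a number is followed within `window` words by an address signal word."""
    words = re.findall(r"\w+", text.lower())
    for i, word in enumerate(words):
        if word.isdigit():
            nearby = words[i + 1 : i + 1 + window]
            if ADDRESS_SIGNALS & set(nearby):
                return True
    return False
```

The substring test is far cheaper than a regex search, so inputs that cannot contain a given PII type never pay for that pattern. The proximity check separates "10 Main Street" from "10 main reasons" by looking only at the words closest to the number.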
Activate country-specific and industry-specific detection patterns with a single parameter. Regions and industries are composable — combine them in one API call.
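Australia: TFN, ABN, Medicare
United States: SSN, EIN, ITIN
United Kingdom: NHS, NI Number
European Union: VAT, IBAN
United Arab Emirates: Emirates ID, +971
Saudi Arabia: National ID, Iqama
Nigeria: NIN, BVN
India: Aadhaar, PAN
Japan: My Number
China: ID Card, Hukou
South Korea: RRN
Brazil: CPF, CNPJ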
Healthcare: MRN, ICD-10 codes, DEA numbers, NPI, health plan IDs
Finance: SWIFT/BIC, routing numbers, CUSIP, ISIN, account numbers
Legal: Case numbers, bar IDs, court docket numbers, FEIN
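One way to picture how region and industry parameters compose is as pattern sets merged at call time. The sketch below assumes this design; the registries, labels, and simplified regexes are hypothetical and stand in for whatever rules Tork actually ships.

```python
import re

# Hypothetical registries keyed by region code and industry name.
# The patterns are simplified placeholders for illustration only.
REGION_PATTERNS = {
    "in": {"AADHAAR": re.compile(r"\b\d{4} \d{4} \d{4}\b")},
    "ae": {"EMIRATES_ID": re.compile(r"\b784-\d{4}-\d{7}-\d\b")},
}
INDUSTRY_PATTERNS = {
    "healthcare": {"ICD10": re.compile(r"\b[A-Z]\d{2}\.\d{1,4}\b")},
}


def active_patterns(regions=(), industry=None):
    """Merge the pattern sets selected by the region list and the industry name."""
    merged = {}
    for code in regions:
        merged.update(REGION_PATTERNS.get(code, {}))
    if industry is not None:
        merged.update(INDUSTRY_PATTERNS.get(industry, {}))
    return merged


def redact(text, regions=(), industry=None):
    """Replace every match of the active patterns with its bracketed label."""
    for label, regex in active_patterns(regions, industry).items():
        text = regex.sub(f"[{label}]", text)
    return text


print(redact(
    "Patient Aadhaar: 1234 5678 9012, ICD-10: J45.20",
    regions=["in"],
    industry="healthcare",
))
# Patient Aadhaar: [AADHAAR], ICD-10: [ICD10]
```

Because the selection is plain data, adding a region or an industry changes which patterns run without changing the calling code.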
curl -X POST https://tork.network/api/v1/govern \
  -H "X-API-Key: tork_live_xxx" \
  -d '{
    "content": "Patient Aadhaar: 1234 5678 9012, ICD-10: J45.20",
    "region": ["in", "ae"],
    "industry": "healthcare"
  }'

Traditional regex-based detection fails for non-Western names. Patterns designed for "John Smith" miss names from African, Arabic, Indian, and East Asian cultures, and from cultures that use compound names.
Tork's L1 Slot Detection uses 8 grammatical slots and a 13,000+ name dictionary spanning 40+ cultures. Context signals like "Patient:" boost confidence scoring.
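To make the slot idea concrete, the following sketch assigns tokens to three of the eight slots (Title, Given, Family), averages dictionary frequency weights, and boosts the score when a context signal precedes the name. The dictionaries, weights, boost value, and the `score_name` function are invented for illustration and are not taken from Tork.

```python
from dataclasses import dataclass

# Tiny stand-in dictionaries with made-up frequency weights between 0 and 1.
# A low weight marks a name that is also a common word (for example "will").
GIVEN_NAMES = {"amara": 0.8, "wei": 0.7, "will": 0.2}
FAMILY_NAMES = {"okafor": 0.8, "zhang": 0.9}
TITLES = {"dr", "mr", "ms", "mrs"}
CONTEXT_SIGNALS = {"patient", "name"}


@dataclass
class NameCandidate:
    slots: dict   # slot name -> token, e.g. {"Title": "Dr.", "Given": "Amara"}
    score: float  # confidence between 0 and 1


def score_name(tokens, preceding_word=""):
    """Fill Title/Given/Family slots and score the candidate name."""
    slots, weights = {}, []
    for token in tokens:
        key = token.lower().rstrip(".")
        if key in TITLES and "Title" not in slots:
            slots["Title"] = token
        elif key in GIVEN_NAMES and "Given" not in slots:
            slots["Given"] = token
            weights.append(GIVEN_NAMES[key])
        elif key in FAMILY_NAMES and "Family" not in slots:
            slots["Family"] = token
            weights.append(FAMILY_NAMES[key])
    score = sum(weights) / len(weights) if weights else 0.0
    # Context boost: a signal word such as "Patient:" right before the name.
    if preceding_word.lower().rstrip(":") in CONTEXT_SIGNALS:
        score = min(1.0, score + 0.2)
    return NameCandidate(slots, score)
```

Calling `score_name(["Dr.", "Amara", "Okafor"], preceding_word="Patient:")` fills all three slots and receives the context boost, while a lone "Will" scores low because its dictionary weight is low.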
Python — with regional profiles
from tork_governance import TorkGovernance

tork = TorkGovernance(api_key="tork_live_xxx")

# Basic scan: 29 universal PII patterns
result = tork.govern("Contact john@example.com or call 555-123-4567")
print(result.output)
# "Contact [EMAIL] or call [PHONE]"

# Regional scan: activate UAE + India patterns
result = tork.govern(
    "Emirates ID: 784-1234-1234567-1, Aadhaar: 1234 5678 9012",
    region=["ae", "in"],
)

# Regional + industry: healthcare PII patterns
result = tork.govern(
    "Patient Aadhaar: 1234 5678 9012, ICD-10: J45.20",
    region=["in"],
    industry="healthcare",
)

# Receipt for audit trail
print(result.receipt.id)          # "rcpt_a3f2c1..."
print(result.receipt.input_hash)  # "sha256:..."
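The `sha256:` prefix on the receipt's input hash suggests a digest of the submitted content. How Tork derives it is not documented here, so the following is only a sketch of the general technique, with a hypothetical helper name `input_hash`.

```python
import hashlib


def input_hash(content: str) -> str:
    """Hash the raw input so an audit record can reference it without storing it."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    return f"sha256:{digest}"
```

The same input always yields the same hash, so an auditor can confirm which content a receipt refers to without the receipt containing the PII itself.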
Designed for production workloads. Redis caching with 60s TTL means repeat scans resolve in single-digit milliseconds. Regional patterns add less than 3ms overhead.
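The caching behavior can be sketched without a Redis server by using an in-memory dictionary with per-entry expiry. The `TTLCache` class and `cached_scan` helper below are assumptions for illustration; a production setup would use Redis key expiry instead.

```python
import hashlib
import time


class TTLCache:
    """In-memory stand-in for a Redis cache with per-entry expiry."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:  # entry is older than the TTL
            del self._store[key]
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, self.clock() + self.ttl)


def cached_scan(text, scan, cache):
    """Return a cached scan result for identical text, or compute and store it."""
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    result = cache.get(key)
    if result is None:
        result = scan(text)
        cache.set(key, result)
    return result
```

Keying on a hash of the content keeps raw PII out of the cache keys, and the 60-second expiry bounds how long any result is reused.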
john@email.com → [EMAIL]: Replace with type indicator
john@email.com → j***@e***.com: Partial masking for context
john@email.com → a3f2c1...: Consistent hash for correlation
john@email.com → <REDACTED>: Your own replacement text
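The four redaction styles above can be approximated in a few lines. The mode names (`replace`, `mask`, `hash`, `custom`), the function `redact_email`, and the 12-character hash truncation are hypothetical choices for this sketch, not Tork's configuration options.

```python
import hashlib


def redact_email(email, mode="replace", custom="<REDACTED>"):
    """Apply one of four illustrative redaction strategies to an email address."""
    if mode == "replace":
        return "[EMAIL]"
    if mode == "mask":
        # Keep the first character of the local part and of the domain name.
        local, _, domain = email.partition("@")
        name, dot, tld = domain.rpartition(".")
        return f"{local[:1]}***@{name[:1]}***{dot}{tld}"
    if mode == "hash":
        # Same input gives the same token, so records can still be correlated.
        return hashlib.sha256(email.encode("utf-8")).hexdigest()[:12]
    if mode == "custom":
        return custom
    raise ValueError(f"unknown mode: {mode}")


print(redact_email("john@email.com", mode="mask"))
# j***@e***.com
```

Masking preserves enough shape for a human reader, while hashing preserves only equality, which is what downstream joins and deduplication need.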
Tork's PII detection helps you meet data minimization requirements across major privacy regulations. Combined with our audit logging and cryptographic receipts, you have the documentation needed for compliance audits.
Start detecting and redacting PII across 13 countries in minutes.