Tork

Feature

PII Detection & Redaction

A 5-layer detection pipeline that scans and redacts 50+ types of personally identifiable information across 13 countries and 3 industries — before it reaches your AI models. From regex patterns to cross-cultural name detection, Tork catches what others miss.

50+ PII Types
13 Countries
3 Industries
~6ms Cache Latency
5 Detection Layers

Detection Pipeline

Every input passes through a 5-layer pipeline. Each layer adds precision — from fast regex to dictionary-backed name detection. Layers are composable: enable regional profiles and industry patterns without changing your code.

Input → L0 Regex → L0.5 Regional → L1 Slots → L1.5 Context → L3 Dictionary → Output
L0 Regex Patterns

29 compiled patterns for universal PII: SSN, credit cards, emails, phones, IP addresses, and more. Sub-microsecond per pattern with quick-check signal gating.
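
One way to read "quick-check signal gating" is that a cheap character test decides whether a compiled pattern is worth running at all. The sketch below illustrates that idea with three simplified patterns. The pattern set, the gate functions, and the `scan` helper are assumptions made for illustration, not Tork's actual implementation.

```python
import re
from typing import Callable, NamedTuple


class GatedPattern(NamedTuple):
    label: str
    gate: Callable[[str], bool]  # cheap pre-check run before the regex
    regex: re.Pattern


# Simplified, illustrative patterns (hypothetical, not Tork's pattern set).
PATTERNS = [
    GatedPattern("EMAIL", lambda t: "@" in t,
                 re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    GatedPattern("SSN", lambda t: "-" in t,
                 re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    GatedPattern("IPV4", lambda t: t.count(".") >= 3,
                 re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")),
]


def scan(text: str) -> list[tuple[str, str]]:
    """Return (label, matched_text) pairs, skipping patterns whose gate fails."""
    hits = []
    for pattern in PATTERNS:
        if not pattern.gate(text):
            continue  # signal character absent, so the regex cannot match
        hits.extend((pattern.label, m.group()) for m in pattern.regex.finditer(text))
    return hits
```

With this layout, text that contains no `@` never reaches the email regex, which keeps the common no-PII case cheap.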

L0.5 Regional Patterns

Country-specific patterns activated by region parameter. Emirates ID, Aadhaar, TFN, NHS numbers, CPF — each with locale-aware formatting rules.
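
To make the region parameter concrete, here is a minimal sketch of a registry keyed by region code, where only the requested regions contribute patterns. The `REGIONAL_PATTERNS` table and `redact_regional` function are hypothetical names, and the patterns check format only (no checksum validation).

```python
import re

# Hypothetical registry: region code -> {label: compiled pattern}.
# Format-only checks; real identifiers usually also carry check digits.
REGIONAL_PATTERNS = {
    "in": {"AADHAAR": re.compile(r"\b\d{4}\s?\d{4}\s?\d{4}\b")},
    "ae": {"EMIRATES_ID": re.compile(r"\b784-\d{4}-\d{7}-\d\b")},
    "br": {"CPF": re.compile(r"\b\d{3}\.\d{3}\.\d{3}-\d{2}\b")},
}


def redact_regional(text: str, regions: list[str]) -> str:
    """Apply only the patterns belonging to the requested regions."""
    for region in regions:
        for label, pattern in REGIONAL_PATTERNS.get(region.lower(), {}).items():
            text = pattern.sub(f"[{label}]", text)
    return text
```

Passing an empty region list leaves the text untouched, which mirrors the idea that regional patterns are opt-in.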

L1 Slot Detection

Cross-cultural name detection using 8 grammatical slots: Title, Given, Middle, Patronymic, Matronymic, Family, Generational, Honorific. Catches names regex can't.

L1.5 Context Analysis

Reduces false positives by analyzing surrounding context. '10 Main Street' is an address, '10 main reasons' is not. Signal words and proximity scoring.
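
The "10 Main Street" example can be sketched as a proximity score: a number counts as address-like only if a signal word appears within a few tokens after it, and closer signal words score higher. The signal list, window size, and `address_score` function are assumptions for illustration.

```python
import re

# Hypothetical signal words that suggest a street address.
ADDRESS_SIGNALS = {"street", "st", "avenue", "ave", "road", "rd", "lane", "blvd"}


def address_score(text: str, window: int = 3) -> float:
    """Score in [0, 1]: 1/distance to the nearest signal word after a number."""
    tokens = re.findall(r"[A-Za-z]+|\d+", text.lower())
    best = 0.0
    for i, token in enumerate(tokens):
        if not token.isdigit():
            continue
        for distance in range(1, window + 1):
            j = i + distance
            if j < len(tokens) and tokens[j] in ADDRESS_SIGNALS:
                best = max(best, 1.0 / distance)
                break  # nearest signal word found for this number
    return best


print(address_score("10 Main Street"))   # signal word two tokens after the number
print(address_score("10 main reasons"))  # no signal word in the window
```

A caller would compare the score against a threshold before treating the span as an address.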

L3 Dictionary Lookup

13,000+ first names and 8,000+ last names from 40+ cultures. Weighted by frequency to minimize false positives on common words that are also names.
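
One plausible form of frequency weighting is to compare how often a token occurs as a name against how often it occurs as an ordinary word. The sketch below uses placeholder weights invented for illustration (they are not real frequency data), and `name_confidence` is a hypothetical helper.

```python
# Placeholder weights for illustration only, not real frequency data.
NAME_WEIGHT = {"priya": 0.75, "will": 0.25, "grace": 0.5}
COMMON_WORD_WEIGHT = {"will": 0.75, "grace": 0.25}


def name_confidence(token: str) -> float:
    """Share of a token's total weight that comes from its use as a name."""
    key = token.lower()
    name_w = NAME_WEIGHT.get(key, 0.0)
    if name_w == 0.0:
        return 0.0  # not in the name dictionary at all
    word_w = COMMON_WORD_WEIGHT.get(key, 0.0)
    return name_w / (name_w + word_w)
```

Under these assumed weights, "Priya" scores as a confident name while "will" scores low because it is far more common as an ordinary word.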

Regional PII Profiles

Activate country-specific and industry-specific detection patterns with a single parameter. Regions and industries are composable — combine them in one API call.

13 Supported Countries

🇦🇺 AU Australia: TFN, ABN, Medicare
🇺🇸 US United States: SSN, EIN, ITIN
🇬🇧 GB United Kingdom: NHS, NI Number
🇪🇺 EU European Union: VAT, IBAN
🇦🇪 AE UAE: Emirates ID, +971
🇸🇦 SA Saudi Arabia: National ID, Iqama
🇳🇬 NG Nigeria: NIN, BVN
🇮🇳 IN India: Aadhaar, PAN
🇯🇵 JP Japan: My Number
🇨🇳 CN China: ID Card, Hukou
🇰🇷 KR South Korea: RRN
🇧🇷 BR Brazil: CPF, CNPJ

3 Industry Profiles

Healthcare (HIPAA): MRN, ICD-10 codes, DEA numbers, NPI, health plan IDs
Finance (PCI-DSS): SWIFT/BIC, routing numbers, CUSIP, ISIN, account numbers
Legal (Attorney-Client): Case numbers, bar IDs, court docket numbers, FEIN

Composable — combine regions + industries in one call
curl -X POST https://tork.network/api/v1/govern \
  -H "X-API-Key: tork_live_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "Patient Aadhaar: 1234 5678 9012, ICD-10: J45.20",
    "region": ["in", "ae"],
    "industry": "healthcare"
  }'

Cross-Cultural Name Detection

Problem

Traditional regex-based detection fails for non-Western names. Patterns designed for "John Smith" miss names from African, Arabic, Indian, East Asian, and compound name cultures.

Input: "Patient: Chukwuemeka Okonkwo"
Regex result: No PII detected ✗
Solution — Slot Detection

Tork's L1 Slot Detection uses 8 grammatical slots, backed by a dictionary of 13,000+ first names and 8,000+ last names spanning 40+ cultures. Context signals such as "Patient:" raise the confidence score.

Input: "Patient: Chukwuemeka Okonkwo"
Tork result: "Patient: [PERSON_NAME]" ✓
Title: Dr, Prof, Sheikh, Pandit
Given: Chukwuemeka, Priya, Yuki
Middle: Ibrahim, Raj, Marie
Patronymic: bin, ibn, -ovich, Mac
Matronymic: binti, -ovna, Ní
Family: Okonkwo, Nakamura, Al-
Generational: Jr, III, fils, 世
Honorific: San, Ji, Sahib, Puan
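
The slot idea can be sketched as tagging each token with the slot whose marker set contains it, then treating a phrase as name-like when both a Given and a Family slot are filled. The marker sets below are tiny samples seeded from the slot examples; `SLOT_MARKERS`, `tag_slots`, and `looks_like_name` are hypothetical names, and the Middle slot and affix markers such as "-ovich" are left out because they depend on position and morphology.

```python
# Hypothetical marker sets, seeded from the slot examples above.
SLOT_MARKERS = {
    "Title": {"dr", "prof", "sheikh", "pandit"},
    "Patronymic": {"bin", "ibn"},
    "Matronymic": {"binti"},
    "Generational": {"jr", "iii"},
    "Honorific": {"san", "ji", "sahib", "puan"},
    "Given": {"chukwuemeka", "priya", "yuki"},
    "Family": {"okonkwo", "nakamura"},
}


def tag_slots(phrase: str) -> list[tuple[str, str]]:
    """Pair each whitespace-separated token with a slot name or 'Unknown'."""
    tagged = []
    for raw in phrase.split():
        token = raw.strip(".,:").lower()
        slot = next(
            (name for name, markers in SLOT_MARKERS.items() if token in markers),
            "Unknown",
        )
        tagged.append((raw, slot))
    return tagged


def looks_like_name(phrase: str) -> bool:
    """Name-like if at least one Given and one Family slot are filled."""
    slots = {slot for _, slot in tag_slots(phrase)}
    return "Given" in slots and "Family" in slots
```

For "Patient: Chukwuemeka Okonkwo", the first token is tagged Unknown while the other two fill the Given and Family slots, so the phrase is flagged even though no Western-style pattern matches it.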

50+ PII Types Detected

Contact Info

  • Email addresses
  • Phone numbers (global)
  • Physical addresses
  • IP addresses

Identity

  • Full names
  • Social Security Numbers
  • Passport numbers
  • Driver's license

Financial

  • Credit card numbers
  • Bank accounts
  • Tax IDs
  • Financial account numbers

Healthcare

  • Medical record numbers
  • Health plan IDs
  • Prescription info
  • Patient identifiers

Digital

  • API keys
  • Passwords
  • OAuth tokens
  • Private keys

Regional

  • AU TFN/ABN
  • UK NI numbers
  • EU VAT numbers
  • Canadian SIN

Simple Integration

Python — with regional profiles

from tork_governance import TorkGovernance

tork = TorkGovernance(api_key="tork_live_xxx")

# Basic scan — 29 universal PII patterns
result = tork.govern("Contact john@example.com or call 555-123-4567")
print(result.output)
# "Contact [EMAIL] or call [PHONE]"

# Regional scan — activate UAE + India patterns
result = tork.govern(
    "Emirates ID: 784-1234-1234567-1, Aadhaar: 1234 5678 9012",
    region=["ae", "in"]
)

# Regional + industry — healthcare PII patterns
result = tork.govern(
    "Patient Aadhaar: 1234 5678 9012, ICD-10: J45.20",
    region=["in"],
    industry="healthcare"
)

# Receipt for audit trail
print(result.receipt.id)  # "rcpt_a3f2c1..."
print(result.receipt.input_hash)  # "sha256:..."

Performance

Designed for production workloads. Results are cached in Redis with a 60-second TTL, so repeat scans of the same input return in about 6 ms. Each activated region adds 1–3 ms of overhead.
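
The caching behavior can be sketched as a lookup keyed by a hash of the request, with entries expiring after the TTL. The example below uses an in-memory dict as a stand-in for Redis so that it runs without a server. The `TTLCache` class and its key scheme are assumptions, not Tork's implementation.

```python
import hashlib
import json
import time


class TTLCache:
    """In-memory stand-in for a Redis cache with per-entry expiry."""

    def __init__(self, ttl_seconds: float = 60.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested
        self._store = {}

    @staticmethod
    def key(content: str, regions=(), industry=None) -> str:
        # Hypothetical key scheme: hash content plus options, so the same
        # text scanned with different profiles is cached separately.
        payload = json.dumps(
            {"content": content, "region": sorted(regions), "industry": industry},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._store[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key: str, value) -> None:
        self._store[key] = (self.clock() + self.ttl, value)
```

A caller would check `get` first and run the full pipeline only on a miss, storing the result with `set`.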

~6ms cache hit latency (Redis with 60s TTL)
~700ms cold scan latency (full 5-layer pipeline)
717 requests/sec (production throughput)

Layer               Latency       Notes
L0 Regex            <1ms          29 compiled patterns, signal gating
L0.5 Regional       +1–3ms        Per activated region
L1 Slot Detection   +2–5ms        8 grammatical slots
L1.5 Context        +1ms          Signal word proximity
L3 Dictionary       +2–4ms        21K name lookups
Redis cache hit     ~6ms total    Skips all layers

Flexible Redaction Options

Token Replacement

john@email.com → [EMAIL]

Replace with type indicator

Masked

john@email.com → j***@e***.com

Partial masking for context

Hashed

john@email.com → a3f2c1...

Consistent hash for correlation

Custom

john@email.com → <REDACTED>

Your own replacement text
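
The four options can be sketched as one function with a mode switch. This is an illustrative sketch for email values only: `redact_email`, its mode names, and the demo key are hypothetical, and the hashed mode assumes a keyed HMAC-SHA-256 truncated to 12 hex characters, which the page does not specify.

```python
import hashlib
import hmac


def redact_email(email: str, mode: str = "token",
                 custom: str = "<REDACTED>", key: bytes = b"demo-key") -> str:
    """Redact an email address using one of four illustrative modes."""
    if mode == "token":
        return "[EMAIL]"
    if mode == "masked":
        local, _, domain = email.partition("@")
        name, _, tld = domain.rpartition(".")
        # Keep the first character of each part, mask the rest.
        return f"{local[:1]}***@{name[:1]}***.{tld}"
    if mode == "hashed":
        # Keyed hash: the same input and key always give the same digest,
        # so records can be correlated without storing the address.
        digest = hmac.new(key, email.encode("utf-8"), hashlib.sha256).hexdigest()
        return digest[:12]
    if mode == "custom":
        return custom
    raise ValueError(f"unknown redaction mode: {mode}")


print(redact_email("john@email.com", mode="masked"))  # j***@e***.com
```

Using a secret key for the hashed mode is a design choice in this sketch: an unkeyed hash of an email address can be reversed by hashing a list of candidate addresses.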

Compliance Ready

GDPR
CCPA
HIPAA
SOC 2
PCI DSS
FERPA

Tork's PII detection helps you meet data minimization requirements across major privacy regulations. Combined with our audit logging and cryptographic receipts, you have the documentation needed for compliance audits.

Protect User Privacy Today

Start detecting and redacting PII across 13 countries in minutes.