PII Detection for LLMs: A Technical Guide
February 20, 2026 · 9 min read · Yusuf Jacobs
Every LLM-powered agent is one prompt away from leaking PII. User messages contain names, emails, phone numbers, and social security numbers. Model responses can hallucinate real PII from training data. Here's how to build a detection pipeline that catches it all.
The PII Problem in LLM Pipelines
PII leakage in LLM applications happens at three points:
Input — Users submit personal information in their prompts. A support chat might include “My SSN is 123-45-6789, can you check my account?”
Context — RAG systems retrieve documents containing PII from vector databases. An employee handbook might include salary data.
Output — The model includes PII in its response, either echoed from input, retrieved from context, or hallucinated from training data.
A comprehensive PII detection system must scan all three points. Scanning only outputs misses PII that gets stored in conversation logs. Scanning only inputs misses model hallucination.
Approaches to PII Detection
There are three main approaches, each with trade-offs:
1. Regex-Based Detection
Pattern matching using regular expressions. Fast, deterministic, and zero-dependency.
// US Social Security Number (SSN)
/\b\d{3}-\d{2}-\d{4}\b/
// Australian Medicare Number
/\b\d{4}[\s-]?\d{5}[\s-]?\d{1}\b/
// UK NHS Number
/\b\d{3}[\s-]?\d{3}[\s-]?\d{4}\b/
// Credit card candidates (major-network prefixes; apply a Luhn check separately)
/\b(?:4\d{3}|5[1-5]\d{2}|3[47]\d{2}|6(?:011|5\d{2}))[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b/
Pros: Sub-millisecond latency, no external dependencies, works offline, deterministic.
Cons: Can't detect contextual PII (names, addresses), struggles with varied formats.
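A regex alone can't validate a credit card number; it only surfaces candidates, and a separate Luhn checksum confirms them. A minimal Python sketch of that pairing (illustrative only, not Tork's implementation):

```python
import re

# Candidate pattern for 16-digit card numbers with major-network prefixes,
# mirroring the pattern above. The regex finds *candidates* only.
CARD_RE = re.compile(
    r"\b(?:4\d{3}|5[1-5]\d{2}|3[47]\d{2}|6(?:011|5\d{2}))"
    r"[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b"
)

def luhn_valid(number: str) -> bool:
    """Standard Luhn checksum over the digits of a candidate match."""
    digits = [int(c) for c in number if c.isdigit()]
    checksum = 0
    # Double every second digit from the right, subtracting 9 if it exceeds 9.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d = d * 2
            if d > 9:
                d -= 9
        checksum += d
    return checksum % 10 == 0

def find_card_numbers(text: str) -> list[str]:
    """Return only the regex candidates that pass the checksum."""
    return [m.group() for m in CARD_RE.finditer(text) if luhn_valid(m.group())]
```

The checksum pass eliminates most random 16-digit false positives, since only one in ten digit strings satisfies Luhn.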
2. NER-Based Detection
Named Entity Recognition models (spaCy, Presidio, GLiNER) identify PII through machine learning:
from presidio_analyzer import AnalyzerEngine
analyzer = AnalyzerEngine()
results = analyzer.analyze(text=user_input, language='en')
# Returns: [type=PERSON, start=11, end=21, score=0.85]
Pros: Catches names, addresses, and contextual PII. Higher recall for free-form text.
Cons: Slower (10-50ms per scan), requires model download, non-deterministic, can miss structured formats.
3. Hybrid Detection (What Tork Uses)
Tork combines both approaches. Regex patterns run first for structured PII (SSNs, credit cards, phone numbers), then a lightweight NER pass catches contextual PII (names, addresses). The result is high precision and high recall:
// Illustrative configuration (exact constructor may differ):
const tork = new Tork({
  mode: 'hybrid',  // regex + NER
  confidence: 0.8, // minimum confidence threshold
  regions: ['US', 'AU', 'EU', 'UK'],
});
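The hybrid pass can be sketched as: run the structured patterns first, then admit NER hits that clear the confidence threshold and don't overlap a span the regex pass already claimed. A simplified Python illustration (the `Detection` type and merge policy here are assumptions, not Tork's internals):

```python
import re
from dataclasses import dataclass

@dataclass
class Detection:
    type: str
    start: int
    end: int
    score: float

# Structured patterns run first: cheap, deterministic, high precision.
STRUCTURED = {"US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b")}

def regex_pass(text: str) -> list[Detection]:
    return [Detection(t, m.start(), m.end(), 1.0)
            for t, p in STRUCTURED.items() for m in p.finditer(text)]

def merge(regex_hits, ner_hits, confidence=0.8):
    """Keep all regex hits, plus NER hits above the threshold that do not
    overlap a span the regex pass already claimed."""
    keep = list(regex_hits)
    for h in ner_hits:
        if h.score < confidence:
            continue
        if any(h.start < r.end and r.start < h.end for r in regex_hits):
            continue
        keep.append(h)
    return sorted(keep, key=lambda d: d.start)
```

Deduplicating by span matters: without it, an NER model that also tags the SSN as a generic number would double-report the same characters.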
The 50+ PII Types Tork Detects
Tork's detection engine covers PII across 13 countries, from structured identifiers (SSNs, TFNs, Medicare and NHS numbers, credit cards) to contextual PII such as names and addresses.
Redaction Strategies
Detecting PII is half the battle. The other half is what you do with it. Tork supports four redaction modes:
// 1. Full redaction
"My SSN is [SSN]"
// 2. Masked redaction
"My SSN is ***-**-6789"
// 3. Synthetic replacement
"My SSN is 555-00-1234" // Valid format, fake data
// 4. Hash-based pseudonymisation
"My SSN is PII_abc123def456" // Reversible with key
Synthetic replacement is particularly useful for testing and development — your agents behave identically because the data format is preserved, but no real PII exists in your dev environment.
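The four modes above can be sketched in a few lines of Python. This is illustrative only; the masking, synthetic-generation, and vault details are assumptions, not Tork's implementation:

```python
import hmac
import hashlib
import random

def mask(ssn: str) -> str:
    # Masked redaction: keep only the last four digits visible.
    return "***-**-" + ssn[-4:]

def synthetic(rng: random.Random) -> str:
    # Synthetic replacement: same 3-2-4 shape, but area numbers in the 900s
    # are never issued as real SSNs, so the result cannot collide.
    return f"{rng.randint(900, 999)}-{rng.randint(0, 99):02d}-{rng.randint(0, 9999):04d}"

def pseudonymise(ssn: str, key: bytes, vault: dict) -> str:
    # Hash-based pseudonymisation: a deterministic token via HMAC, with a
    # vault mapping token -> original so the holder of key + vault can reverse.
    token = "PII_" + hmac.new(key, ssn.encode(), hashlib.sha256).hexdigest()[:12]
    vault[token] = ssn
    return token
```

Determinism is the useful property of the last mode: the same SSN always maps to the same token, so joins and analytics still work on pseudonymised data.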
Performance Considerations
PII detection adds latency to your agent pipeline. Here's what to expect with Tork's SDK:
| Mode | Latency (1KB) | Latency (10KB) | Coverage |
|---|---|---|---|
| Regex only | < 1ms | 2-5ms | Structured PII |
| NER only | 8-15ms | 25-50ms | Contextual PII |
| Hybrid | 8-16ms | 27-55ms | All PII types |
| Regex (Rust SDK) | < 0.1ms | < 1ms | Structured PII |
For comparison, a typical LLM API call takes 500-3000ms. PII detection adds negligible overhead relative to model inference time.
Implementation Architecture
The recommended architecture scans at both the input and output boundaries of your LLM pipeline:
User Input
↓
[Tork PII Scan — Input] → Redact before sending to LLM
↓
LLM / Agent Processing
↓
[Tork PII Scan — Output] → Redact before returning to user
↓
Clean Response
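A minimal sketch of those two boundary scans, with a placeholder `redact` and a generic model-call parameter standing in for the SDK and your LLM client (neither is Tork's actual API):

```python
import re

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact(text: str) -> str:
    # Placeholder for the SDK's scan-and-redact call.
    return SSN.sub("[SSN]", text)

def guarded_chat(user_msg: str, call_llm) -> str:
    clean_input = redact(user_msg)      # scan before the prompt leaves your server
    raw_output = call_llm(clean_input)  # the model may still echo or hallucinate PII
    return redact(raw_output)           # scan again before returning to the user
```

Scanning both sides is the point: the input scan keeps PII out of logs and prompts, and the output scan catches anything the model produces on its own.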
For RAG pipelines, add a third scan point at document retrieval:
const docs = await vectorStore.similaritySearch(query);
const cleanDocs = await Promise.all(
docs.map(doc => tork.pii.redact(doc.pageContent))
);
// Pass cleaned documents to LLM
const response = await llm.chat(cleanDocs);
Common Pitfalls
We've seen teams make these mistakes when implementing PII detection:
Only scanning outputs — If PII enters your system in a user prompt, it's already in your conversation logs and potentially your vector database. Scan inputs too.
Ignoring international formats — A US-only regex won't catch Australian Medicare numbers or UK NHS numbers. If you have international users, detect international PII.
Scanning in the wrong layer — Don't scan in the frontend (JavaScript). Users can bypass client-side checks. Always scan server-side before logging or storage.
Not handling false positives — The number 123-45-6789 looks like an SSN but might be a product code. Use confidence thresholds and contextual validation.
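Contextual validation can be as simple as scoring a match higher when supporting keywords appear nearby. An illustrative sketch (the window size, keywords, and scores are arbitrary assumptions):

```python
import re

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CONTEXT = ("ssn", "social security", "tax")

def score_ssn_matches(text: str, window: int = 30) -> list[tuple[str, float]]:
    """Score each SSN-shaped match by whether supporting words appear nearby."""
    hits = []
    for m in SSN.finditer(text):
        nearby = text[max(0, m.start() - window):m.end() + window].lower()
        score = 0.9 if any(k in nearby for k in CONTEXT) else 0.4
        hits.append((m.group(), score))
    return hits
```

With a 0.8 confidence threshold, "My SSN is 123-45-6789" gets redacted while "Order part number 123-45-6789" passes through.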
Compliance Requirements
Different regulations have different requirements for PII handling:
GDPR (EU) — Requires data minimisation, right to erasure, and data processing records. Tork's redaction + receipts satisfy all three.
HIPAA (US) — PHI must be de-identified using Safe Harbor or Expert Determination methods. Tork's 18 HIPAA identifier types cover Safe Harbor requirements.
Privacy Act (AU) — Australian Privacy Principles require reasonable steps to protect personal information. Tork detects TFN, Medicare, and other AU-specific PII.
SOC 2 — Requires evidence of data protection controls. Tork's cryptographic receipts provide tamper-evident audit trails.
Try It Yourself
Test PII detection in your browser with our interactive demo — paste any text and see detections in real time. Or get started with the SDK:
pip install tork-governance
go get github.com/torkjacobs/tork-go-sdk
cargo add tork-governance
Free tier includes 10,000 scans/month. Sign up here — no credit card required.
Tork Network Pty Ltd — Sydney, Australia