VLM Run’s document redaction automatically detects and redacts sensitive information in documents across industries. Each specialized agent applies the compliance standards of its industry, so redacted documents meet those requirements while remaining readable.

[Figure: Example of document redaction applied to a medical form. Panels: Original Document (sensitive data visible); Redacted Document (sensitive data obscured).]

Quick Start

1. Upload Document

Upload your document containing sensitive information:
from vlmrun.client import VLMRun
from pathlib import Path

client = VLMRun(api_key="<your-api-key>")
file_response = client.files.upload(
    file=Path("path/to/your_document.pdf")
)

2. Submit for Redaction

Use the appropriate specialized agent:
response = client.document.execute(
    name="healthcare/phi-redaction",  # Change based on your use case
    version="latest",
    file_ids=[file_response.id],
    batch=True
)

3. Get Redacted Document

Wait for completion and access the redacted document:
completed_response = client.predictions.wait(response.id, timeout=120)
redacted_uri = completed_response.response["redacted_uri"]
print(f"Redacted document: {redacted_uri}")
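
Once `redacted_uri` is available, the redacted file can be saved locally. The following is a minimal sketch, assuming the URI can be fetched with a plain GET request and no extra authentication headers. `download_redacted` and the destination path are hypothetical names used for illustration; they are not part of the VLM Run SDK.

```python
import shutil
import urllib.request
from pathlib import Path


def download_redacted(uri: str, dest: Path, timeout: float = 60.0) -> Path:
    """Stream the document at `uri` into `dest` and return the destination path."""
    # Create the output folder if it does not exist yet.
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Assumption: the URI is directly downloadable (no auth headers needed).
    with urllib.request.urlopen(uri, timeout=timeout) as src, dest.open("wb") as out:
        shutil.copyfileobj(src, out)
    return dest
```

A call such as `download_redacted(redacted_uri, Path("output/redacted_document.pdf"))` would store the file under `output/`. Because access to redacted documents is time-limited, it is probably safest to download soon after the prediction completes.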

Industry-Specific Agents

Choose the appropriate agent based on your document type and compliance requirements:
| Use Case | Agent Name | Compliance Standards |
| --- | --- | --- |
| Healthcare PHI | healthcare/phi-redaction | HIPAA Safe Harbor |
| Resume Redaction | hr/resume-redaction | GDPR, CCPA, CPRA |
| Legal Documents | legal/document-redaction | Attorney-Client Privilege |
| Financial Data | financial/document-redaction | PCI DSS, SOX, GLBA |
| FOIA Requests | government/foia-redaction | FOIA Regulations |
| Insurance Documents | insurance/document-redaction | Insurance Regulations |
| Real Estate | real-estate/document-redaction | Real Estate Privacy |
| Race Blind Charging | document/pii-redaction | CA Penal Code Section 741 |
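
When the agent is chosen programmatically, a small lookup keeps the agent names in one place. This is an illustrative sketch: the agent names are copied from the list above, while the short keys and the `agent_for` helper are hypothetical and not part of the SDK.

```python
# Agent names come from the list above; the dictionary keys are
# hypothetical short labels chosen for this sketch.
REDACTION_AGENTS = {
    "healthcare": "healthcare/phi-redaction",
    "resume": "hr/resume-redaction",
    "legal": "legal/document-redaction",
    "financial": "financial/document-redaction",
    "foia": "government/foia-redaction",
    "insurance": "insurance/document-redaction",
    "real-estate": "real-estate/document-redaction",
    "race-blind-charging": "document/pii-redaction",
}


def agent_for(use_case: str) -> str:
    """Return the agent name for a use-case label, or raise on unknown labels."""
    key = use_case.strip().lower()
    if key not in REDACTION_AGENTS:
        known = ", ".join(sorted(REDACTION_AGENTS))
        raise ValueError(f"Unknown use case {use_case!r}; expected one of: {known}")
    return REDACTION_AGENTS[key]
```

The returned string can then be passed as the `name` argument when submitting a document, for example `name=agent_for("financial")`.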

Key Use Cases

Healthcare & Insurance

  • Medical Records: Redact PHI for research and sharing
  • Insurance Claims: Remove sensitive medical and personal information
  • Clinical Data: Protect patient privacy in studies and trials

Financial Services

  • Loan Applications: Redact personal financial information
  • Account Statements: Remove sensitive account details
  • Compliance Reports: Prepare regulatory submissions
  • M&A Documents: Protect proprietary information during due diligence

Legal & Government

  • Court Filings: Prepare public documents with protected information
  • FOIA Requests: Redact exempt information for public release
  • Discovery Materials: Redact sensitive information during legal processes
  • Attorney Communications: Protect privileged information

HR & Recruitment

  • Resume Processing: Enable blind hiring by removing bias-inducing information
  • Employee Records: Protect personal and sensitive employee data
  • Background Checks: Remove sensitive verification data

Information Types Redacted

VLM Run automatically detects and redacts:
  • Personal Identifiers: Names, SSNs, account numbers, driver’s licenses
  • Contact Information: Addresses, phone numbers, email addresses
  • Financial Data: Account balances, salary information, credit scores
  • Medical Information: PHI, medical record numbers, health conditions
  • Legal Information: Case numbers, settlement amounts, privileged communications
  • Geographic Data: Addresses, ZIP codes, neighborhood information

Complete Example

from vlmrun.client import VLMRun
from vlmrun.client.types import PredictionResponse, FileResponse
from pathlib import Path

# Initialize the client
client = VLMRun(api_key="<your-api-key>")

# Upload your document
file_response: FileResponse = client.files.upload(
    file=Path("path/to/your_document.pdf")
)

# Submit for redaction (choose appropriate agent)
response: PredictionResponse = client.document.execute(
    name="healthcare/phi-redaction",  # Change based on your use case
    version="latest",
    file_ids=[file_response.id],
    batch=True
)

# Wait for completion
completed_response = client.predictions.wait(response.id, timeout=120)

# Access results
redacted_uri = completed_response.response["redacted_uri"]
redacted_items = completed_response.response["redacted_items"]

print(f"Redacted document: {redacted_uri}")
print(f"Redacted items: {redacted_items}")
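
The example above handles a single file. For a folder of documents, the same three calls can be wrapped in a loop. The sketch below assumes that `batch=True` returns before processing finishes, so all jobs are submitted first and awaited afterwards; `redact_folder` is a hypothetical helper, and only the client calls already shown in this guide are used.

```python
from pathlib import Path
from typing import Any


def redact_folder(client: Any, folder: Path, agent: str, timeout: int = 120) -> dict[str, str]:
    """Submit every PDF in `folder` for redaction and map file name -> redacted URI."""
    # Phase 1: upload and submit each PDF.
    # Assumption: with batch=True the call returns before processing finishes.
    pending = {}
    for pdf in sorted(folder.glob("*.pdf")):
        uploaded = client.files.upload(file=pdf)
        job = client.document.execute(
            name=agent, version="latest", file_ids=[uploaded.id], batch=True
        )
        pending[pdf.name] = job.id

    # Phase 2: wait for each job and collect its redacted URI.
    results = {}
    for name, job_id in pending.items():
        done = client.predictions.wait(job_id, timeout=timeout)
        results[name] = done.response["redacted_uri"]
    return results
```

With the client from the example above, `redact_folder(client, Path("claims/"), "healthcare/phi-redaction")` would return one redacted URI per PDF in the hypothetical `claims/` folder. Submitting everything before waiting keeps total wall-clock time closer to the slowest job than to the sum of all jobs, if the service processes jobs in parallel.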

Example Response

{
  "id": "052cf2a8-2b84-45f5-a385-ccac2aae13bb",
  "status": "completed",
  "response": {
    "redacted_uri": "https://storage.googleapis.com/vlm-userdata/agents/healthcare/phi-redaction/redacted-document.pdf",
    "redacted_items": [
      {
        "phi_type": "name",
        "confidence": "high",
        "count": 3
      },
      {
        "phi_type": "date_elements",
        "confidence": "high",
        "count": 8
      },
      {
        "phi_type": "telephone_number",
        "confidence": "high",
        "count": 2
      }
    ],
    "compliance_standards": ["HIPAA", "Safe_Harbor"],
    "processing_time": "38.7 seconds"
  }
}
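
A response shaped like the one above can be summarized with plain Python. The following sketch assumes the field names shown in the example (`status`, `redacted_items`, `phi_type`, `confidence`, `count`); other agents may use different keys, and the idea that `confidence` can take values other than "high" is also an assumption. `summarize_redactions` is a hypothetical helper.

```python
import json

# Trimmed copy of the example response above.
RAW = """
{
  "status": "completed",
  "response": {
    "redacted_items": [
      {"phi_type": "name", "confidence": "high", "count": 3},
      {"phi_type": "date_elements", "confidence": "high", "count": 8},
      {"phi_type": "telephone_number", "confidence": "high", "count": 2}
    ]
  }
}
"""


def summarize_redactions(prediction: dict) -> tuple[dict[str, int], list[str]]:
    """Return (count per PHI type, PHI types whose confidence is not "high")."""
    if prediction.get("status") != "completed":
        raise RuntimeError(f"Prediction not completed: {prediction.get('status')!r}")
    items = prediction["response"]["redacted_items"]
    counts = {item["phi_type"]: item["count"] for item in items}
    # Assumption: anything not marked "high" deserves a manual look.
    needs_review = [item["phi_type"] for item in items if item["confidence"] != "high"]
    return counts, needs_review


counts, needs_review = summarize_redactions(json.loads(RAW))
print(f"Total redactions: {sum(counts.values())}")
print(f"Types needing review: {needs_review}")
```

A summary like this could feed an internal review queue or a compliance log without storing the redacted values themselves.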

Benefits

Operational Efficiency

  • Automated Processing: Reduce manual redaction time from hours to minutes
  • Batch Operations: Process large document volumes efficiently
  • Error Reduction: Reduce the human errors common in manual redaction processes
  • Scalability: Handle growing document volumes without additional staff

Compliance & Security

  • Regulatory Compliance: Meet industry-specific requirements (HIPAA, PCI DSS, SOX, GDPR, etc.)
  • Data Breach Prevention: Irreversible redaction prevents data recovery
  • Audit Trail: Comprehensive logging for compliance verification
  • Legal Protection: Reduce liability from accidental data exposure

Cost Savings

  • Reduced Manual Labor: Automate time-consuming redaction tasks
  • Lower Error Costs: Prevent expensive compliance violations
  • Improved Productivity: Focus staff on high-value activities
  • Scalable Operations: Handle volume increases without proportional cost increases

Supported Documents

  • PDF Documents - Reports, contracts, legal briefs, medical records
  • Scanned Images - Faxed documents, handwritten forms, ID cards
  • Multi-page Documents - Complete case files, comprehensive reports
  • Mixed Content - Documents containing both text and images
  • Spreadsheets - Financial models, budget documents, transaction records

Security Features

  • 🔒 Encryption: All documents encrypted in transit and at rest
  • 🏛️ Regulatory Compliance: Meets industry-specific standards
  • 🔑 Access Controls: Role-based access and authentication
  • 📝 Audit Logging: Comprehensive audit trails for all activities
  • Secure URLs: Time-limited, secure access to redacted documents
  • 🚫 Irreversible Redaction: Permanent data removal prevents recovery

Real-World Examples

See VLM Run’s document redaction in action across different industries:

[Figures: industry-specific redaction examples]

  • Healthcare PHI Redaction: Healthcare insurance card with PHI redacted
  • Legal Document Redaction: Legal document with sensitive information redacted
  • Insurance Document Redaction: Insurance invoice with sensitive data redacted
  • Race Blind PII Redaction: Receipt with bias-inducing information redacted

Try our Document -> JSON API today

Head over to our Document -> JSON API to start building your own document processing pipeline with VLM Run. Sign up for access on our platform.