# Document Redaction & Edit

> Automatically detect and redact or replace sensitive information in documents with enterprise-grade compliance.

VLM Run provides two complementary document privacy capabilities, both served via the `/document/generate` endpoint:

* **Redaction**: Detects and **blurs** sensitive information, making it unreadable while preserving document layout.
* **Edit (Replace)**: Detects sensitive information and **replaces** it with consistent dummy data (e.g. names become "John Smith", DOB becomes "10/06/1974"), producing a document that looks realistic but contains no real PII.

Each domain targets a specific compliance standard (for example, HIPAA Safe Harbor for `healthcare.phi-redaction`), so sensitive fields are removed while the rest of the document stays readable.

<Frame caption="Example of document redaction applied to a medical form">
  <div style={{ display: 'flex', gap: '20px', justifyContent: 'center' }}>
    <div style={{ flex: 1 }}>
      <h4 style={{ textAlign: 'center', marginBottom: '10px' }}>Original Document</h4>

      <img src="https://www.carepatron.com/files/physical-therapy-referral-form-sample-template.jpg" alt="Original document with visible sensitive data" style={{ width: '100%', border: '1px solid #ddd', borderRadius: '4px' }} />
    </div>

    <div style={{ flex: 1 }}>
      <h4 style={{ textAlign: 'center', marginBottom: '10px' }}>Redacted Document</h4>

      <img src="https://mintcdn.com/autonomiai/hv1ZFyEZ1wMYWx0b/guides/doc-ai/images/document-redaction.jpeg?fit=max&auto=format&n=hv1ZFyEZ1wMYWx0b&q=85&s=36882d0ef9ae483ced20e0355c19f608" alt="Redacted document with sensitive data obscured" style={{ width: '100%', border: '1px solid #ddd', borderRadius: '4px' }} width="583" height="828" data-path="guides/doc-ai/images/document-redaction.jpeg" />
    </div>
  </div>
</Frame>

## Quick Start

<Steps>
  <Step title="Upload Document">
    Upload your document containing sensitive information:

    ```python theme={"theme":{"light":"github-light","dark":"dark-plus"}}
    from vlmrun.client import VLMRun
    from pathlib import Path

    client = VLMRun(api_key="<your-api-key>")
    file_response = client.files.upload(
        file=Path("path/to/your_document.pdf")
    )
    ```
  </Step>

  <Step title="Submit for Redaction or Edit">
    Choose the appropriate domain for your use case:

    ```python theme={"theme":{"light":"github-light","dark":"dark-plus"}}
    response = client.document.generate(
        domain="healthcare.phi-redaction",  # Blur sensitive information
        file=file_response.id,
        batch=True
    )
    ```

    Or use the edit-replace variant to substitute PHI with dummy data:

    ```python theme={"theme":{"light":"github-light","dark":"dark-plus"}}
    response = client.document.generate(
        domain="healthcare.phi-edit-replace",  # Replace with dummy data
        file=file_response.id,
        batch=True
    )
    ```
  </Step>

  <Step title="Get Processed Document">
    Wait for completion and access the result:

    ```python theme={"theme":{"light":"github-light","dark":"dark-plus"}}
    completed_response = client.predictions.wait(response.id, timeout=120)
    detected_items = completed_response.response["detected_items"]
    result_uri = completed_response.response["uri"]
    print(f"Detected items: {detected_items}")
    print(f"Processed document: {result_uri}")
    ```
  </Step>
</Steps>

## Available Domains

Choose the appropriate domain based on your document type, compliance requirements, and desired output:

### Redaction Domains (Blur)

These domains detect sensitive information and **blur** it in the output document:

| Use Case                | Domain                           | Compliance Standards      |
| ----------------------- | -------------------------------- | ------------------------- |
| **Healthcare PHI**      | `healthcare.phi-redaction`       | HIPAA Safe Harbor         |
| **Resume Redaction**    | `hr.resume-redaction`            | GDPR, CCPA, CPRA          |
| **Legal Documents**     | `legal.document-redaction`       | Attorney-Client Privilege |
| **Financial Data**      | `financial.document-redaction`   | PCI DSS, SOX, GLBA        |
| **FOIA Requests**       | `government.foia-redaction`      | FOIA Regulations          |
| **Insurance Documents** | `insurance.document-redaction`   | Insurance Regulations     |
| **Real Estate**         | `real-estate.document-redaction` | Real Estate Privacy       |
| **PII Redaction**       | `document.pii-redaction`         | CA Penal Code Section 741 |

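If your pipeline routes several document types, a small lookup can keep the domain strings from the table above in one place. The sketch below is only an illustration: `REDACTION_DOMAINS` and `redaction_domain_for` are hypothetical names, not part of the VLM Run SDK, and the short keys are arbitrary labels chosen for this example.

```python
# Hypothetical helper: the keys are arbitrary labels chosen for this sketch;
# the domain strings are copied from the redaction table above.
REDACTION_DOMAINS = {
    "healthcare": "healthcare.phi-redaction",
    "resume": "hr.resume-redaction",
    "legal": "legal.document-redaction",
    "financial": "financial.document-redaction",
    "foia": "government.foia-redaction",
    "insurance": "insurance.document-redaction",
    "real-estate": "real-estate.document-redaction",
    "pii": "document.pii-redaction",
}


def redaction_domain_for(use_case: str) -> str:
    """Return the redaction domain for a use-case label, or raise ValueError."""
    key = use_case.strip().lower()
    try:
        return REDACTION_DOMAINS[key]
    except KeyError:
        known = ", ".join(sorted(REDACTION_DOMAINS))
        raise ValueError(
            f"Unknown use case {use_case!r}; expected one of: {known}"
        ) from None


print(redaction_domain_for("Legal"))  # legal.document-redaction
```

Failing loudly on an unknown label avoids silently sending a document to the wrong domain.
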
### Edit Domains (Replace)

These domains detect sensitive information and **replace** it with consistent dummy data:

| Use Case                | Domain                        | Compliance Standards |
| ----------------------- | ----------------------------- | -------------------- |
| **Healthcare PHI Edit** | `healthcare.phi-edit-replace` | HIPAA Safe Harbor    |

## Key Use Cases

### Healthcare & Insurance

* **Medical Records**: Redact PHI for research and sharing
* **Insurance Claims**: Remove sensitive medical and personal information
* **Clinical Data**: Protect patient privacy in studies and trials

### Financial Services

* **Loan Applications**: Redact personal financial information
* **Account Statements**: Remove sensitive account details
* **Compliance Reports**: Prepare regulatory submissions
* **M\&A Documents**: Protect proprietary information during due diligence

### Legal & Government

* **Court Filings**: Prepare public filings with protected information removed
* **FOIA Requests**: Redact exempt information for public release
* **Discovery Materials**: Redact sensitive information during legal processes
* **Attorney Communications**: Protect privileged information

### HR & Recruitment

* **Resume Processing**: Enable blind hiring by removing bias-inducing information
* **Employee Records**: Protect personal and sensitive employee data
* **Background Checks**: Remove sensitive verification data

## Information Types Redacted

VLM Run automatically detects and redacts:

* **Personal Identifiers**: Names, SSNs, account numbers, driver's licenses
* **Contact Information**: Addresses, phone numbers, email addresses
* **Financial Data**: Account balances, salary information, credit scores
* **Medical Information**: PHI, medical record numbers, health conditions
* **Legal Information**: Case numbers, settlement amounts, privileged communications
* **Geographic Data**: Addresses, ZIP codes, neighborhood information

## Complete Examples

### Redaction (Blur)

```python theme={"theme":{"light":"github-light","dark":"dark-plus"}}
from vlmrun.client import VLMRun
from vlmrun.client.types import PredictionResponse, FileResponse
from pathlib import Path

# Authenticate with your VLM Run API key
client = VLMRun(api_key="<your-api-key>")

# Upload the document and keep the returned file ID
file_response: FileResponse = client.files.upload(
    file=Path("path/to/your_document.pdf")
)

# Submit the file to the redaction domain, which blurs detected PHI
response: PredictionResponse = client.document.generate(
    domain="healthcare.phi-redaction",
    file=file_response.id,
    batch=True
)

# Wait for the job to finish (timeout=120), then read the results
completed_response = client.predictions.wait(response.id, timeout=120)
detected_items = completed_response.response["detected_items"]
redacted_uri = completed_response.response["uri"]

print(f"Detected items: {detected_items}")
print(f"Redacted document: {redacted_uri}")
```

### Edit (Replace with Dummy Data)

```python theme={"theme":{"light":"github-light","dark":"dark-plus"}}
from vlmrun.client import VLMRun
from vlmrun.client.types import PredictionResponse, FileResponse
from pathlib import Path

# Authenticate with your VLM Run API key
client = VLMRun(api_key="<your-api-key>")

# Upload the document and keep the returned file ID
file_response: FileResponse = client.files.upload(
    file=Path("path/to/medical_record.pdf")
)

# Submit the file to the edit domain, which replaces detected PHI with dummy data
response: PredictionResponse = client.document.generate(
    domain="healthcare.phi-edit-replace",
    file=file_response.id,
    batch=True
)

# Wait for the job to finish (timeout=120), then read the results
completed_response = client.predictions.wait(response.id, timeout=120)
detected_items = completed_response.response["detected_items"]
edited_uri = completed_response.response["uri"]

print(f"Detected items: {detected_items}")
print(f"Edited document: {edited_uri}")
```
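
Both examples follow the same upload, submit, and wait sequence, so they can be folded into one function. The sketch below is a suggestion rather than part of the SDK: `process_document` is a hypothetical name, it uses only the client calls shown above, and it assumes the completed prediction may expose a `status` attribute mirroring the `status` field of the JSON response.

```python
from pathlib import Path
from typing import Any, List, Tuple


def process_document(
    client: Any, path: Path, domain: str, timeout: int = 120
) -> Tuple[List[dict], str]:
    """Upload a file, run a redaction or edit domain, return (detected_items, uri).

    `client` is expected to be a `vlmrun.client.VLMRun` instance; only the
    calls shown in the examples above are used.
    """
    file_response = client.files.upload(file=path)
    response = client.document.generate(
        domain=domain, file=file_response.id, batch=True
    )
    completed = client.predictions.wait(response.id, timeout=timeout)

    # Assumption: if the prediction object has a `status` attribute, it mirrors
    # the "status" field of the JSON response. The check is skipped when absent.
    status = getattr(completed, "status", None)
    if status is not None and status != "completed":
        raise RuntimeError(f"Prediction {response.id} ended with status {status!r}")

    result = completed.response
    return result["detected_items"], result["uri"]


# Usage (requires a real client and API key):
# client = VLMRun(api_key="<your-api-key>")
# items, uri = process_document(
#     client, Path("path/to/your_document.pdf"), "healthcare.phi-redaction"
# )
```

Passing the client in as a parameter keeps the function easy to exercise with a stub in unit tests.
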

## Example Responses

### Redaction Response

```json theme={"theme":{"light":"github-light","dark":"dark-plus"}}
{
  "id": "052cf2a8-2b84-45f5-a385-ccac2aae13bb",
  "status": "completed",
  "response": {
    "detected_items": [
      {
        "item_type": "name",
        "value": "John Doe"
      },
      {
        "item_type": "ssn",
        "value": "123-45-6789"
      },
      {
        "item_type": "telephone_number",
        "value": "(555) 123-4567"
      }
    ],
    "uri": "https://storage.googleapis.com/vlm-userdata/healthcare/phi-redaction/redacted-document.pdf"
  }
}
```
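
Once the response is parsed, `detected_items` is a plain list of `item_type`/`value` pairs, so standard Python is enough to summarize it. A minimal sketch using the sample payload above (`count_by_type` is a hypothetical helper name):

```python
from collections import Counter

# Sample data copied from the redaction response above.
detected_items = [
    {"item_type": "name", "value": "John Doe"},
    {"item_type": "ssn", "value": "123-45-6789"},
    {"item_type": "telephone_number", "value": "(555) 123-4567"},
]


def count_by_type(items: list) -> Counter:
    """Count detected items per `item_type` without reading the raw values."""
    return Counter(item["item_type"] for item in items)


counts = count_by_type(detected_items)
for item_type, n in sorted(counts.items()):
    print(f"{item_type}: {n}")
```

A per-type count like this can be stored or reported without exposing the sensitive values themselves.
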

### Edit Response

The edit response has the same structure. The `detected_items` list contains the original PHI values that were found and replaced with dummy data in the output document:

```json theme={"theme":{"light":"github-light","dark":"dark-plus"}}
{
  "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "status": "completed",
  "response": {
    "detected_items": [
      {
        "item_type": "name",
        "value": "Jane Doe"
      },
      {
        "item_type": "date_of_birth",
        "value": "03/15/1985"
      },
      {
        "item_type": "ssn",
        "value": "987-65-4321"
      }
    ],
    "uri": "https://storage.googleapis.com/vlm-userdata/healthcare/phi-edit-replace/edited-document.pdf"
  }
}
```
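
Because `detected_items` carries the original values, printing it verbatim (as the examples above do) writes PHI to your logs. One option is to mask the values before logging. The sketch below is an illustration only; `mask_value` and `masked_items` are hypothetical helpers, not SDK functions.

```python
def mask_value(value: str, visible: int = 2) -> str:
    """Keep the last `visible` characters and replace the rest with '*'."""
    # Fully mask values that are too short to partially reveal safely.
    if visible <= 0 or len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def masked_items(items: list) -> list:
    """Return copies of the detected items with each `value` masked."""
    return [{**item, "value": mask_value(str(item["value"]))} for item in items]


# Sample data copied from the edit response above.
items = [
    {"item_type": "name", "value": "Jane Doe"},
    {"item_type": "ssn", "value": "987-65-4321"},
]
print(masked_items(items))
```

The helper returns new dictionaries, so the original list stays available for any downstream step that needs the unmasked values.
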

## Benefits

### Operational Efficiency

* **Automated Processing**: Reduce manual redaction time from hours to minutes
* **Batch Operations**: Process large document volumes efficiently
* **Error Reduction**: Reduce the human error inherent in manual redaction
* **Scalability**: Handle growing document volumes without additional staff

### Compliance & Security

* **Regulatory Compliance**: Meet industry-specific requirements (HIPAA, PCI DSS, SOX, GDPR, etc.)
* **Data Breach Prevention**: Irreversible redaction prevents data recovery
* **Audit Trail**: Comprehensive logging for compliance verification
* **Legal Protection**: Reduce liability from accidental data exposure

### Cost Savings

* **Reduced Manual Labor**: Automate time-consuming redaction tasks
* **Lower Error Costs**: Prevent expensive compliance violations
* **Improved Productivity**: Focus staff on high-value activities
* **Scalable Operations**: Handle volume increases without proportional cost increases

## Supported Documents

* **PDF Documents**: Reports, contracts, legal briefs, medical records
* **Scanned Images**: Faxed documents, handwritten forms, ID cards
* **Multi-page Documents**: Complete case files, comprehensive reports
* **Mixed Content**: Documents containing both text and images
* **Spreadsheets**: Financial models, budget documents, transaction records

## Security Features

* 🔒 **Encryption**: All documents encrypted in transit and at rest
* 🏛️ **Regulatory Compliance**: Meets industry-specific standards
* 🔑 **Access Controls**: Role-based access and authentication
* 📝 **Audit Logging**: Comprehensive audit trails for all activities
* ⏰ **Secure URLs**: Time-limited, secure access to redacted documents
* 🚫 **Irreversible Redaction**: Permanent data removal prevents recovery
* 🔄 **Consistent Replacement**: Edit mode uses the same dummy values throughout a document for consistency

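Since result URLs are time-limited, you may want to copy the processed file into your own storage soon after the job completes. The sketch below assumes the `uri` can be fetched with a plain GET request and uses only the Python standard library; `download_result` is a hypothetical helper, and the expiry window of the URL is not specified here.

```python
import shutil
import urllib.request
from pathlib import Path


def download_result(uri: str, destination: Path, timeout: float = 60.0) -> Path:
    """Stream the processed document at `uri` to `destination` and return the path."""
    destination.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(uri, timeout=timeout) as source, open(
        destination, "wb"
    ) as target:
        # Copy in chunks so large documents are not held in memory at once.
        shutil.copyfileobj(source, target)
    return destination


# Example (requires a valid, unexpired URL from a completed prediction):
# download_result(redacted_uri, Path("output/redacted-document.pdf"))
```
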
## Real-World Examples

See VLM Run's document redaction in action across different industries:

<Frame caption="Industry-specific redaction examples">
  <div style={{ display: 'grid', gridTemplateColumns: 'repeat(2, 1fr)', gap: '20px' }}>
    <div>
      <h4 style={{ textAlign: 'center', marginBottom: '10px' }}>Healthcare PHI Redaction</h4>

      <img src="https://mintcdn.com/autonomiai/hv1ZFyEZ1wMYWx0b/guides/doc-ai/images/redacted_2021-22_cu_gold_ship_card_front_0_healthcare_document_redaction_and_image_inference_test.png?fit=max&auto=format&n=hv1ZFyEZ1wMYWx0b&q=85&s=bddcea1b5f82f103a1b62ea20c16efa7" alt="Healthcare insurance card with PHI redacted" style={{ width: '100%', border: '1px solid #ddd', borderRadius: '4px' }} width="750" height="427" data-path="guides/doc-ai/images/redacted_2021-22_cu_gold_ship_card_front_0_healthcare_document_redaction_and_image_inference_test.png" />
    </div>

    <div>
      <h4 style={{ textAlign: 'center', marginBottom: '10px' }}>Legal Document Redaction</h4>

      <img src="https://mintcdn.com/autonomiai/hv1ZFyEZ1wMYWx0b/guides/doc-ai/images/redacted_Alabama_legal_document_redaction_and_image_inference_test.png?fit=max&auto=format&n=hv1ZFyEZ1wMYWx0b&q=85&s=faa96a00a000eccc99b43dc3f8028bbb" alt="Legal document with sensitive information redacted" style={{ width: '100%', border: '1px solid #ddd', borderRadius: '4px' }} width="650" height="406" data-path="guides/doc-ai/images/redacted_Alabama_legal_document_redaction_and_image_inference_test.png" />
    </div>

    <div>
      <h4 style={{ textAlign: 'center', marginBottom: '10px' }}>Insurance Document Redaction</h4>

      <img src="https://mintcdn.com/autonomiai/hv1ZFyEZ1wMYWx0b/guides/doc-ai/images/redacted_invoice_1_insurance_document_redaction_and_image_inference_test.png?fit=max&auto=format&n=hv1ZFyEZ1wMYWx0b&q=85&s=2a7a77d943e7ef050ac60f79d9e6c4c0" alt="Insurance invoice with sensitive data redacted" style={{ width: '100%', border: '1px solid #ddd', borderRadius: '4px' }} width="817" height="1057" data-path="guides/doc-ai/images/redacted_invoice_1_insurance_document_redaction_and_image_inference_test.png" />
    </div>

    <div>
      <h4 style={{ textAlign: 'center', marginBottom: '10px' }}>Race-Blind PII Redaction</h4>

      <img src="https://mintcdn.com/autonomiai/hv1ZFyEZ1wMYWx0b/guides/doc-ai/images/redacted_Sherrickas-walmart-receipt-7_race_blind_document_redaction_and_image_inference_test.png?fit=max&auto=format&n=hv1ZFyEZ1wMYWx0b&q=85&s=4df779f30b41e7dc097d2cf80e013dca" alt="Receipt with bias-inducing information redacted" style={{ width: '100%', border: '1px solid #ddd', borderRadius: '4px' }} width="1230" height="2560" data-path="guides/doc-ai/images/redacted_Sherrickas-walmart-receipt-7_race_blind_document_redaction_and_image_inference_test.png" />
    </div>
  </div>
</Frame>

## Related Capabilities

<CardGroup cols={2}>
  <Card title="Structured Responses" icon="box" href="/capabilities/structured-responses">
    Extract structured data from documents before redaction processing.
  </Card>

  <Card title="Visual Grounding" icon="object-group" href="/capabilities/visual-grounding">
    Locate sensitive information with precise coordinates for targeted redaction.
  </Card>

  <Card title="Custom Schemas" icon="layer-group" href="/capabilities/custom-schemas">
    Define custom redaction rules for specific compliance requirements.
  </Card>

  <Card title="Long Context Outputs" icon="file-text" href="/capabilities/long-context-outputs">
    Process large documents and complex multi-page reports.
  </Card>
</CardGroup>

## Try our Document -> JSON API today

Head over to the [Document -> JSON](/api-reference/v1/post-document-generate) API reference to start building your own document processing pipeline with [VLM Run](https://vlm.run). Sign up for access on our [platform](https://app.vlm.run).
