
# Pricing

> Flexible pricing plans for developers and enterprises to build with VLM Run.

## Simple, transparent pricing

Choose the plan that works best for your project. All plans include access to our core Vision Language Model capabilities with structured JSON outputs.

## Credit-based pricing

All applications use a credits-based system with three detail levels and optional grounding. The current conversion rate is:
<p align="center"><strong>100 credits = \$1</strong></p>

`document.*`, `image.*`, and `healthcare.*` domains charge **per page**, while `audio.*` and `video.*` domains charge **per duration-based segment**.

### Domain Based Pricing

The pricing below applies to prediction API calls such as `document.generate` and `image.generate`. Each row lists the credits charged per page, image, or 5-minute segment.

<table align="center">
  <thead>
    <tr>
      <th rowspan="2" align="left">Domain</th>
      <th colspan="3">Credits per Page / Image / 5-min Segment</th>
      <th rowspan="2">Grounding<br />(Add-on)</th>
    </tr>

    <tr>
      <th><code>lo</code></th>
      <th><code>auto</code></th>
      <th><code>hi</code></th>
    </tr>
  </thead>

  <tbody>
    <tr>
      <td><code>document.\*</code></td>
      <td align="center">1</td>
      <td align="center">2</td>
      <td align="center">4</td>
      <td align="center">+2</td>
    </tr>

    <tr>
      <td><code>document.markdown</code></td>
      <td align="center">1</td>
      <td align="center">4</td>
      <td align="center">6</td>
      <td align="center">-</td>
    </tr>

    <tr>
      <td><code>image.\*</code></td>
      <td align="center">1</td>
      <td align="center">2</td>
      <td align="center">4</td>
      <td align="center">+2</td>
    </tr>

    <tr>
      <td><code>healthcare.\*</code></td>
      <td align="center">1</td>
      <td align="center">2</td>
      <td align="center">4</td>
      <td align="center">+2</td>
    </tr>

    <tr>
      <td><code>audio.\*</code></td>
      <td align="center">1</td>
      <td align="center">1</td>
      <td align="center">2</td>
      <td align="center">-</td>
    </tr>

    <tr>
      <td><code>video.\*</code></td>
      <td align="center">10</td>
      <td align="center">10</td>
      <td align="center">20</td>
      <td align="center">-</td>
    </tr>
  </tbody>
</table>

* **Grounding add-on**: `document.*`, `image.*`, and `healthcare.*` domains support the *Grounding* add-on, which provides visual bounding boxes and confidence scores for detected entities. This add-on costs an additional **2 credits per page or image**.

* **Document Markdown**: `document.markdown` is optimized for markdown content and does not support the *Grounding* add-on.

* **Audio / Video Transcription**: `audio.*` and `video.*` domains are charged per 5-minute segment, rounded up. For example, a 12-minute audio file is billed as 15 minutes, or 3 segments.

* **Processing levels**: The `lo`, `auto`, and `hi` columns represent increasing levels of processing quality and computational cost.

* **Minimum charge**: Audio and video files shorter than 10 minutes are billed for a minimum of 2 segments.

* If you need help estimating credits for your use case, please [contact us](mailto:support@vlm.run).
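
To make the rules above concrete, here is a minimal Python sketch of a credit estimator. It is not part of the VLM Run SDK: the function and constant names (`page_credits`, `billed_segments`, `media_credits`) are hypothetical, and the numbers are simply copied from the table above, so treat it as an illustration rather than a billing reference.

```python
import math

# Credits per page or image, copied from the pricing table.
PAGE_CREDITS = {
    "document": {"lo": 1, "auto": 2, "hi": 4},
    "document.markdown": {"lo": 1, "auto": 4, "hi": 6},
    "image": {"lo": 1, "auto": 2, "hi": 4},
    "healthcare": {"lo": 1, "auto": 2, "hi": 4},
}
# Credits per 5-minute segment, copied from the pricing table.
SEGMENT_CREDITS = {
    "audio": {"lo": 1, "auto": 1, "hi": 2},
    "video": {"lo": 10, "auto": 10, "hi": 20},
}
GROUNDING_ADDON = 2                  # extra credits per page or image
NO_GROUNDING = {"document.markdown"}  # rows without a grounding add-on
SEGMENT_SECONDS = 300                # one segment is 5 minutes
MIN_SEGMENTS = 2                     # minimum billed segments for audio/video


def page_credits(family, pages, detail="auto", grounding=False):
    """Estimate credits for page- or image-based domains (hypothetical helper)."""
    if grounding and family in NO_GROUNDING:
        raise ValueError(f"{family} does not support the grounding add-on")
    per_page = PAGE_CREDITS[family][detail]
    if grounding:
        per_page += GROUNDING_ADDON
    return per_page * pages


def billed_segments(duration_seconds):
    """Round the duration up to whole 5-minute segments, with a 2-segment minimum."""
    return max(MIN_SEGMENTS, math.ceil(duration_seconds / SEGMENT_SECONDS))


def media_credits(family, duration_seconds, detail="auto"):
    """Estimate credits for audio or video (hypothetical helper)."""
    return SEGMENT_CREDITS[family][detail] * billed_segments(duration_seconds)
```

For instance, `page_credits("document", 10, "hi", grounding=True)` comes to 60 credits, and `billed_segments(12 * 60)` gives the 3 segments from the 12-minute example above.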

### Service Tiers

Every prediction, agent execution, and chat completion can be routed through one of three delivery tiers by setting `service_tier` (Python SDK) / `serviceTier` (Node.js SDK) on the request's `GenerationConfig`. The tier governs **both** how the request is routed (latency / availability) **and** how credits are billed.

<table align="center">
  <thead>
    <tr>
      <th align="left">Tier</th>
      <th align="center">Multiplier</th>
      <th>When to use</th>
    </tr>
  </thead>

  <tbody>
    <tr>
      <td><code>standard</code> <em>(default)</em></td>
      <td align="center">1.0×</td>
      <td>Baseline rates and latency. Used when <code>service\_tier</code> is omitted, <code>null</code>, <code>"auto"</code>, or <code>"default"</code>.</td>
    </tr>

    <tr>
      <td><code>flex</code></td>
      <td align="center">0.5× <strong>(50% off)</strong></td>
      <td>Batch / background workloads that can tolerate higher and more variable latency. Great for nightly document processing, large video backfills, or bulk evaluation runs.</td>
    </tr>

    <tr>
      <td><code>priority</code></td>
      <td align="center">1.8×</td>
      <td>User-facing or latency-sensitive workloads that require the highest reliability and lowest queue times.</td>
    </tr>
  </tbody>
</table>

The tier multiplier is applied **uniformly** on top of the domain / agent credit cost — including LLM, media, and tool credits. For example, an `image.*` `hi` call with grounding normally costs `4 + 2 = 6` credits per image; at `flex` it costs `3` credits, and at `priority` it costs `10.8` credits.

<CodeGroup>
  ```python Python theme={"theme":{"light":"github-light","dark":"dark-plus"}}
  from vlmrun.client import VLMRun
  from vlmrun.client.types import GenerationConfig

  client = VLMRun()

  # Flex tier — 50% discount, higher latency
  response = client.document.generate(
      file="invoice.pdf",
      domain="document.invoice",
      config=GenerationConfig(service_tier="flex"),
  )
  ```

  ```javascript Node.js theme={"theme":{"light":"github-light","dark":"dark-plus"}}
  // Assumes `client` is an already-initialized VLM Run client.
  // Priority tier — 1.8x premium, lowest queue times
  await client.document.generate({
    fileId: "file_...",
    domain: "document.invoice",
    config: { serviceTier: "priority" },
  });
  ```

  ```bash cURL theme={"theme":{"light":"github-light","dark":"dark-plus"}}
  # Flex tier — 50% discount, higher latency
  curl -X POST https://api.vlm.run/v1/document/generate \
    -H "Authorization: Bearer $VLMRUN_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "file_id": "<file-id>",
      "domain": "document.invoice",
      "config": { "service_tier": "flex" }
    }'
  ```
</CodeGroup>

<Note>
  `service_tier` is accepted on all prediction routes (<code>image</code>, <code>document</code>, <code>healthcare</code>, <code>audio</code>, <code>video</code>, <code>web</code>), on agent executions, and on OpenAI-compatible chat completions. Omitting the field (or setting it to <code>"auto"</code>, <code>"default"</code>, or <code>null</code>) falls back to the server default, which is currently <code>standard</code>.
</Note>
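
The multiplier arithmetic in this section can be sketched in a few lines of Python. This is an illustrative helper, not an SDK function: `apply_service_tier` is a hypothetical name, and the sketch assumes fractional credits are kept as-is (as in the 10.8-credit example above) rather than rounded.

```python
from decimal import Decimal

# Multipliers copied from the service tier table.
TIER_MULTIPLIERS = {
    "standard": Decimal("1.0"),
    "flex": Decimal("0.5"),
    "priority": Decimal("1.8"),
}
# Values that fall back to the server default, which is currently "standard".
DEFAULT_ALIASES = {None, "auto", "default"}


def apply_service_tier(base_credits, service_tier=None):
    """Scale a base credit cost by the tier multiplier (hypothetical helper)."""
    tier = "standard" if service_tier in DEFAULT_ALIASES else service_tier
    # Decimal avoids binary floating-point drift in values such as 6 x 1.8.
    return Decimal(str(base_credits)) * TIER_MULTIPLIERS[tier]
```

With the `image.*` example above (6 base credits), `apply_service_tier(6, "flex")` evaluates to 3 credits and `apply_service_tier(6, "priority")` to 10.8 credits.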

### Agent-Based Pricing

Agent pricing is determined by the type of agent and the tools it uses:

<table>
  <thead>
    <tr>
      <th align="left">Agent Type / Application</th>
      <th align="center">Price Per Page or Image</th>
      <th>Notes</th>
    </tr>
  </thead>

  <tbody>
    <tr>
      <td>All Redaction & Edit Domains<br /><code>healthcare.phi-redaction</code>, <code>healthcare.phi-edit-replace</code>, <code>insurance.document-redaction</code>, etc.</td>
      <td align="center">8</td>
      <td>Flat rate for all domains that perform PHI redaction or PHI edit-replace, regardless of detail level or options.</td>
    </tr>

    <tr>
      <td>Other Agent Applications</td>
      <td align="center">Varies</td>

      <td>
        The credit cost is based on the number of tools (sub-models or APIs) called by the agent during processing.<br />

        <ul>
          <li>Each tool call incurs an additional credit cost.</li>
          <li>Final cost = <strong>Base agent cost</strong> + <strong>Tool cost × Number of tools used</strong></li>
        </ul>
      </td>
    </tr>
  </tbody>
</table>

<Note>
  All PHI redaction agents are charged a fixed premium rate of <strong>8 credits per page</strong> to reflect the additional compliance and security requirements. For other agent-based applications, the total credit cost will depend on the number and type of tools invoked by the agent during execution. Please refer to the agent documentation or contact support for detailed cost breakdowns for your specific use case.
</Note>
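
The agent cost formula above can be written as a small Python sketch. The base agent cost and per-tool cost are not published on this page, so the values passed in below are placeholders, and the function names are hypothetical. The sketch also assumes every tool carries the same credit cost, which may not hold for real agents.

```python
PHI_REDACTION_CREDITS_PER_PAGE = 8  # flat rate from the table above


def agent_credits(base_cost, tool_cost, tools_used):
    """Final cost = base agent cost + tool cost x number of tools used."""
    if tools_used < 0:
        raise ValueError("tools_used must be non-negative")
    return base_cost + tool_cost * tools_used


def redaction_credits(pages):
    """Flat-rate cost for redaction and edit domains, independent of detail level."""
    return PHI_REDACTION_CREDITS_PER_PAGE * pages


# Placeholder inputs only: a base cost of 5 credits and 2 credits per tool
# are made-up numbers for illustration, not published rates.
example_total = agent_credits(base_cost=5, tool_cost=2, tools_used=3)
```

For actual base and tool costs, refer to the agent documentation or contact support as noted above.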

### Examples

**Documents:**

* Invoice with `lo`: **1 credit** = \$0.01 per page
* Invoice with `auto`: **2 credits** = \$0.02 per page
* Invoice with `hi` + `grounding`: **6 credits** = \$0.06 per page

**Images:**

* Classification with `lo`: **1 credit** = \$0.01 per image

**Audio/Video:**

* 1-hour audio with `lo` or `auto`: 12 segments × 1 credit = **12 credits** = \$0.12
* 1-hour video with `lo` or `auto`: 12 segments × 10 credits = **120 credits** = \$1.20

**Custom models:**

* Fine-tuned models: **8 credits** = \$0.08 per page

**Service tiers:**

* Invoice with `auto` at `flex`: **2 × 0.5 = 1 credit** = \$0.01 per page
* Invoice with `hi` + grounding at `priority`: **6 × 1.8 = 10.8 credits** ≈ \$0.11 per page
* 1-hour video at `flex`: **120 × 0.5 = 60 credits** = \$0.60
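
The dollar figures in these examples follow from the conversion rate of 100 credits per \$1. A minimal Python sketch of that conversion is shown below. The helper names are hypothetical, and rounding to the nearest cent is an assumption made for display only; this page does not state how invoices round fractional cents.

```python
from decimal import Decimal, ROUND_HALF_UP

CREDITS_PER_DOLLAR = Decimal("100")  # 100 credits = 1 US dollar


def credits_to_usd(credits):
    """Convert credits to dollars without rounding (hypothetical helper)."""
    return Decimal(str(credits)) / CREDITS_PER_DOLLAR


def display_usd(credits):
    """Round to whole cents for display; the rounding mode is an assumption."""
    return credits_to_usd(credits).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

Here `credits_to_usd("10.8")` is 0.108 dollars, which `display_usd` rounds to the 0.11 shown in the `priority` example.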

***

## Plan comparison

Here's a full comparison of the features across all our current plans.

<table>
  <thead>
    <tr>
      <th align="left" style={{width: "22%"}}>Feature</th>
      <th align="center" style={{width: "25%"}}>Starter</th>
      <th align="center" style={{width: "25%"}}>Pro</th>
      <th align="center" style={{width: "25%"}}>Enterprise</th>
    </tr>
  </thead>

  <tbody>
    <tr>
      <td>Credits / Month</td>
      <td align="center">100 (sign-up bonus)</td>
      <td align="center">100,000 included</td>
      <td align="center">Unlimited</td>
    </tr>

    <tr>
      <td>Rate Limit</td>
      <td align="center">10 / min</td>
      <td align="center">100 / min</td>
      <td align="center">No limits</td>
    </tr>

    <tr>
      <td>Custom Models</td>
      <td align="center">Pre-configured only</td>
      <td align="center">Up to 5</td>
      <td align="center">Unlimited</td>
    </tr>

    <tr>
      <td>Support</td>
      <td align="center">Discord</td>
      <td align="center">Dedicated Slack</td>
      <td align="center">Dedicated Slack</td>
    </tr>

    <tr>
      <td>Data Retention</td>
      <td align="center">Basic logs</td>
      <td align="center">Zero-Data Retention</td>
      <td align="center">Zero-Data Retention</td>
    </tr>

    <tr>
      <td>Compliance</td>
      <td align="center">Basic</td>
      <td align="center">BAA</td>
      <td align="center">SOC2, HIPAA, BAA</td>
    </tr>

    <tr>
      <td>Deployments</td>
      <td align="center">Cloud only</td>
      <td align="center">Cloud only</td>
      <td align="center">In-VPC available</td>
    </tr>

    <tr>
      <td>Service Level Agreement</td>
      <td align="center">Community support</td>
      <td align="center">Standard support</td>
      <td align="center">Custom SLAs</td>
    </tr>
  </tbody>
</table>
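
When sizing a plan, it can help to translate a credit allowance into a rough number of pages, images, or segments. The Python sketch below does that division. `units_within_budget` is a hypothetical helper, and it ignores rate limits and any overage billing.

```python
from decimal import Decimal


def units_within_budget(credit_budget, credits_per_unit):
    """Whole pages, images, or segments that fit in a credit budget."""
    budget = Decimal(str(credit_budget))
    cost = Decimal(str(credits_per_unit))
    if cost <= 0:
        raise ValueError("credits_per_unit must be positive")
    # Integer division drops any partially funded unit.
    return int(budget // cost)
```

For example, the 100,000 credits included with Pro cover `units_within_budget(100_000, 2)` = 50,000 document pages at `auto`, or 16,666 pages at `hi` with grounding (6 credits per page).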

## FAQ

<AccordionGroup>
  <Accordion title="How does the credits-based pricing work?">
    For `document.*`, `image.*`, and `healthcare.*` domains, each page or image consumes a fixed number of credits: `lo` (1 credit), `auto` (2 credits), or `hi` (4 credits), with grounding adding 2 credits. `document.markdown` is priced separately at 1, 4, and 6 credits. For `audio.*` domains, you pay per 5-minute segment: `lo` (1 credit per segment), `auto` (1 credit per segment), `hi` (2 credits per segment). For `video.*` domains, you pay per 5-minute segment: `lo` (10 credits per segment), `auto` (10 credits per segment), `hi` (20 credits per segment). Credits are converted to dollars at \$0.01 per credit.
  </Accordion>

  <Accordion title="What's the difference between 'lo', 'auto', and 'hi' levels?">
    For documents/images: `lo` (1 credit) offers basic processing, `auto` (2 credits) automatically optimizes quality, `hi` (4 credits) uses high-resolution processing. For audio: `lo` (1 credit per segment), `auto` (1 credit per segment), `hi` (2 credits per segment). For video: `lo` (10 credits per segment), `auto` (10 credits per segment), `hi` (20 credits per segment). `hi` is recommended for complex content with small text, detailed layouts, or when maximum accuracy is required.
  </Accordion>

  <Accordion title="When should I use visual grounding?">
    Visual grounding provides bounding boxes and confidence scores for extracted data, making it ideal for compliance, audit trails, and quality assurance. It's particularly valuable in healthcare, finance, and legal applications where data accuracy verification is critical. Note: Grounding is only available for `document.*`, `image.*`, and `healthcare.*` domains, not for `audio.*` or `video.*`.
  </Accordion>

  <Accordion title="How does audio and video pricing work?">
    Audio and video processing uses 5-minute segment pricing instead of per-page pricing. Both audio and video files are charged per 5-minute segment (300 seconds each), rounded up, with a minimum of 2 segments. For audio, the cost per segment is 1 credit for `lo`, 1 credit for `auto`, and 2 credits for `hi`. For video, the cost per segment is 10 credits for `lo`, 10 credits for `auto`, and 20 credits for `hi`. Grounding is not available for audio or video processing.
  </Accordion>

  <Accordion title="What are service tiers and how do they affect pricing?">
    Every request can be routed through one of three delivery tiers by setting `service_tier` (Python SDK) / `serviceTier` (Node.js SDK) on your `GenerationConfig`:

    * **`standard`** (default) — baseline rate (1.0×) and latency.
    * **`flex`** — **50% off** (0.5× multiplier) with higher, more variable latency. Best for batch or background workloads.
    * **`priority`** — 1.8× premium for the lowest queue times and highest reliability. Best for latency-sensitive, user-facing workloads.

    The multiplier is applied uniformly to the domain, grounding, and agent/tool credits. Omitting the field (or passing `"auto"`, `"default"`, or `null`) uses the server default, which is currently `standard`. See the [Service Tiers](#service-tiers) section above for code examples.
  </Accordion>

  <Accordion title="Can I upgrade or downgrade my plan anytime?">
    Yes! You can upgrade or downgrade your plan at any time. Changes take effect immediately, and we'll prorate any billing adjustments.
  </Accordion>

  <Accordion title="What happens if I exceed my plan limits?">
    If you exceed your monthly credit limit, you'll be charged our pay-as-you-scale rates for the additional usage. You can set usage alerts to monitor your consumption. Starter plan users can purchase additional credits as needed.
  </Accordion>

  <Accordion title="Why do custom models cost more?">
    Custom-requested models and fine-tuned applications require specialized computational resources and processing pipelines. They're charged at a fixed premium rate of 8 credits per page, regardless of detail level, to ensure consistent, high-quality performance for specialized use cases.
  </Accordion>

  <Accordion title="What payment methods do you accept?">
    We bill through Stripe and accept all major credit cards (Visa, MasterCard, American Express). We can also arrange invoice billing for Enterprise customers.
  </Accordion>
</AccordionGroup>

## Get started

Choose your plan and start building with VLM Run today. Need help deciding? [Book a demo](https://cal.com/team/vlm-run/demo) with our team.

<CardGroup cols={2}>
  <Card title="Start Building" icon="rocket" href="https://app.vlm.run/signup">
    Sign up for free and get 100 credits to start prototyping.
  </Card>

  <Card title="Talk to Sales" icon="phone" href="https://cal.com/team/vlm-run/demo">
    Discuss Enterprise plans and custom pricing options.
  </Card>
</CardGroup>
