
# Doc → JSON

> Generate a structured prediction for the given document.

For all supported `document` domains, see the [Hub Catalog](/hub).
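
For callers not using an SDK, the request reduces to a single authenticated `POST` to `/v1/document/generate` with a JSON body. The sketch below builds that request with the Python standard library, without sending it. The base URL is an assumption (the spec lists no servers), the helper name `build_generate_request` is hypothetical, and treating `file_id` and `url` as mutually exclusive is one reading of "provide either `file_id` or `url`".

```python
import json
import urllib.request

# Assumption: the OpenAPI spec lists no servers, so this base URL is a placeholder.
BASE_URL = "https://api.vlm.run"


def build_generate_request(api_key, *, domain=None, file_id=None, url=None, batch=True):
    """Build (but do not send) a POST /v1/document/generate request."""
    # Assumption: exactly one of `file_id` or `url` identifies the document.
    if (file_id is None) == (url is None):
        raise ValueError("provide exactly one of file_id or url")

    body = {"batch": batch}  # `batch` defaults to true (async) in the schema
    if domain is not None:
        body["domain"] = domain
    if file_id is not None:
        body["file_id"] = file_id
    if url is not None:
        body["url"] = url

    return urllib.request.Request(
        f"{BASE_URL}/v1/document/generate",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # HTTPBearer security scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Keeping construction separate from sending makes the payload easy to inspect before any credits are spent.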

<RequestExample>
  ```python Python (with domain) theme={"theme":{"light":"github-light","dark":"dark-plus"}}
  from pathlib import Path
  from vlmrun.client import VLMRun

  client = VLMRun(api_key="<VLMRUN_API_KEY>")
  response = client.document.generate(
      file=Path("<path>.pdf"),
      domain="<domain>"
  )
  ```

  ```python Python (with skill) theme={"theme":{"light":"github-light","dark":"dark-plus"}}
  from pathlib import Path
  from vlmrun.client import VLMRun
  from vlmrun.client.types import GenerationConfig, AgentSkill

  client = VLMRun(api_key="<VLMRUN_API_KEY>")
  response = client.document.generate(
      file=Path("<path>.pdf"),
      config=GenerationConfig(
          skills=[AgentSkill(skill_name="<skill-name>", version="latest")]
      )
  )
  ```

  ```typescript Node.js SDK (with domain) theme={"theme":{"light":"github-light","dark":"dark-plus"}}
  import { VlmRun } from "vlmrun";

  const client = new VlmRun({apiKey: "<VLMRUN_API_KEY>"});
  const fileResponse = await client.files.upload({
      filePath: "<path>.pdf",
  });
  const response = await client.document.generate({
      fileId: fileResponse.id,
      domain: "<domain>",
  });
  ```

  ```typescript Node.js SDK (with skill) theme={"theme":{"light":"github-light","dark":"dark-plus"}}
  import { VlmRun } from "vlmrun";

  const client = new VlmRun({apiKey: "<VLMRUN_API_KEY>"});
  const fileResponse = await client.files.upload({
      filePath: "<path>.pdf",
  });
  const response = await client.document.generate({
      fileId: fileResponse.id,
      config: {
          skills: [{ skillName: "<skill-name>", version: "latest" }],
      },
  });
  ```
</RequestExample>
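
Because `batch` defaults to `true`, the returned `PredictionResponse` may still be in progress when it arrives. The following sketch interprets a decoded response body by its `status` field. The helper name `summarize_prediction` is hypothetical, and treating only `completed` and `failed` as final states is an assumption; the spec lists the statuses but does not say which are terminal.

```python
# Statuses listed in the PredictionResponse schema.
KNOWN_STATUSES = {"pending", "enqueued", "running", "completed", "failed", "paused"}
# Assumption: only these two statuses are final; the spec does not state this.
TERMINAL_STATUSES = {"completed", "failed"}


def summarize_prediction(payload):
    """Return (is_done, result) for a decoded PredictionResponse dict."""
    status = payload.get("status", "pending")  # "pending" is the schema default
    if status not in KNOWN_STATUSES:
        raise ValueError(f"unexpected status: {status!r}")
    if status not in TERMINAL_STATUSES:
        return False, None
    if status == "failed":
        return True, None
    # An empty dict/list is a valid result for a completed job,
    # so the value is returned as-is rather than checked for truthiness.
    return True, payload.get("response")
```

A caller would typically check `is_done` first and compare the result against `None` explicitly, since an empty `response` is valid for `status=completed`.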


## OpenAPI

````yaml POST /v1/document/generate
openapi: 3.1.0
info:
  title: VLM Run Unified Server
  description: Unified server for VLM Run Agent and API
  termsOfService: https://vlm.run/terms-of-service
  contact:
    name: VLM Run Support Team
    url: https://vlm.run/
    email: support@vlm.run
  version: 2026-05-19.0
servers: []
security: []
paths:
  /v1/document/generate:
    post:
      tags:
        - document
      summary: Document Generate
      description: Generate a structured prediction for the given document.
      operationId: document_generate_v1_document_generate_post
      requestBody:
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/DocumentFilePredictionRequest'
        required: true
      responses:
        '201':
          description: Successful Response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/PredictionResponse'
        '422':
          description: Validation Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/HTTPValidationError'
      security:
        - HTTPBearer: []
components:
  schemas:
    DocumentFilePredictionRequest:
      properties:
        metadata:
          $ref: '#/components/schemas/RequestMetadata'
          description: Optional metadata to pass to the model.
        config:
          $ref: '#/components/schemas/GenerationConfig'
          description: The VLM generation config to be used for /<dtype>/generate.
        url:
          anyOf:
            - type: string
            - type: 'null'
          title: Url
          description: The URL of the file (provide either `file_id` or `url`).
        file_id:
          anyOf:
            - type: string
            - type: 'null'
          title: File Id
          description: The ID of the uploaded file (provide either `file_id` or `url`).
        id:
          type: string
          title: Id
          description: Unique identifier of the request.
        created_at:
          type: string
          format: date-time
          title: Created At
          description: Date and time when the request was created (in UTC timezone)
        callback_url:
          anyOf:
            - type: string
              minLength: 1
              format: uri
            - type: 'null'
          title: Callback Url
          description: The URL to call when the request is completed.
        model:
          anyOf:
            - type: string
              const: vlm-1
            - type: string
          title: Model
          description: The model to use for generating the response.
          default: vlm-1
        domain:
          anyOf:
            - type: string
              enum:
                - document.invoice
                - document.markdown
                - document.receipt
                - document.resume
                - document.us-drivers-license
                - document.layout-detection
                - construction.blueprint
                - healthcare.patient-referral
                - healthcare.patient-identification
                - healthcare.physician-order
                - healthcare.claims-processing
                - healthcare.phi-redaction
                - healthcare.phi-edit-replace
                - healthcare.lab-report
                - healthcare.prior-authorization-request
                - healthcare.explanation-of-benefits
            - type: string
            - type: 'null'
          title: Domain
          description: >-
            The domain identifier (e.g. `document.invoice`). Optional when a
            skill is provided via config.skills.
        batch:
          type: boolean
          title: Batch
          description: Whether to process the document in batch mode (async).
          default: true
      type: object
      title: DocumentFilePredictionRequest
      description: Request to the VLM API using a document (doc, docx, pptx, pdf).
    PredictionResponse:
      properties:
        usage:
          $ref: '#/components/schemas/CreditUsageResponse'
          description: The usage metrics for the request.
        id:
          type: string
          title: Id
          description: Unique identifier of the response.
        created_at:
          type: string
          format: date-time
          title: Created At
          description: Date and time when the request was created (in UTC timezone)
        completed_at:
          anyOf:
            - type: string
              format: date-time
            - type: 'null'
          title: Completed At
          description: Date and time when the response was completed (in UTC timezone)
        response:
          anyOf:
            - {}
            - type: 'null'
          title: Response
          description: >-
            The response from the model. May be an empty dict/list when the
            model found no extractable content (valid for status=completed).
        status:
          type: string
          enum:
            - pending
            - enqueued
            - running
            - completed
            - failed
            - paused
          title: Status
          description: The status of the job.
          default: pending
        domain:
          anyOf:
            - type: string
            - type: 'null'
          title: Domain
          description: The domain of the prediction (e.g. document.invoice, image.caption).
      type: object
      title: PredictionResponse
      description: Base prediction response for all API responses.
    HTTPValidationError:
      properties:
        detail:
          items:
            $ref: '#/components/schemas/ValidationError'
          type: array
          title: Detail
      type: object
      title: HTTPValidationError
    RequestMetadata:
      properties:
        environment:
          type: string
          enum:
            - dev
            - staging
            - prod
          title: Environment
          description: The environment where the request was made.
          default: dev
        session_id:
          anyOf:
            - type: string
            - type: 'null'
          title: Session Id
          description: The session ID of the request
        allow_logging:
          type: boolean
          title: Allow Logging
          description: Whether to enable logs for this request.
          default: true
        allow_training:
          type: boolean
          title: Allow Training
          description: Whether the file can be used for training
          default: true
        allow_retention:
          type: boolean
          title: Allow Retention
          description: Whether to allow retention of the data
          default: true
        extra:
          additionalProperties: true
          type: object
          title: Extra
          description: Extra metadata for the request (e.g. `dataset_id`, `subset_id`).
      type: object
      title: RequestMetadata
      description: >-
        Metadata for the request.


        Typically captured in {"vlmrun": {"metadata": {"environment":
        <environment>, ...}}}.
    GenerationConfig:
      properties:
        prompt:
          anyOf:
            - type: string
            - type: 'null'
          title: Prompt
          description: >-
            Additional user instructions appended to the application or skill
            prompt for this request.
        detail:
          type: string
          enum:
            - auto
            - hi
            - lo
          title: Detail
          description: The detail level to use for processing multimodal data.
          default: auto
        json_schema:
          anyOf:
            - additionalProperties: true
              type: object
            - type: 'null'
          title: Json Schema
          description: >-
            The overridden JSON schema to use for the model. To be used instead
            of the response model.
        skills:
          anyOf:
            - items:
                $ref: '#/components/schemas/AgentSkill'
              type: array
            - type: 'null'
          title: Skills
          description: List of agent skills to enable for this generation request.
        gql_stmt:
          anyOf:
            - type: string
            - type: 'null'
          title: Gql Stmt
          description: >-
            The GraphQL statement to use for the application. If provided, the
            response model will be generated from the GraphQL statement.
        max_retries:
          type: integer
          title: Max Retries
          description: The maximum number of retries to use for the application.
          default: 1
        max_tokens:
          type: integer
          title: Max Tokens
          description: The maximum number of tokens to use for the application.
          default: 65535
        temperature:
          type: number
          title: Temperature
          description: The temperature to use for the application.
          default: 0
        confidence:
          type: boolean
          title: Confidence
          description: >-
            Include confidence scores in the response (included in the
            `_metadata` field).
          default: false
        grounding:
          type: boolean
          title: Grounding
          description: >-
            Include grounding in the response (included in the `_metadata`
            field).
          default: false
        keyframes:
          type: boolean
          title: Keyframes
          description: Include keyframes in the video transcription response.
          default: false
        video_segment_duration:
          anyOf:
            - type: number
              minimum: 1
            - type: 'null'
          title: Video Segment Duration
          description: >-
            Duration in seconds for each video segment when chunking a video for
            transcription. Defaults to 150.0s.
        video_frames_per_segment:
          anyOf:
            - type: integer
              minimum: 1
            - type: 'null'
          title: Video Frames Per Segment
          description: >-
            Number of frames to sample per video segment for captioning.
            Defaults to 8.
        video_model:
          anyOf:
            - type: string
            - type: 'null'
          title: Video Model
          description: >-
            Model ID to use for video segment captioning (e.g.
            'vlmrun-orion-1:fast'). When omitted, the server default is used.
        video_input_mode:
          anyOf:
            - type: string
              enum:
                - frames
                - native_video
            - type: 'null'
          title: Video Input Mode
          description: >-
            How to pass video to the captioning model: 'frames' extracts N JPEG
            frames per segment, 'native_video' sends the mp4 clip directly via
            video_url for models with native video understanding. Defaults to
            'native_video' for Qwen deployment models, 'frames' for others.
        video_transcribe_audio:
          type: boolean
          title: Video Transcribe Audio
          description: >-
            When True, transcribe the audio track to align segment boundaries.
            When False (default), skip ASR and use fixed-duration video segments
            only (visual-only captioning).
          default: false
        chat_context:
          anyOf:
            - type: string
            - type: 'null'
          title: Chat Context
          description: >-
            Plain-text chat transcript (prior turns + current request) used to
            ground video captioning / transcription on what the user wants
            extracted.
        page_indices:
          anyOf:
            - items:
                type: integer
              type: array
            - type: 'null'
          title: Page Indices
          description: >-
            0-indexed page indices to process for document files. If None, all
            pages are processed.
        use_context_cache:
          type: boolean
          title: Use Context Cache
          description: >-
            Reuse cached representations of document/video content across calls.
            When True (default), the file is cached after the first call so
            repeated queries against the same file skip re-transmitting its
            contents. Set to False to always send the full content.
          default: true
        service_tier:
          anyOf:
            - type: string
              enum:
                - auto
                - default
                - standard
                - flex
                - priority
            - type: 'null'
          title: Service Tier
          description: >-
            Delivery tier for the request. 'standard'/'default' uses baseline
            rates, 'flex' applies a 50% discount with higher latency, 'priority'
            applies a 1.8x premium. When omitted (or 'auto'), the server default
            ('standard') applies. The chosen tier drives both billing and the
            latency/availability SLO.
      type: object
      title: GenerationConfig
      description: Request configuration for image/document/video generation.
    CreditUsageResponse:
      properties:
        elements_processed:
          anyOf:
            - type: integer
            - type: 'null'
          title: Elements Processed
          description: Number of elements processed.
        element_type:
          anyOf:
            - type: string
              enum:
                - image
                - page
                - video
                - audio
            - type: 'null'
          title: Element Type
          description: The type of element processed (e.g. image, page, video, audio).
        credits_used:
          anyOf:
            - type: integer
            - type: 'null'
          title: Credits Used
          description: Amount of total credits used.
        steps:
          anyOf:
            - type: integer
            - type: 'null'
          title: Steps
          description: Number of steps processed, in case of agentic execution.
        message:
          anyOf:
            - type: string
            - type: 'null'
          title: Message
          description: The message from the credit usage job.
        duration_seconds:
          type: integer
          title: Duration Seconds
          description: Duration of the request in seconds.
          default: 0
      type: object
      title: CreditUsageResponse
      description: Response model for credit usage metrics.
    ValidationError:
      properties:
        loc:
          items:
            anyOf:
              - type: string
              - type: integer
          type: array
          title: Location
        msg:
          type: string
          title: Message
        type:
          type: string
          title: Error Type
        input:
          title: Input
        ctx:
          type: object
          title: Context
      type: object
      required:
        - loc
        - msg
        - type
      title: ValidationError
    AgentSkill:
      properties:
        type:
          type: string
          title: Type
          description: >-
            The type of the skill. Use 'skill_reference' for DB-stored skills
            referenced by id/name. Use 'inline' to provide the skill as a
            base64-encoded zip bundle.
          default: skill_reference
        skill_id:
          anyOf:
            - type: string
            - type: 'null'
          title: Skill Id
          description: >-
            The unique identifier of the skill — a UUID or a name string (e.g.,
            'pillow', 'batch-processing').
        skill_name:
          anyOf:
            - type: string
            - type: 'null'
          title: Skill Name
          description: >-
            Human-readable skill name for lookup (e.g., 'invoice-extraction').
            Alternative to skill_id. Deprecated in favour of skill_id.
        skill_version:
          anyOf:
            - type: integer
            - type: string
          title: Skill Version
          description: The version of the skill — an integer (e.g. 2) or 'latest'.
          default: latest
        version:
          anyOf:
            - type: integer
            - type: string
            - type: 'null'
          title: Version
          description: 'DEPRECATED: Use ''skill_version'' instead. The version of the skill.'
        name:
          anyOf:
            - type: string
            - type: 'null'
          title: Name
          description: >-
            Human-readable name for the inline skill (used for discovery and
            logging).
        description:
          anyOf:
            - type: string
            - type: 'null'
          title: Description
          description: Short description of what the inline skill does.
        source:
          anyOf:
            - $ref: '#/components/schemas/InlineSkillSource'
            - type: 'null'
          description: >-
            Source payload for inline skills. Contains the base64-encoded zip
            bundle with type, media_type, and data fields.
        bundle:
          anyOf:
            - type: string
            - type: 'null'
          title: Bundle
          description: >-
            DEPRECATED: Use 'source.data' instead. Base64-encoded zip bundle
            containing the skill files (inline skills only).
      type: object
      title: AgentSkill
      description: >-
        A modular capability that extends the agent's functionality.


        Agent Skills are reusable, filesystem-based resources that provide the
        agent

        with domain-specific expertise: workflows, context, and best practices.


        Each skill packages instructions, metadata, and optional resources
        (scripts,

        templates, snippets) that the agent uses automatically when relevant.


        Two modes are supported:


        1. **Referenced skills** (``type="skill_reference"``) – Provide
        ``skill_id``
           (UUID or name) and optionally ``skill_version`` (integer or ``"latest"``).

           .. code-block:: json

               {"type": "skill_reference", "skill_id": "pillow", "skill_version": "latest"}

        2. **Inline skills** (``type="inline"``) – Supply ``name``,
        ``description``,
           and a ``source`` object containing the base64-encoded zip bundle.  The zip
           must contain exactly one ``SKILL.md`` file.  No database lookup is required.

           .. code-block:: json

               {
                   "type": "inline",
                   "name": "csv-insights",
                   "description": "Summarize CSV files.",
                   "source": {
                       "type": "base64",
                       "media_type": "application/zip",
                       "data": "<base64-zip>"
                   }
               }

           Legacy format with flat ``bundle`` field is also accepted for backward
           compatibility.
    InlineSkillSource:
      properties:
        type:
          type: string
          const: base64
          title: Type
          description: >-
            Encoding type for the inline skill data. Currently only 'base64' is
            supported.
          default: base64
        media_type:
          type: string
          title: Media Type
          description: MIME type of the skill bundle. Must be 'application/zip'.
          default: application/zip
        data:
          type: string
          title: Data
          description: Base64-encoded zip bundle containing the skill files.
      type: object
      required:
        - data
      title: InlineSkillSource
      description: |-
        Source payload for an inline skill bundle.

        Follows the OpenAI inline skill format::

            {
                "type": "base64",
                "media_type": "application/zip",
                "data": "<base64-encoded-zip>"
            }
  securitySchemes:
    HTTPBearer:
      type: http
      scheme: bearer

````
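
The `AgentSkill` schema describes inline skills as a base64-encoded zip bundle containing exactly one `SKILL.md` file. A minimal sketch of producing such an object with the Python standard library follows. The helper name `build_inline_skill` is hypothetical, placing `SKILL.md` at the root of the archive is an assumption, and the contents expected inside `SKILL.md` are not specified here.

```python
import base64
import io
import zipfile


def build_inline_skill(name, description, skill_md):
    """Return an AgentSkill dict of type 'inline' wrapping a zipped SKILL.md."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # Assumption: SKILL.md sits at the archive root; the bundle holds exactly one.
        archive.writestr("SKILL.md", skill_md)

    return {
        "type": "inline",
        "name": name,
        "description": description,
        "source": {
            "type": "base64",
            "media_type": "application/zip",
            "data": base64.b64encode(buffer.getvalue()).decode("ascii"),
        },
    }
```

The resulting dict can be placed in the `config.skills` list of the request body, in which case `domain` is optional according to the schema.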