Skills

Skills are modular, reusable capabilities that give vlm-1 domain-specific expertise for visual extraction tasks. Instead of selecting a pre-defined domain, you reference a skill by name (optionally pinning a version), and the platform automatically applies the skill’s prompt and JSON schema to your request.

Why use Skills?

Skills let you decouple what you want extracted from how the extraction is configured:

Reusable: Create a skill once, reference it from any endpoint (image, document, video, audio, agent)
Versionable: Pin a specific skill version for reproducible results, or use "latest" to always get the newest revision
Composable: Pass multiple skills in a single request
Flexible: Use skills as an alternative to domains, or combine them with custom schemas

Skills are an alternative to domains. When you pass skills in your request config, you do not need to specify a domain: the skill’s built-in prompt and schema are used automatically.

Skill identifiers

Each skill can be referenced in two ways:

Field	Description	Example
`skill_name`	Human-readable name for lookup	`"invoice-extraction"`
`skill_id`	Unique identifier (UUID or name string)	`"abc-123-def"`

You must provide at least one of skill_name or skill_id. When using skill_name, you can also specify a version (defaults to "latest").
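
The identifier rule above can be sketched in plain Python. This is a minimal illustration, not the SDK's implementation: `SkillRef` is a hypothetical stand-in for `AgentSkill`, and only the field names and the `"latest"` default are taken from this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SkillRef:
    """Hypothetical stand-in for AgentSkill; not the SDK class."""

    skill_name: Optional[str] = None
    skill_id: Optional[str] = None
    version: str = "latest"  # documented default

    def __post_init__(self) -> None:
        # Mirror the documented rule: at least one identifier is required.
        if self.skill_name is None and self.skill_id is None:
            raise ValueError("provide at least one of skill_name or skill_id")


by_name = SkillRef(skill_name="invoice-extraction")  # version falls back to "latest"
by_id = SkillRef(skill_id="abc-123-def")
```

Constructing `SkillRef()` with neither identifier raises `ValueError` in this sketch; how the real platform reports that case is not described here.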

Using Skills

Image generation

Extract structured data from images using a skill instead of a domain:

from PIL import Image
from vlmrun.client import VLMRun
from vlmrun.client.types import GenerationConfig, AgentSkill

client = VLMRun(api_key="<VLMRUN_API_KEY>")

response = client.image.generate(
    images=[Image.open("photo.jpg")],
    config=GenerationConfig(
        skills=[AgentSkill(skill_name="invoice-extraction")]
    )
)

Document generation

Extract structured data from PDFs and other documents:

from pathlib import Path
from vlmrun.client import VLMRun
from vlmrun.client.types import GenerationConfig, AgentSkill

client = VLMRun(api_key="<VLMRUN_API_KEY>")

response = client.document.generate(
    file=Path("invoice.pdf"),
    config=GenerationConfig(
        skills=[AgentSkill(skill_name="invoice-extraction", version="latest")]
    )
)

Video generation

Process videos with skill-driven extraction:

from pathlib import Path
from vlmrun.client import VLMRun
from vlmrun.client.types import GenerationConfig, AgentSkill

client = VLMRun(api_key="<VLMRUN_API_KEY>")

response = client.video.generate(
    file=Path("recording.mp4"),
    batch=True,
    config=GenerationConfig(
        skills=[AgentSkill(skill_name="meeting-notes")]
    )
)

Audio generation

Process audio files with skill-driven extraction:

from pathlib import Path
from vlmrun.client import VLMRun
from vlmrun.client.types import GenerationConfig, AgentSkill

client = VLMRun(api_key="<VLMRUN_API_KEY>")

response = client.audio.generate(
    file=Path("call.mp3"),
    batch=True,
    config=GenerationConfig(
        skills=[AgentSkill(skill_name="call-summary")]
    )
)

Agent execution

Run an agent with skills to drive the extraction pipeline:

from pydantic import BaseModel, Field
from vlmrun.client import VLMRun
from vlmrun.client.types import AgentExecutionConfig, AgentSkill
from vlmrun.types import MessageContent

client = VLMRun(base_url="https://agent.vlm.run/v1", api_key="<VLMRUN_API_KEY>")

class FileInput(BaseModel):
    file: MessageContent = Field(..., description="The file to process")

response = client.agent.execute(
    inputs=FileInput(file=MessageContent(type="input_file", file_id="<file-id>")),
    config=AgentExecutionConfig(
        skills=[AgentSkill(skill_name="patient-referral", version="20260219-abc123")]
    ),
    batch=True,
)

Chat completions

Use skills in chat to provide domain-specific context:

from vlmrun.client import VLMRun

client = VLMRun(base_url="https://agent.vlm.run/v1", api_key="<VLMRUN_API_KEY>")

response = client.agent.completions.create(
    model="vlmrun-orion-1:auto",
    messages=[{"role": "user", "content": "Summarize this patient referral document."}],
    skills=[{"skill_name": "patient-referral"}],
)
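
When a request carries more than one skill, or mixes pinned and unpinned skills, it can help to build the `skills` list with a small helper. The sketch below is illustrative only: `skill_entry` is a hypothetical helper, and the `version` key in the dict form is an assumption that mirrors the `AgentSkill` field of the same name.

```python
from typing import Dict, List, Optional


def skill_entry(skill_name: str, version: Optional[str] = None) -> Dict[str, str]:
    """Build one dict entry for a `skills` list (hypothetical helper)."""
    entry = {"skill_name": skill_name}
    if version is not None:
        # Assumed key: mirrors the `version` field of AgentSkill.
        entry["version"] = version
    return entry


# One pinned skill and one that is left at the default version.
skills: List[Dict[str, str]] = [
    skill_entry("patient-referral", version="20260219-abc123"),
    skill_entry("invoice-extraction"),
]
```

The resulting list has the same shape as the `skills` argument in the chat call above.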

Version pinning

By default, version is "latest", which resolves to the most recent revision of the skill. To pin a specific version for reproducibility:

from vlmrun.client.types import AgentSkill

skill = AgentSkill(skill_name="invoice-extraction", version="20260219-abc123")
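
One way to keep pinning consistent across a codebase is a single lookup table of pinned versions. The sketch below assumes a hypothetical `PINNED_VERSIONS` mapping and `version_for` helper, neither of which is part of the SDK; the version string is the example value used on this page.

```python
from typing import Dict

# Hypothetical registry of pinned versions, maintained by the application.
PINNED_VERSIONS: Dict[str, str] = {
    "invoice-extraction": "20260219-abc123",
}


def version_for(skill_name: str, reproducible: bool) -> str:
    """Return the pinned version when reproducibility matters, else "latest"."""
    if reproducible:
        # A missing pin raises KeyError so that an unpinned skill is never
        # silently resolved to "latest" in a reproducible run.
        return PINNED_VERSIONS[skill_name]
    return "latest"
```

The returned string would then be passed as the `version` argument of `AgentSkill`.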

Skills vs Domains

Aspect	Domains	Skills
Lookup	Fixed string (e.g. `"document.invoice"`)	Name + version (e.g. `"invoice-extraction"` @ `"latest"`)
Schema	Pre-defined per domain	Bundled with the skill
Versioning	N/A	Explicit version pinning
Custom prompts	Via `config.prompt`	Built into the skill
Where to pass	`domain` parameter	`config.skills` list

When skills are provided and domain is omitted, the platform creates a dynamic application from the skill’s prompt and JSON schema. You can still pass domain alongside skills if needed.
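
The combinations described above can be summarized as a small decision function. This is only a sketch of the documented cases: `config_source` is a hypothetical helper, the returned labels are illustrative, and the assumption that a request needs a domain or skills is not stated on this page. How the platform combines a domain with skills when both are passed is also not specified here.

```python
from typing import Optional, Sequence


def config_source(domain: Optional[str], skills: Sequence[dict]) -> str:
    """Describe which inputs a request relies on (hypothetical helper)."""
    if skills and domain is None:
        # Documented case: a dynamic application is created from the
        # skill's prompt and JSON schema.
        return "skills"
    if skills:
        # Allowed, but the combination rules are not described here.
        return "domain+skills"
    if domain is not None:
        return "domain"
    # Assumption: a request with neither is treated as an error.
    raise ValueError("provide a domain, skills, or both")
```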

AgentSkill reference

The AgentSkill object accepts the following fields:

Field	Type	Default	Description
`skill_name`	`string`	`null`	Human-readable skill name for lookup
`skill_id`	`string`	`null`	Unique identifier (UUID or name string)
`version`	`string`	`"latest"`	Skill version to use
`type`	`string`	`"vlm-run"`	Skill type

At least one of skill_name or skill_id must be provided. If both are given, skill_id takes precedence for resolution.
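
The precedence rule can be illustrated with a short function. This is a sketch under stated assumptions, not the platform's resolver: `resolution_key` and its return shape are hypothetical, and how `version` interacts with `skill_id` is not covered on this page, so it is left out.

```python
from typing import Optional, Tuple


def resolution_key(
    skill_name: Optional[str] = None,
    skill_id: Optional[str] = None,
) -> Tuple[str, str]:
    """Return which identifier would be used for lookup (hypothetical helper)."""
    if skill_id is not None:
        # Documented rule: skill_id wins when both identifiers are given.
        return ("skill_id", skill_id)
    if skill_name is not None:
        return ("skill_name", skill_name)
    raise ValueError("provide at least one of skill_name or skill_id")


both = resolution_key(skill_name="invoice-extraction", skill_id="abc-123-def")
name_only = resolution_key(skill_name="invoice-extraction")
```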
