Code Execution

Orion-2 agents (vlmrun-orion-2) write and execute Python code in a secure, sandboxed environment. Where Orion-1 invokes tools one at a time, Orion-2 composes CV operations (detect, crop, annotate, measure, transform) into multi-step pipelines that run within a single execute_code call.

When to Use Orion-2

Scenario	Recommended
Simple captioning, single detection, Q&A	Orion-1
Multi-step pipelines (detect → crop → annotate)	Orion-2
Custom data transformations with numpy/matplotlib	Orion-2
Iterative code refinement across turns	Orion-2
Skill-based extraction with programmatic logic	Orion-2

How It Works

Orion-2 is a visual agent harness: a planner and a code runtime wrapped around a vision-language model. It accepts text, images, video, and documents, compiles each request into an executable program, and dispatches visual tools and code execution from a single harness.

Prompt → Spec: An ambiguous request is compiled into an exact, executable program written in a visual DSL that reads like idiomatic Python.
Execution: The program runs in a sandboxed runtime with async-native parallelism — independent operations dispatch concurrently via asyncio, with no per-step model round-trips.
Self-correction: Execution results return to the harness, which repairs and re-executes until the program runs to completion.
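
The concurrency described in the Execution step can be illustrated with a small, self-contained sketch. It uses only asyncio from the standard library; `fake_detect` is a hypothetical stand-in for a visual tool call (it is not part of the vlmrun API), and the 0.05-second delay is arbitrary. The event log shows that both operations start before either one finishes, which is what concurrent dispatch means in practice.

```python
import asyncio


async def fake_detect(name: str, log: list) -> str:
    """Hypothetical stand-in for an I/O-bound tool call."""
    log.append(f"start:{name}")
    await asyncio.sleep(0.05)  # simulated network latency
    log.append(f"end:{name}")
    return f"{name}-result"


async def run_pipeline():
    log = []
    # Independent operations are awaited together, not one after another.
    results = await asyncio.gather(
        fake_detect("person", log),
        fake_detect("garment", log),
    )
    return list(results), log


results, log = asyncio.run(run_pipeline())
print(results)  # results keep the order of the gather() arguments
print(log)      # both "start" entries appear before any "end" entry
```

Because neither call depends on the other, total latency is close to that of the slower call rather than the sum of both.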

Read the full Orion-2 blog post for architecture details, benchmarks, and live examples. To run Orion-2 from a Mastra agent with hybrid client and server tools, see Mastra Compatibility.

Orion-1 vs. Orion-2

The difference is clearest on a concrete task. Consider a virtual try-on that composes detection, cropping, and image generation across two input images.

Orion-1, with sequential tool calling and one LLM round-trip per tool:

# Tools are called sequentially, with LLM reasoning at each step
boxes      = tool_call("detect", image, target="person")                   # call 1
person     = tool_call("crop", image, xywh=[0.22, 0.35, 0.04, 0.15])      # call 2
garment    = tool_call("detect", dress_img, target="garment")              # call 3
garment    = tool_call("crop", dress_img, xywh=[0.33, 0.41, 0.05, 0.13])  # call 4
result     = tool_call("generate", person, garment)                        # call 5

Orion-2 — code-mode, one program with parallel dispatch:

import asyncio

async def process(ctx, person_image, dress_img):
    vlmrun = ctx.import_lib("vlmrun")

    def crop(img, d):
        bx, by, bw, bh = d["xywh"]
        W, H = img.width, img.height
        return img.crop(int(by * H), int((by + bh) * H), int(bx * W), int((bx + bw) * W))

    # Detect person and garment in parallel
    p_det, g_det = await asyncio.gather(
        vlmrun.image.detect(person_image, "person"),
        vlmrun.image.detect(dress_img, "garment"),
    )
    person_crop = crop(person_image, p_det["detections"][0])
    garment_crop = crop(dress_img, g_det["detections"][0])

    # Composite the try-on
    (composite,) = await vlmrun.image.generate(
        "virtual try-on", images=[person_crop, garment_crop]
    )
    return {"composite": composite}
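
The crop helper above depends on converting a normalized box into pixel coordinates. The sketch below isolates that arithmetic, assuming (as the example implies, though the text does not state it outright) that xywh holds the top-left corner plus width and height as fractions of the image size. `xywh_to_pixel_box` is a hypothetical helper name, and the clamping step is an added safeguard rather than documented behavior.

```python
def xywh_to_pixel_box(xywh, width, height):
    """Convert a normalized (x, y, w, h) box to pixel (left, top, right, bottom)."""
    bx, by, bw, bh = xywh
    left = int(bx * width)
    top = int(by * height)
    right = int((bx + bw) * width)
    bottom = int((by + bh) * height)
    # Clamp so a slightly out-of-range detection cannot reach outside the image.
    left, right = max(0, left), min(width, right)
    top, bottom = max(0, top), min(height, bottom)
    return left, top, right, bottom


print(xywh_to_pixel_box([0.25, 0.5, 0.5, 0.25], width=400, height=200))
# (100, 100, 300, 150)
```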

Available Libraries

Inside the sandbox, the agent accesses libraries through ctx.import_lib(...):

Library	Import	Capabilities
OpenCV	`ctx.import_lib("cv2")`	Classical CV operations, drawing, color conversion
NumPy	`ctx.import_lib("numpy")`	Array operations, math, linear algebra
Matplotlib	`ctx.import_lib("matplotlib")`	Plotting, charts, visualization
VLM Run	`ctx.import_lib("vlmrun")`	Detection, OCR, captioning, segmentation, generation, video, documents, LLM text extraction
FFmpeg	`ctx.import_lib("ffmpeg")`	Video processing, frame extraction, transcoding

Standard library modules (json, math, re, pathlib, asyncio, etc.) are available via normal import statements.

VLM Run Proxy API

The vlmrun proxy provides access to the full suite of CV capabilities:

# Image operations
caption  = await vlmrun.image.caption(img, "describe this image")
dets     = await vlmrun.image.detect(img, "cars")
segments = await vlmrun.image.segment(img, "person")
points   = await vlmrun.image.point(img, "eyes")
ocr      = await vlmrun.image.ocr(img)
(gen,)   = await vlmrun.image.generate("a sunset over mountains")
recon    = await vlmrun.image.reconstruct_3d(img, mask_img, objects)

# Document operations
n        = await vlmrun.document.length(doc_path)
pages    = await vlmrun.document.get_pages(doc_path, offset=0, limit=3)
page_img = await vlmrun.document.get_page(doc_path, index=0)

# Video operations
report      = await vlmrun.video.caption(vid_path, segment_duration=60.0)
video_paths = await vlmrun.video.generate("a timelapse of clouds", resolution="720p")
result      = await vlmrun.video.segment(vid_path, prompts=["person", "car"])

# File I/O within the sandbox
content  = vlmrun.io.read_file("data.json")
vlmrun.io.write_file("output.csv", csv_content)
path     = await vlmrun.io.download("https://example.com/file.pdf")

# LLM-powered text extraction
result   = await vlmrun.llm.extract(text, json_schema=schema)
result   = await vlmrun.llm.extract(text)  # free-form JSON when no schema

vlmrun.llm.extract

Runs a text-only LLM extraction inside the sandbox. With a json_schema, it returns a validated dict matching the schema; without one, it returns free-form JSON parsed from the model response. When a skill has a schema.json file, the pipeline can read and pass it directly:

# Inside execute_code
import json

schema = json.loads(vlmrun.io.read_file("skills/my-skill/schema.json"))
result = await vlmrun.llm.extract(raw_text, json_schema=schema)
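
To make the idea of a schema-validated dict more concrete, here is a minimal sketch of what a schema and a conforming result might look like. The invoice fields are hypothetical, and `missing_required` is a toy, stdlib-only check written for illustration. It is not the validator the platform uses, and a real project would more likely rely on a full JSON Schema library.

```python
import json

# Hypothetical schema, shaped like a typical schema.json for a skill.
schema = json.loads("""
{
  "type": "object",
  "properties": {
    "invoice_number": {"type": "string"},
    "total": {"type": "number"}
  },
  "required": ["invoice_number", "total"]
}
""")


def missing_required(result: dict, schema: dict) -> list:
    """Return required keys from the schema that are absent from the result."""
    return [key for key in schema.get("required", []) if key not in result]


good = {"invoice_number": "INV-001", "total": 129.5}
bad = {"invoice_number": "INV-002"}

print(missing_required(good, schema))  # []
print(missing_required(bad, schema))   # ['total']
```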

Example: Chat Completion with Orion-2

from vlmrun.client import VLMRun

client = VLMRun(api_key="<VLMRUN_API_KEY>")

response = client.agent.completions.create(
    model="vlmrun-orion-2:auto",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Detect all cars in this image, draw bounding boxes, and count them."},
                {"type": "image_url", "image_url": {"url": "https://example.com/parking-lot.jpg"}}
            ]
        }
    ],
)

import { VlmRun } from "vlmrun";

const client = new VlmRun({
    baseURL: "https://api.vlm.run/v1",
    apiKey: "<VLMRUN_API_KEY>",
});

const response = await client.agent.completions.create({
    model: "vlmrun-orion-2:auto",
    messages: [
        {
            role: "user",
            content: [
                { type: "text", text: "Detect all cars in this image, draw bounding boxes, and count them." },
                { type: "image_url", image_url: { url: "https://example.com/parking-lot.jpg" } }
            ]
        }
    ],
});

curl -X POST https://api.vlm.run/v1/openai/chat/completions \
  -H "Authorization: Bearer <VLMRUN_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vlmrun-orion-2:auto",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Detect all cars in this image, draw bounding boxes, and count them."},
          {"type": "image_url", "image_url": {"url": "https://example.com/parking-lot.jpg"}}
        ]
      }
    ]
  }'

The agent will automatically write and execute code like:

async def process(ctx, img):
    cv2 = ctx.import_lib("cv2")
    vlmrun = ctx.import_lib("vlmrun")

    dets = await vlmrun.image.detect(img, "cars")
    W, H = img.width, img.height

    for d in dets["detections"]:
        bx, by, bw, bh = d["xywh"]
        x, y, w, h = int(bx * W), int(by * H), int(bw * W), int(bh * H)
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

    return {"count": len(dets["detections"]), "annotated_image": img}

Skills with Orion-2

When skills are attached to an Orion-2 request, the skill workspace is materialized into the session directory at <workspace>/skills/<skill-name>/. The agent can read skill resources (SKILL.md, schemas, templates) directly using vlmrun.io.read_file or cv2.imread — no special API calls needed.

# Inside execute_code, the agent can read skill resources:
skill_instructions = vlmrun.io.read_file("skills/invoice-extraction/SKILL.md")
schema = vlmrun.io.read_file("skills/invoice-extraction/schema.json")

Skills work with both Orion-1 and Orion-2. Orion-1 injects skill instructions into the system prompt, while Orion-2 materializes skill files into the workspace for programmatic access.

Program Execution

Once an Orion-2 skill has run at least once, the pipeline.py that the agent authored is cached inside the skill bundle. On subsequent executions, the platform can run that pipeline directly in the code-execution sandbox, bypassing the LLM agent loop entirely. This is called program execution: the cached pipeline acts as a compiled program that is built once and then simply run.

How it works

First run (authoring): The agent plans and writes pipeline.py. The code is persisted into the skill’s stored bundle for reuse.
Subsequent runs (replay): The cached pipeline.py executes directly via CodeExecutionRunner. If execution fails or no cached pipeline exists, the system falls back to the full agent loop automatically.
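
The replay-with-fallback behavior described above can be sketched as plain control flow. Everything here is hypothetical scaffolding: `run_cached_pipeline` and `run_agent_loop` are placeholder callables rather than platform APIs, and the returned dict only mirrors the execution_mode field for illustration.

```python
from typing import Any, Callable, Optional


def execute(
    inputs: dict,
    run_agent_loop: Callable[[dict], Any],
    run_cached_pipeline: Optional[Callable[[dict], Any]] = None,
) -> dict:
    """Try the cached program first; fall back to the agent loop if needed."""
    if run_cached_pipeline is not None:
        try:
            return {"execution_mode": "program", "response": run_cached_pipeline(inputs)}
        except Exception:
            pass  # replay failed: fall through to the full agent loop
    return {"execution_mode": "agent", "response": run_agent_loop(inputs)}


def agent(inputs):          # placeholder for the LLM agent loop
    return "authored"


def cached_ok(inputs):      # placeholder for a cached pipeline.py that works
    return "replayed"


def cached_broken(inputs):  # placeholder for a cached pipeline.py that fails
    raise RuntimeError("stale pipeline")


print(execute({}, agent, cached_ok)["execution_mode"])      # program
print(execute({}, agent, cached_broken)["execution_mode"])  # agent
print(execute({}, agent)["execution_mode"])                 # agent
```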

Controlling execution mode

Use mode in your execution config. It accepts program (default) or agent:

from vlmrun.client import VLMRun
from vlmrun.client.types import AgentExecutionConfig

client = VLMRun(api_key="<VLMRUN_API_KEY>")

# Default: run the cached pipeline.py as a fixed program when available
response = client.agent.execute(
    inputs={"file": "https://example.com/document.pdf"},
    model="vlmrun-orion-2:pro",
    config=AgentExecutionConfig(
        mode="program",  # default
        skills=[{"skill_id": "my-skill-id"}],
    ),
)

# Force full agent loop (e.g. for authoring or debugging)
response = client.agent.execute(
    inputs={"file": "https://example.com/document.pdf"},
    model="vlmrun-orion-2:pro",
    config=AgentExecutionConfig(
        mode="agent",
        skills=[{"skill_id": "my-skill-id"}],
    ),
)

# Program (default)
curl -X POST https://api.vlm.run/v1/agent/execute \
  -H "Authorization: Bearer $VLMRUN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vlmrun-orion-2:pro",
    "inputs": { "file": { "type": "file_url", "file_url": { "url": "https://example.com/document.pdf" } } },
    "config": {
      "mode": "program",
      "skills": [{ "skill_id": "my-skill-id" }]
    }
  }'

# Force agent loop
curl -X POST https://api.vlm.run/v1/agent/execute \
  -H "Authorization: Bearer $VLMRUN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vlmrun-orion-2:pro",
    "inputs": { "file": { "type": "file_url", "file_url": { "url": "https://example.com/document.pdf" } } },
    "config": {
      "mode": "agent",
      "skills": [{ "skill_id": "my-skill-id" }]
    }
  }'

The response includes execution_mode indicating which path was taken:

{
  "execution_id": "exec_abc123",
  "status": "completed",
  "execution_mode": "program",
  "response": { ... }
}

`execution_mode`	Meaning
`program`	Ran cached `pipeline.py` directly (no LLM orchestration)
`agent`	Ran the full LLM agent loop
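
A client might branch on execution_mode to track how often the cached path is taken. The sketch below parses response bodies shaped like the example above. The sample payloads and the `count_modes` helper are made up for illustration; only the execution_mode field and its two values come from this page.

```python
import json
from collections import Counter


def count_modes(raw_responses):
    """Tally execution_mode values across JSON response bodies."""
    return Counter(json.loads(raw)["execution_mode"] for raw in raw_responses)


samples = [
    '{"execution_id": "exec_1", "status": "completed", "execution_mode": "program"}',
    '{"execution_id": "exec_2", "status": "completed", "execution_mode": "agent"}',
    '{"execution_id": "exec_3", "status": "completed", "execution_mode": "program"}',
]

modes = count_modes(samples)
print(modes["program"], modes["agent"])  # 2 1
```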

Program execution preserves structured output validation, grounding metadata, and billing accuracy. Program runs are billed for sandbox and tool costs only, with zero LLM orchestration tokens.

Performance

Program execution can be an order of magnitude faster than the full agent loop. In testing, a medical-referral document extraction skill completed in ~13s in program mode versus ~160s with full agent orchestration, roughly a 12x speedup.

Security

The code execution sandbox enforces strict security boundaries:

Import restrictions: Only allowlisted libraries (cv2, numpy, matplotlib, vlmrun, ffmpeg) can be loaded via ctx.import_lib(), alongside the Python standard library. Dangerous modules (os, io, shutil, importlib) are blocked at AST parse time.
Workspace confinement: All file operations are restricted to the session workspace. Symlink traversal and absolute path escapes are rejected.
Introspection blocking: Builtins like eval, exec, compile, getattr, and __import__ are blocked to prevent sandbox escape.
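
To give a feel for how checks of this kind can work, the sketch below approximates two of them with the standard library: an AST scan for blocked imports and builtins, and a path check that rejects escapes from a workspace root. This is an illustrative approximation, not the sandbox's actual implementation. The function names and the `/tmp/session` workspace path are hypothetical, and the blocked sets simply copy the names listed above.

```python
import ast
from pathlib import Path

BLOCKED_MODULES = {"os", "io", "shutil", "importlib"}
BLOCKED_BUILTINS = {"eval", "exec", "compile", "getattr", "__import__"}


def find_violations(source: str) -> list:
    """Parse source without running it and report blocked imports and builtins."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in BLOCKED_MODULES:
                    violations.append(f"import {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            if (node.module or "").split(".")[0] in BLOCKED_MODULES:
                violations.append(f"from {node.module} import ...")
        elif isinstance(node, ast.Name) and node.id in BLOCKED_BUILTINS:
            violations.append(f"builtin {node.id}")
    return violations


def is_confined(workspace: Path, candidate: str) -> bool:
    """True if candidate, once resolved, stays inside the workspace root."""
    root = workspace.resolve()
    # Joining with an absolute candidate discards root, so "/etc/passwd" stays
    # absolute; resolve() also collapses ".." segments and follows symlinks.
    target = (root / candidate).resolve()
    return target == root or root in target.parents


print(sorted(find_violations("import os\nx = eval('1 + 1')")))
print(find_violations("import json\nimport math"))

workspace = Path("/tmp/session")  # hypothetical session workspace
print(is_confined(workspace, "skills/my-skill/schema.json"))
print(is_confined(workspace, "../other-session/secret.txt"))
print(is_confined(workspace, "/etc/passwd"))
```

Scanning the syntax tree means a violation is reported before any of the submitted code runs, which matches the "blocked at AST parse time" wording above.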

Model Variants

Orion-2 is model-agnostic — the same harness and runtime work with any multimodal model that has strong code generation. The default vlmrun-orion-2:auto routes each request to the best backbone for the job.

Model ID	Description
`vlmrun-orion-2:fast`	Optimized for speed and cost-efficiency
`vlmrun-orion-2:auto`	Automatically routes to the best backend for each task (default)
`vlmrun-orion-2:pro`	Most capable tier for complex multi-step workflows
`vlmrun-orion-2:qwen3.6-35b-a3b`	Open-weight Qwen 3.6 35B — strong at code generation and reasoning
`vlmrun-orion-2:gemma4-26b-a4b`	Open-weight Gemma 4 26B — strong at localization and spatial tasks
`vlmrun-orion-2:kimi-2.6`	Kimi 2.6 — strong at multi-turn dialogue and long-context tasks
`vlmrun-orion-2:gpt-5.5`	GPT-5.5 — strong at instruction following and structured output
`vlmrun-orion-2:claude-opus-4.8`	Claude Opus 4.8 — strong at nuanced reasoning and analysis
