Orion-2 agents (vlmrun-orion-2) write and execute Python code in a secure,
sandboxed environment. Where Orion-1 invokes tools one at a time, Orion-2
composes CV operations (detect, crop, annotate, measure, transform) into
multi-step pipelines, all within a single execute_code call.
When to Use Orion-2
| Scenario | Recommended |
|---|---|
| Simple captioning, single detection, Q&A | Orion-1 |
| Multi-step pipelines (detect → crop → annotate) | Orion-2 |
| Custom data transformations with numpy/matplotlib | Orion-2 |
| Iterative code refinement across turns | Orion-2 |
| Skill-based extraction with programmatic logic | Orion-2 |
How It Works
Orion-2 is a visual agent harness: a planner and a code runtime wrapped around a vision-language model. It accepts text, images, video, and documents, compiles each request into an executable program, and dispatches visual tools and code execution from a single harness.
- Prompt → Spec: An ambiguous request is compiled into an exact, executable program written in a visual DSL that reads like idiomatic Python.
- Execution: The program runs in a sandboxed runtime with async-native parallelism: independent operations dispatch concurrently via asyncio, with no per-step model round-trips.
- Self-correction: Execution results return to the harness, which repairs and re-executes until the program runs to completion.
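The execute, repair, and re-execute cycle described above can be illustrated with a small, self-contained sketch. This is not the Orion-2 harness: `run_with_repair`, `toy_repair`, and the stubbed `detect` coroutine are hypothetical names invented for illustration, the repair step is a hard-coded string patch standing in for a model-driven fix, and plain `exec` is used only to keep the example short (it provides no sandboxing).

```python
import asyncio
import traceback


async def run_with_repair(source, repair, max_attempts=3):
    """Run a program that defines `async def process()`, repairing it on failure.

    `repair(source, error_text)` is a hypothetical callback that returns a
    corrected program; in a real harness this would be the model.
    """
    for attempt in range(1, max_attempts + 1):
        namespace = {}
        try:
            exec(source, namespace)  # load the program (illustration only, not a sandbox)
            result = await namespace["process"]()
            return result, attempt
        except Exception:
            # Hand the traceback back to the repair step and try again.
            source = repair(source, traceback.format_exc())
    raise RuntimeError(f"program still failing after {max_attempts} attempts")


BROKEN_PROGRAM = '''
import asyncio

async def detect(label):
    await asyncio.sleep(0.01)  # stand-in for a visual tool call
    return {"label": label}

async def process():
    # Independent operations are dispatched concurrently.
    a, b = await asyncio.gather(detect("person"), detect("garment"))
    return [a["label"], b["lable"]]  # typo: raises KeyError on the first run
'''


def toy_repair(source, error_text):
    # Stand-in for the model: patch the one known typo.
    return source.replace('"lable"', '"label"')


result, attempts = asyncio.run(run_with_repair(BROKEN_PROGRAM, toy_repair))
print(result, attempts)  # ['person', 'garment'] 2
```

The first attempt fails with a `KeyError`, the traceback is passed to the repair callback, and the second attempt runs to completion.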
Read the full Orion-2 blog post for architecture details, benchmarks, and live examples.
Orion-1 vs Orion-2
The difference is clearest on a concrete task. Consider a virtual try-on that composes detection, cropping, and image generation across two input images.
Orion-1 — sequential tool-calling, one LLM round-trip per tool:
# Tools are called sequentially, with LLM reasoning at each step
boxes = tool_call("detect", image, target="person") # call 1
person = tool_call("crop", image, xywh=[0.22, 0.35, 0.04, 0.15]) # call 2
garment = tool_call("detect", dress_img, target="garment") # call 3
garment = tool_call("crop", dress_img, xywh=[0.33, 0.41, 0.05, 0.13]) # call 4
result = tool_call("generate", person, garment) # call 5
Orion-2 — code-mode, one program with parallel dispatch:
import asyncio

async def process(ctx, person_image, dress_img):
    vlmrun = ctx.import_lib("vlmrun")

    def crop(img, d):
        bx, by, bw, bh = d["xywh"]
        W, H = img.width, img.height
        return img.crop(int(by * H), int((by + bh) * H), int(bx * W), int((bx + bw) * W))

    # Detect person and garment in parallel
    p_det, g_det = await asyncio.gather(
        vlmrun.image.detect(person_image, "person"),
        vlmrun.image.detect(dress_img, "garment"),
    )
    person_crop = crop(person_image, p_det["detections"][0])
    garment_crop = crop(dress_img, g_det["detections"][0])

    # Composite the try-on
    (composite,) = await vlmrun.image.generate(
        "virtual try-on", images=[person_crop, garment_crop]
    )
    return {"composite": composite}
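The crop helper above scales each detection's `xywh` values by the image width and height, which suggests the boxes are normalized to the 0 to 1 range. The sketch below shows that conversion on its own, assuming `(x, y)` is the top-left corner of the box. `xywh_to_pixel_box` is a hypothetical helper, not part of the vlmrun library, and the clamping step is an addition for safety rather than something the original code does.

```python
def xywh_to_pixel_box(xywh, width, height):
    """Convert a normalized (x, y, w, h) box to integer pixel edges.

    Returns (left, top, right, bottom). Assumes (x, y) is the top-left
    corner and all four values are fractions of the image size.
    """
    bx, by, bw, bh = xywh
    left, top = int(bx * width), int(by * height)
    right, bottom = int((bx + bw) * width), int((by + bh) * height)
    # Clamp to the image bounds so a slightly oversized box cannot overflow.
    left, right = max(0, left), min(width, right)
    top, bottom = max(0, top), min(height, bottom)
    return left, top, right, bottom


box = xywh_to_pixel_box([0.25, 0.5, 0.5, 0.25], width=800, height=600)
print(box)  # (200, 300, 600, 450)
```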
Available Libraries
Inside the sandbox, the agent accesses libraries through ctx.import_lib(...):
| Library | Import | Capabilities |
|---|---|---|
| OpenCV | ctx.import_lib("cv2") | Classical CV operations, drawing, color conversion |
| NumPy | ctx.import_lib("numpy") | Array operations, math, linear algebra |
| Matplotlib | ctx.import_lib("matplotlib") | Plotting, charts, visualization |
| VLM Run | ctx.import_lib("vlmrun") | Detection, OCR, captioning, segmentation, generation, video, documents |
| FFmpeg | ctx.import_lib("ffmpeg") | Video processing, frame extraction, transcoding |
Standard library modules (json, math, re, pathlib, asyncio, etc.) are
available via normal import statements.
VLM Run Proxy API
The vlmrun proxy provides access to the full suite of CV capabilities:
# Image operations
caption = await vlmrun.image.caption(img, "describe this image")
dets = await vlmrun.image.detect(img, "cars")
segments = await vlmrun.image.segment(img, "person")
points = await vlmrun.image.point(img, "eyes")
ocr = await vlmrun.image.ocr(img)
(gen,) = await vlmrun.image.generate("a sunset over mountains")
recon = await vlmrun.image.reconstruct_3d(img, mask_img, objects)
# Document operations
n = await vlmrun.document.length(doc_path)
pages = await vlmrun.document.get_pages(doc_path, offset=0, limit=3)
page_img = await vlmrun.document.get_page(doc_path, index=0)
# Video operations
report = await vlmrun.video.caption(vid_path, segment_duration=60.0)
video_paths = await vlmrun.video.generate("a timelapse of clouds", resolution="720p")
result = await vlmrun.video.segment(vid_path, prompts=["person", "car"])
# File I/O within the sandbox
content = vlmrun.io.read_file("data.json")
vlmrun.io.write_file("output.csv", csv_content)
path = await vlmrun.io.download("https://example.com/file.pdf")
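When a document has more pages than you want to load at once, the `offset` and `limit` parameters of `document.get_pages` can be driven from the page count returned by `document.length`. The sketch below computes the windows only. `page_windows` is a hypothetical helper, and it assumes `offset` and `limit` follow the usual pagination meaning (zero-based start index and maximum page count), which the listing above does not spell out.

```python
def page_windows(total_pages, limit):
    """Split a document of `total_pages` into (offset, count) windows."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return [
        (offset, min(limit, total_pages - offset))
        for offset in range(0, total_pages, limit)
    ]


print(page_windows(8, 3))  # [(0, 3), (3, 3), (6, 2)]

# Inside the sandbox, the windows might be used like this (untested sketch):
#   n = await vlmrun.document.length(doc_path)
#   for offset, count in page_windows(n, 3):
#       pages = await vlmrun.document.get_pages(doc_path, offset=offset, limit=count)
```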
Example: Chat Completion with Orion-2
from vlmrun.client import VLMRun

client = VLMRun(api_key="<VLMRUN_API_KEY>")

response = client.agent.completions.create(
    model="vlmrun-orion-2:auto",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Detect all cars in this image, draw bounding boxes, and count them."},
                {"type": "image_url", "image_url": {"url": "https://example.com/parking-lot.jpg"}}
            ]
        }
    ],
)
The agent will automatically write and execute code like:
async def process(ctx, img):
    cv2 = ctx.import_lib("cv2")
    vlmrun = ctx.import_lib("vlmrun")

    dets = await vlmrun.image.detect(img, "cars")
    W, H = img.width, img.height
    for d in dets["detections"]:
        bx, by, bw, bh = d["xywh"]
        x, y, w, h = int(bx * W), int(by * H), int(bw * W), int(bh * H)
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
    return {"count": len(dets["detections"]), "annotated_image": img}
Skills with Orion-2
When skills are attached to an Orion-2 request, the skill workspace is materialized
into the session directory at <workspace>/skills/<skill-name>/. The agent can
read skill resources (SKILL.md, schemas, templates) directly using vlmrun.io.read_file
or cv2.imread — no special API calls needed.
# Inside execute_code, the agent can read skill resources:
skill_instructions = vlmrun.io.read_file("skills/invoice-extraction/SKILL.md")
schema = vlmrun.io.read_file("skills/invoice-extraction/schema.json")
Skills work with both Orion-1 and Orion-2. Orion-1 injects skill instructions into the system prompt, while Orion-2 materializes skill files into the workspace for programmatic access.
Security
The code execution sandbox enforces strict security boundaries:
- Import restrictions: Only allowlisted libraries (cv2, numpy, matplotlib, vlmrun, ffmpeg) can be loaded via ctx.import_lib(), alongside permitted Python standard library modules. Dangerous modules (os, io, shutil, importlib) are blocked at AST parse time.
- Workspace confinement: All file operations are restricted to the session workspace. Symlink traversal and absolute path escapes are rejected.
- Introspection blocking: Builtins like eval, exec, compile, getattr, and __import__ are blocked to prevent sandbox escape.
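To make the checks above concrete, here is a minimal sketch of how parse-time import blocking, builtin blocking, and workspace confinement could be written with Python's standard `ast` and `pathlib` modules. It is not VLM Run's implementation: `find_violations` and `resolve_in_workspace` are hypothetical names, the blocklists simply copy the examples given above, and a production sandbox would need far more (aliasing, attribute access, runtime limits).

```python
import ast
from pathlib import Path

# Copied from the examples in the text above; the real lists may be longer.
BLOCKED_MODULES = {"os", "io", "shutil", "importlib"}
BLOCKED_BUILTINS = {"eval", "exec", "compile", "getattr", "__import__"}


def find_violations(source):
    """Return a list of policy violations found by walking the parsed AST."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            names = []
        for name in names:
            root = name.split(".")[0]  # "os.path" counts as "os"
            if root in BLOCKED_MODULES:
                violations.append(f"import of blocked module '{root}'")
        if isinstance(node, ast.Name) and node.id in BLOCKED_BUILTINS:
            violations.append(f"use of blocked builtin '{node.id}'")
    return violations


def resolve_in_workspace(workspace, user_path):
    """Resolve `user_path` and reject anything outside `workspace`."""
    root = Path(workspace).resolve()
    # resolve() collapses ".." and follows symlinks; joining an absolute
    # path discards `root`, so absolute escapes are caught by the same check.
    candidate = (root / user_path).resolve()
    if candidate != root and root not in candidate.parents:
        raise PermissionError(f"{user_path!r} escapes the workspace")
    return candidate


# Reports one blocked import and one blocked builtin.
print(find_violations("import os\nvalue = eval('1 + 1')"))
# An allowed standard library import produces no violations.
print(find_violations("import json"))  # []
```

Because the source is only parsed and never executed, a rejected program has no chance to run any of its code.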
Model Variants
Orion-2 is model-agnostic — the same harness and runtime work with any multimodal model that has strong code generation. The default vlmrun-orion-2:auto routes each request to the best backbone for the job.
| Model ID | Description |
|---|---|
| vlmrun-orion-2:fast | Optimized for speed and cost-efficiency |
| vlmrun-orion-2:auto | Automatically routes to the best backend for each task (default) |
| vlmrun-orion-2:pro | Most capable tier for complex multi-step workflows |
| vlmrun-orion-2:qwen3.6-35b-a3b | Open-weight Qwen 3.6 35B, strong at code generation and reasoning |
| vlmrun-orion-2:gemma4-26b-a4b | Open-weight Gemma 4 26B, strong at localization and spatial tasks |
| vlmrun-orion-2:kimi-2.6 | Kimi 2.6, strong at multi-turn dialogue and long-context tasks |
| vlmrun-orion-2:gpt-5.5 | GPT-5.5, strong at instruction following and structured output |
| vlmrun-orion-2:claude-opus-4.8 | Claude Opus 4.8, strong at nuanced reasoning and analysis |