Retrieve generated images, videos, audio, and documents from agent responses
Artifacts are binary objects generated during agent interactions, such as images, videos, audio files, and documents. When agents perform operations like image generation, face blurring, video trimming, or document processing, the results are stored as artifacts that can be retrieved using object references.
Agent responses return object references (refs) instead of raw binary data. Each reference is a string identifier that follows a specific format: a 3-5 letter type prefix followed by an underscore and a 6-digit hexadecimal string (e.g., img_a1b2c3).
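The documented format can be checked with a short validator. This is a minimal sketch, not part of the SDK; the `is_valid_ref` helper and the assumption that prefixes are lowercase letters are illustrative only.

```python
import re

# Hypothetical helper (not in the SDK): validate a reference string against
# the documented format: a 3-5 letter type prefix, an underscore, and a
# 6-digit hexadecimal suffix.
REF_PATTERN = re.compile(r"^[a-z]{3,5}_[0-9a-f]{6}$")

def is_valid_ref(ref: str) -> bool:
    return REF_PATTERN.fullmatch(ref) is not None

print(is_valid_ref("img_a1b2c3"))    # True  (image ref)
print(is_valid_ref("recon_00ff42"))  # True  (reconstruction ref)
print(is_valid_ref("img-a1b2c3"))    # False (wrong separator)
```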
| Artifact Type | Prefix | Reference Type | Python Return Type |
| --- | --- | --- | --- |
| Image | `img_` | `ImageRef` | `PIL.Image.Image` |
| Video | `vid_` | `VideoRef` | `Path` (mp4) |
| Audio | `aud_` | `AudioRef` | `Path` (mp3) |
| Document | `doc_` | `DocumentRef` | `Path` (pdf) |
| Reconstruction | `recon_` | `ReconRef` | `Path` (spz) |
| URL | `url_` | `UrlRef` | `Path` (any of the above) |
| Array | `arr_` | `ArrayRef` | `np.ndarray` |
Import reference types from the SDK:
```python
from vlmrun.types import ImageRef, VideoRef, AudioRef, DocumentRef, ReconRef, UrlRef
```
Chat completion artifacts are scoped to a session. To retrieve an artifact for a specific chat completion, you need the session_id (from the chat completion response) and the object_id (from the Ref fields in the structured JSON response).
```python
from pydantic import BaseModel, Field
from PIL import Image
from vlmrun.client import VLMRun
from vlmrun.types import ImageRef

client = VLMRun(base_url="https://agent.vlm.run/v1", api_key="<VLMRUN_API_KEY>")

# Define a response model with an ImageRef field
class BlurredImageResponse(BaseModel):
    image: ImageRef = Field(..., description="The blurred image")

# Make a chat completion request
response = client.agent.completions.create(
    model="vlmrun-orion-1:auto",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Blur all the faces in this image"},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
            ]
        }
    ],
    response_format={
        "type": "json_schema",
        "schema": BlurredImageResponse.model_json_schema()
    }
)

# Parse the response
result = BlurredImageResponse.model_validate_json(response.choices[0].message.content)

# Retrieve the artifact using session_id and object_id
blurred_image: Image.Image = client.artifacts.get(
    session_id=response.session_id,
    object_id=result.image.id
)

# Display or save the image
blurred_image.save("blurred_output.jpg")
```
Multi-modal artifacts such as videos are downloaded and cached locally as files.
For now, a video artifact can be either a VideoRef or a UrlRef (specified via VideoRef | UrlRef). We are working on supporting more artifact types in the future.
Python SDK
```python
from pathlib import Path
from pydantic import BaseModel, Field
from vlmrun.client import VLMRun
from vlmrun.types import VideoRef, UrlRef

client = VLMRun(base_url="https://agent.vlm.run/v1", api_key="<VLMRUN_API_KEY>")

class VideoTrimResponse(BaseModel):
    start_time: str = Field(..., description="Start time of the trimmed segment")
    end_time: str = Field(..., description="End time of the trimmed segment")
    video: VideoRef | UrlRef = Field(..., description="The trimmed video")

response = client.agent.completions.create(
    model="vlmrun-orion-1:auto",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Trim this video to the first 10 seconds"},
                {"type": "video_url", "video_url": {"url": "https://example.com/video.mp4"}}
            ]
        }
    ],
    response_format={
        "type": "json_schema",
        "schema": VideoTrimResponse.model_json_schema()
    }
)

result = VideoTrimResponse.model_validate_json(response.choices[0].message.content)

# Retrieve the video artifact - returns a Path to the local file
video_path: Path = client.artifacts.get(
    session_id=response.session_id,
    object_id=result.video.id
)
print(f"Video saved to: {video_path}")
print(f"File size: {video_path.stat().st_size / 1024 / 1024:.2f} MB")
```
To retrieve an artifact for a specific agent execution, you need the execution_id (from the agent execution response) and the object_id (from the Ref fields in the structured JSON response).
Python SDK
```python
from pydantic import BaseModel, Field
from PIL import Image
from vlmrun.client import VLMRun
from vlmrun.client.types import AgentExecutionResponse, AgentExecutionConfig, ImageUrl
from vlmrun.types import ImageRef, MessageContent

client = VLMRun(base_url="https://agent.vlm.run/v1", api_key="<VLMRUN_API_KEY>")

# Define typed inputs using MessageContent
class ExecutionInputs(BaseModel):
    image: MessageContent = Field(..., description="The input image")

class ImageResponse(BaseModel):
    image: ImageRef = Field(..., description="The processed image")

# Execute an agent with typed inputs
execution: AgentExecutionResponse = client.agent.execute(
    name="image/blur-image",
    inputs=ExecutionInputs(
        image=MessageContent(type="image_url", image_url=ImageUrl(url="https://example.com/photo.jpg"))
    ),
    config=AgentExecutionConfig(
        prompt="Blur the entire image",
        response_model=ImageResponse
    )
)

# Wait for completion
execution = client.executions.wait(execution.id, timeout=180)

# Parse the response and retrieve the artifact
result = ImageResponse.model_validate(execution.response)
image: Image.Image = client.artifacts.get(
    execution_id=execution.id,
    object_id=result.image.id
)
```
Agent execution artifact retrieval via execution_id is being rolled out. Check the SDK release notes for availability.
The Python and Node SDKs automatically cache downloaded artifacts to avoid re-downloading the same files. Artifacts are stored in ~/.vlmrun/artifacts/{session_id}/ with filenames based on the object ID and the appropriate file extension.
```python
# First call downloads the artifact
video_path = client.artifacts.get(session_id=session_id, object_id="vid_a1b2c3")

# Subsequent calls return the cached file path
video_path = client.artifacts.get(session_id=session_id, object_id="vid_a1b2c3")  # No download
```
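Given the documented cache layout, the location of a cached artifact can be sketched as below. The `expected_cache_path` helper, the example session ID, and the mp4 extension (inferred from the `vid_` prefix) are illustrative assumptions, not SDK behavior guarantees.

```python
from pathlib import Path

# Sketch of where a cached artifact would land, based on the documented
# layout ~/.vlmrun/artifacts/{session_id}/. The extension is an assumption
# inferred from the object ID's type prefix (vid_ -> mp4).
def expected_cache_path(session_id: str, object_id: str, ext: str) -> Path:
    return Path.home() / ".vlmrun" / "artifacts" / session_id / f"{object_id}.{ext}"

p = expected_cache_path("sess_123456", "vid_a1b2c3", "mp4")
print(p.name)  # vid_a1b2c3.mp4
```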
When working with artifacts, keep these guidelines in mind:
For large artifacts like videos, the Python and Node SDKs download files to disk rather than loading them into memory. This prevents memory issues when working with large files. Always check the file size before loading video content into memory.
Use structured response models with appropriate Ref types (ImageRef, VideoRef, etc.) to ensure type safety and enable IDE autocompletion. The Python and Node SDKs will automatically handle the conversion to the appropriate Python type when retrieving artifacts.