Introduction

At the core of Orion is a unified architecture that powers understanding, reasoning, and action across every visual modality.

Today’s frontier Vision-Language Models like GPT-5, Claude 4.5, and Gemini 2.5 Pro can describe images and answer questions, but they operate as monolithic inference engines. They generate descriptive outputs but cannot act on visual data with the precision, determinism, or compositional control required for production-grade workflows. VLM Run’s Orion family of visual agents introduces a new paradigm for agentic visual reasoning and execution. Unlike monolithic VLMs, vlmrun-orion-1 orchestrates specialized computer vision tools – OCR, detection, segmentation, keypoint localization, diffusion, and geometric analysis – to execute complex multi-step visual workflows from natural language instructions. This marks the transition from passive visual understanding to autonomous, tool-augmented visual intelligence that bridges neural perception with symbolic execution.

Agents Supported

The latest generation of VLM Run agents, the Orion family, are available in three variants: vlmrun-orion-1:fast, vlmrun-orion-1:auto, and vlmrun-orion-1:pro.

vlmrun-orion-1:fast

Our fast visual agent for simple multi-modal workflows. Optimized for speed and quick responses.

vlmrun-orion-1:auto

Automatically selects the best model variant based on your task complexity. Balanced performance and capability.

vlmrun-orion-1:pro

Our most capable visual agent for complex, multi-step workflows. Handles long tool-trajectories and advanced reasoning.

Looking to chat with VLM Run’s Orion agents? Visit chat.vlm.run.

What makes VLM Run Agents unique?

Here are some key features of VLM Run Agents that set it apart from other AI agent platforms:

Multi-Modal, Multi-Turn Reasoning

Execute complex multi-step visual workflows with adaptive context management across extended conversations.

First-class Visual AI Tools

Comprehensive suite of specialized tools across document, image, video, and multimodal processing—composable into multi-stage pipelines.

OpenAI-Compatible API

Use our OpenAI Chat Completions endpoint to interact with VLM Run’s Orion agents with just 2 lines of code change.

Enterprise-Ready

Our agents are SOC2-Type 2 and HIPAA-compliant, production-ready with automatic validation, with support for full traceability and auditability.

How is VLM Run’s Orion different from frontier models?

Unlike monolithic Vision-Language Models (VLMs like GPT-5, Claude 4.5, and Gemini 2.5), VLM Run’s Orion family of visual agents delivers comprehensive capabilities across all modalities and tasks. The table below highlights key differences that matter for building production-grade visual workflows:

	Task	`vlmrun-orion-1`	GPT-5	Gemini 2.5	Claude Sonnet 4.5	Qwen3-VL 235B-A22B
Image / Video	Understanding	✓	⚠	✓	⚠	✓
	Reasoning	✓	✗	✗	✗	✓
	Structured Outputs	✓	✓	✓	✓	✓
	Multi-modal Tool-Calling	✓	✗	✗	✗	⚠
	Specialized Skills	✓	✗	⚠	⚠	✗

Document	Understanding	✓	✓	✓	✓	✓
	Reasoning	✓	✓	✓	✓	✗
	Structured Outputs	✓	✓	✓	✓	✓
	Multi-modal Tool-Calling	✓	⚠	⚠	⚠	✗
	Specialized Skills	✓	✓	⚠	✓	✗

In the table above, we refer to Specialized Skills as tasks such as object localization, segmentation, image-generation / editing, or geometric tools typically found in specialized computer vision applications.

Key advantages for developers:

Mixed-modality Reasoning: Only VLM Run’s Orion agents provide full reasoning across images, documents, and video - critical for building multi-step visual workflows.
Multi-modal Tool-Calling: With unique tool-calling support for images, videos and documents, VLM Run’s Orion agents enable multi-modal reasoning and execution that other models cannot perform.
Production-Ready Structured Outputs: Consistent structured output support across all modalities with automatic validation and retry logic

Let’s get started!

Below you’ll find the API reference and code samples so you can start building intelligent agents for your use case. Sign up for an API key on our platform, then check out some of our cookbooks to learn how to use VLM Run Agents to build sophisticated visual AI workflows.

Chat

Chat with our visual agent direcly in your browser.

Capabilities

See the complete catalog of visual AI capabilities and tools.

API Reference

Enough talk, show me the code.

Cookbooks

Various cookbooks showcasing VLM Run Agents in action.

Get Started

Image Capabilities

Document Capabilities

Video Capabilities

Agents Supported

vlmrun-orion-1:fast

vlmrun-orion-1:auto

vlmrun-orion-1:pro

What makes VLM Run Agents unique?

Multi-Modal, Multi-Turn Reasoning

First-class Visual AI Tools

OpenAI-Compatible API

Enterprise-Ready

How is VLM Run’s Orion different from frontier models?

Let’s get started!

Chat

Capabilities

API Reference

Cookbooks

Get Started

Image Capabilities

Document Capabilities

Video Capabilities

​Agents Supported

vlmrun-orion-1:fast

vlmrun-orion-1:auto

vlmrun-orion-1:pro

​What makes VLM Run Agents unique?

Multi-Modal, Multi-Turn Reasoning

First-class Visual AI Tools

OpenAI-Compatible API

Enterprise-Ready

​How is VLM Run’s Orion different from frontier models?

​Let’s get started!

Chat

Capabilities

API Reference

Cookbooks

Agents Supported

What makes VLM Run Agents unique?

How is VLM Run’s Orion different from frontier models?

Let’s get started!