What is the cost per million tokens?

Both input and output are priced at $10 per million tokens.

How large is the context window?

It features a 400K token context window with a 128K token output maximum.

Does it support vision and web search?

Yes, it natively supports vision, function calling, and web search for real-time data retrieval within Hermes Agent.

GPT-5 Image for Hermes Agent: Pricing, Setup, and What It's Good At

Current as of April 2026. GPT-5 Image is OpenAI’s vision-centric powerhouse designed for heavy multi-modal reasoning. With a massive 400K context window and 128K output limit, it handles the long-running autonomous loops Hermes Agent requires across 15+ messaging platforms.

Specs


Provider	OpenAI
Input cost	$10 / M tokens
Output cost	$10 / M tokens
Context window	400K tokens
Max output	128K tokens
Parameters	N/A
Features	function_calling, vision, reasoning, web_search

What it’s good at

Tool-Use Precision

It nails function calling for the 47 built-in Hermes tools even when the 400K context window gets crowded with platform history.

Native Vision Integration

The model processes screenshots from Discord or Slack natively, allowing the agent to see UI changes or shared images during automation tasks.

Contextual Persistence

The 400K context window ensures the closed learning loop and long-term memory do not degrade during complex, multi-day cross-session tasks.

Where it falls short

High Operating Cost

At $10 per million tokens for both input and output, running this model 24/7 on an autonomous agent is significantly more expensive than competitors.

Reasoning Latency

The reasoning overhead and large output capacity can lead to slower response times when triggered by high-frequency messaging platforms like WhatsApp.

Best use cases with Hermes Agent

Visual Dashboard Monitoring — Use this when your Hermes Agent needs to monitor visual dashboards on Slack and execute shell commands via MCP based on visual state.
Deep Multi-Platform Reasoning — Its 128K output limit and 400K context make it ideal for deep reasoning tasks that span weeks of platform interactions and persistent memory.

Not ideal for

Simple Message Relaying — It is a waste of $10/M tokens to bridge WhatsApp and Telegram messages without utilizing the vision or reasoning features.
High-Frequency Micro-Tasks — The cost and slight latency make it overkill for simple, repetitive tool triggers that do not require visual input or complex reasoning.

Hermes Agent setup

Ensure your OpenAI API key has Tier 5 access to handle the rate limits required for a 400K context window. Configure the Hermes model_id to openai/gpt-5-image and set the max_tokens to 128,000 for long-form reasoning logs.

Hermes makes custom endpoints easy. Run:

hermes model

Choose Custom endpoint from the menu. Enter the base URL and model identifier when prompted:

Base URL: https://api.haimaker.ai/v1
Model: openai/gpt-5-image

Hermes stores the selection and uses it for all subsequent agent runs across whatever platforms you have wired up (Telegram, Discord, Slack, etc.). Tune HERMES_STREAM_READ_TIMEOUT and related env vars if you’re hitting slow providers.

How it compares

vs Claude 3.5 Sonnet — Claude is cheaper for input at $3/M, but GPT-5 Image offers double the context window (400K vs 200K) and superior native vision for complex UI tasks.
vs Gemini 1.5 Pro — Gemini offers a larger 2M context window, but GPT-5 Image’s function calling reliability within the Hermes MCP protocol is more consistent in autonomous runs.

Bottom line

GPT-5 Image is the premier choice for Hermes Agent users who need high-reliability tool use and visual reasoning, provided your budget can handle the $10/M token price point.

TRY GPT-5 IMAGE IN HERMES

For more, see our Hermes local-LLM setup guide.