What is the exact pricing for GPT-5.1-Codex?

Input tokens cost $1.25 per million and output tokens cost $10 per million.

How large is the context window for Hermes memory?

The model supports up to 400,000 tokens in the context window with a 128,000 token output limit.

Does it support image processing from platforms like Discord?

Yes, it has full vision support, allowing Hermes to interpret screenshots or files sent via any of the 15+ supported messaging channels.

GPT-5.1-Codex for Hermes Agent: Pricing, Setup, and What It's Good At

Current as of April 2026. GPT-5.1-Codex is OpenAI’s high-reasoning model optimized for complex tool execution, despite the coding-centric branding. For Hermes Agent, it provides a massive 400K context window and reliable function calling that excels in multi-step autonomous workflows across different messaging platforms.

Specs


Provider	OpenAI
Input cost	$1.25 / M tokens
Output cost	$10 / M tokens
Context window	400K tokens
Max output	128K tokens
Parameters	N/A
Features	function_calling, vision, reasoning

What it’s good at

Reliable Tool Execution

The model handles complex MCP tool chains with high precision, rarely hallucinating parameters during long autonomous runs.

Massive 400K Context Window

It maintains persistent memory across weeks of Slack and Discord logs without needing to constantly prune the message history.

Advanced Reasoning Chains

The reasoning capabilities allow it to plan multi-platform operations, such as monitoring a webhook and executing shell commands on Modal, with minimal failure.

Where it falls short

High Output Costs

At $10 per million output tokens, it is significantly more expensive than competitors for high-volume automated messaging.

Inherent Latency

The internal reasoning process adds noticeable delay, which can make real-time interactions on WhatsApp feel sluggish.

Verbose Responses

The model tends to over-explain its logic before calling a tool, which consumes unnecessary output tokens in a closed learning loop.

Best use cases with Hermes Agent

Cross-Platform Orchestration — It effectively manages workflows spanning Discord, SSH, and Docker while keeping track of complex state changes across 47 built-in tools.
Deep History Analysis — The 400K window allows Hermes to ingest months of platform data to inform its autonomous decisions without losing the thread.

Not ideal for

Simple Notification Bots — Using a $10/1M output token model for basic Telegram alerts is a waste of budget when cheaper models exist.
Low-Latency Chat — If your Hermes instance requires instant replies for human interaction, the reasoning overhead will frustrate users.

Hermes Agent setup

Configure the provider as OpenAI and set the model ID to openai/gpt-5.1-codex. You must increase the request timeout in your Hermes config to at least 60 seconds to prevent drops during complex reasoning phases.

Hermes makes custom endpoints easy. Run:

hermes model

Choose Custom endpoint from the menu. Enter the base URL and model identifier when prompted:

Base URL: https://api.haimaker.ai/v1
Model: openai/gpt-5.1-codex

Hermes stores the selection and uses it for all subsequent agent runs across whatever platforms you have wired up (Telegram, Discord, Slack, etc.). Tune HERMES_STREAM_READ_TIMEOUT and related env vars if you’re hitting slow providers.

How it compares

vs Claude 3.5 Sonnet — Sonnet is faster and cheaper at $3/$15, but GPT-5.1-Codex handles the 400K context window with much better recall for long-term memory.
vs GPT-4o — GPT-4o is a better generalist for chat, but Codex is more rigid and reliable when executing strict MCP protocols and shell commands.

Bottom line

GPT-5.1-Codex is the premium choice for Hermes users who need rock-solid tool reliability and massive context windows, provided they can stomach the high output costs and latency.

TRY GPT-5.1-CODEX IN HERMES

For more, see our Hermes local-LLM setup guide.