What is the exact pricing for this model?

Input tokens cost $1.25 per million, while output tokens are priced at $10.00 per million.

How much information can it remember at once?

It features a 400,000 token context window, which is roughly equivalent to a 300-page book of logs and chat history.

Does it work with Hermes' 47 built-in tools?

Yes, it fully supports native function calling and MCP, making it highly reliable for executing shell commands, Docker operations, and platform-specific actions.

GPT-5 Codex for Hermes Agent: Pricing, Setup, and What It's Good At

Current as of April 2026. GPT-5 Codex is OpenAI’s high-context workhorse for Hermes, offering a 400K window that handles long-running autonomous loops without losing track of previous tool outputs. At $1.25 per million input tokens, it provides a stable foundation for agents managing complex cross-platform workflows.

Specs


Provider	OpenAI
Input cost	$1.25 / M tokens
Output cost	$10 / M tokens
Context window	400K tokens
Max output	128K tokens
Parameters	N/A
Features	function_calling, vision, reasoning

What it’s good at

Reliable Tool Execution

The function calling is rock solid, rarely failing to parse MCP schemas even when chaining multiple tools in a single turn. It consistently executes Hermes’ 47 built-in tools without the hallucinations common in smaller models.

Massive Context Retention

With a 400K context window, Hermes can maintain a dense memory of Slack threads and SSH logs spanning days of operation. This prevents the ‘memory reset’ issue where the agent forgets the original user intent during long tasks.

Multi-Platform Synthesis

It excels at synthesizing information from Discord and Telegram simultaneously to make decisions on Docker container management. The reasoning capabilities keep the agent’s identity consistent across 15+ messaging platforms.

Where it falls short

High Output Pricing

At $10 per million tokens for output, running high-frequency agents that post constantly across multiple platforms gets expensive fast. This can lead to unexpected costs if the closed learning loop becomes chatty.

Reasoning Latency

The internal reasoning overhead causes a noticeable delay in Hermes’ response time compared to more nimble models. It is not the best choice for real-time chat scenarios where sub-second latency is required.

Proprietary Constraints

As a closed model, you have zero visibility into the architecture, making it difficult to debug edge-case failures in tool-use. You are entirely dependent on OpenAI’s API stability for your autonomous infrastructure.

Best use cases with Hermes Agent

Infrastructure Automation — It monitors Slack for alerts and uses the SSH tool to fix servers while maintaining a perfect log of its actions in the 400K context. This reliability is critical for agents with shell access.
Cross-Platform Community Management — It handles complex moderation logic across Discord and WhatsApp while maintaining a consistent identity and memory of past user interactions. The reasoning capabilities ensure it follows community guidelines across different social norms.

Not ideal for

Simple Notification Bots — The $10/M output cost makes it overkill for simple Telegram responders that don’t need the 400K context. Cheaper models like GPT-4o mini are more cost-effective for basic alerts.
Local-Only Shell Scripts — If you are just running basic shell commands on a Mac, the latency and cost of GPT-5 Codex are unnecessary. Local models can handle these tasks faster without the data leaving your machine.

Hermes Agent setup

Set your MAX_TOKENS carefully in the Hermes config to avoid hitting the $10/M output ceiling on runaway autonomous loops. Ensure the MCP protocol is fully enabled as this model relies heavily on structured tool definitions to perform effectively.

Hermes makes custom endpoints easy. Run:

hermes model

Choose Custom endpoint from the menu. Enter the base URL and model identifier when prompted:

Base URL: https://api.haimaker.ai/v1
Model: openai/gpt-5-codex

Hermes stores the selection and uses it for all subsequent agent runs across whatever platforms you have wired up (Telegram, Discord, Slack, etc.). Tune HERMES_STREAM_READ_TIMEOUT and related env vars if you’re hitting slow providers.

How it compares

vs Anthropic Claude 3.5 Sonnet — Sonnet is faster for tool-use, but GPT-5 Codex’s 400K context dwarfs Sonnet’s 200K for month-long autonomous sessions. Codex is more reliable for complex MCP protocol handling in my experience.
vs Google Gemini 1.5 Pro — Gemini offers a larger 1M+ context window, but GPT-5 Codex has more consistent function calling performance. Codex is less likely to hallucinate tool parameters when Hermes is under heavy multi-platform load.

Bottom line

GPT-5 Codex is the premium choice for complex Hermes deployments where memory persistence and reliable tool execution across platforms outweigh the high output costs.

TRY GPT-5 CODEX IN HERMES

For more, see our Hermes local-LLM setup guide.