Current as of April 2026. GPT-5 Codex is OpenAI’s high-context workhorse for Hermes, offering a 400K window that handles long-running autonomous loops without losing track of previous tool outputs. At $1.25 per million input tokens, it provides a stable foundation for agents managing complex cross-platform workflows.
Specs
| Provider | OpenAI |
| Input cost | $1.25 / M tokens |
| Output cost | $10 / M tokens |
| Context window | 400K tokens |
| Max output | 128K tokens |
| Parameters | N/A |
| Features | function_calling, vision, reasoning |
What it’s good at
Reliable Tool Execution
The function calling is rock solid, rarely failing to parse MCP schemas even when chaining multiple tools in a single turn. It consistently executes Hermes’ 47 built-in tools without the hallucinations common in smaller models.
Massive Context Retention
With a 400K context window, Hermes can maintain a dense memory of Slack threads and SSH logs spanning days of operation. This prevents the ‘memory reset’ issue where the agent forgets the original user intent during long tasks.
Multi-Platform Synthesis
It excels at synthesizing information from Discord and Telegram simultaneously to make decisions on Docker container management. The reasoning capabilities keep the agent’s identity consistent across 15+ messaging platforms.
Where it falls short
High Output Pricing
At $10 per million tokens for output, running high-frequency agents that post constantly across multiple platforms gets expensive fast. This can lead to unexpected costs if the closed learning loop becomes chatty.
Reasoning Latency
The internal reasoning overhead causes a noticeable delay in Hermes’ response time compared to more nimble models. It is not the best choice for real-time chat scenarios where sub-second latency is required.
Proprietary Constraints
As a closed model, you have zero visibility into the architecture, making it difficult to debug edge-case failures in tool-use. You are entirely dependent on OpenAI’s API stability for your autonomous infrastructure.
Best use cases with Hermes Agent
- Infrastructure Automation — It monitors Slack for alerts and uses the SSH tool to fix servers while maintaining a perfect log of its actions in the 400K context. This reliability is critical for agents with shell access.
- Cross-Platform Community Management — It handles complex moderation logic across Discord and WhatsApp while maintaining a consistent identity and memory of past user interactions. The reasoning capabilities ensure it follows community guidelines across different social norms.
Not ideal for
- Simple Notification Bots — The $10/M output cost makes it overkill for simple Telegram responders that don’t need the 400K context. Cheaper models like GPT-4o mini are more cost-effective for basic alerts.
- Local-Only Shell Scripts — If you are just running basic shell commands on a Mac, the latency and cost of GPT-5 Codex are unnecessary. Local models can handle these tasks faster without the data leaving your machine.
Hermes Agent setup
Set your MAX_TOKENS carefully in the Hermes config to avoid hitting the $10/M output ceiling on runaway autonomous loops. Ensure the MCP protocol is fully enabled as this model relies heavily on structured tool definitions to perform effectively.
Hermes makes custom endpoints easy. Run:
hermes model
Choose Custom endpoint from the menu. Enter the base URL and model identifier when prompted:
- Base URL:
https://api.haimaker.ai/v1 - Model:
openai/gpt-5-codex
Hermes stores the selection and uses it for all subsequent agent runs across whatever platforms you have wired up (Telegram, Discord, Slack, etc.). Tune HERMES_STREAM_READ_TIMEOUT and related env vars if you’re hitting slow providers.
How it compares
- vs Anthropic Claude 3.5 Sonnet — Sonnet is faster for tool-use, but GPT-5 Codex’s 400K context dwarfs Sonnet’s 200K for month-long autonomous sessions. Codex is more reliable for complex MCP protocol handling in my experience.
- vs Google Gemini 1.5 Pro — Gemini offers a larger 1M+ context window, but GPT-5 Codex has more consistent function calling performance. Codex is less likely to hallucinate tool parameters when Hermes is under heavy multi-platform load.
Bottom line
GPT-5 Codex is the premium choice for complex Hermes deployments where memory persistence and reliable tool execution across platforms outweigh the high output costs.
For more, see our Hermes local-LLM setup guide.