Current as of April 2026. OpenAI models remain the gold standard for Hermes Agent due to their superior tool-calling reliability and native support for the complex schemas required by MCP. While other families catch up, the GPT and o-series models offer the most stable performance across 15+ messaging platforms where context management and autonomous tool execution are the primary bottlenecks.

The quick answer

Model                   Input / Output ($/M)   Context   Best For
gpt-oss-20b             $0.03 / $0.11          131K      The High-Volume Utility Choice
gpt-oss-120b            $0.04 / $0.19          131K      The Mid-Range Reasoning Alternative
GPT 5 Nano              $0.05 / $0.40          400K      The All-Rounder for Long-Running Agents
gpt-oss-safeguard-20b   $0.08 / $0.30          131K      The Compliance-Focused Variant
GPT 4.1 Nano            $0.10 / $0.40          1.0M      The Maximum Context Specialist
GPT 4o Mini             $0.15 / $0.60          128K      The Reliable Tool-Calling Standard
GPT-5.4 Nano            $0.20 / $1.25          400K      The Search-Integrated Intelligence
GPT 5 Mini              $0.25 / $2.00          400K      The Premium Autonomous Engine

Start with GPT 5 Nano unless you have a specific reason to pick another. It offers the best value-to-performance ratio for a persistent agent. At $0.05/M input tokens and a 400K context window, it handles months of cross-session memory across Discord and Telegram far more affordably than the Mini or 4o variants while retaining vision and reasoning capabilities.
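To see how the per-token prices above translate into a monthly bill, here is a rough cost model. The daily token volumes are illustrative assumptions, not measurements, and the model identifiers in the dictionary are shorthand for the table rows, not confirmed API names:

```python
# Rough monthly cost estimate for a persistent agent, using the
# per-million-token prices from the table above. Daily token volumes
# below are illustrative assumptions.
PRICES = {  # $ per 1M tokens: (input, output)
    "gpt-oss-20b": (0.03, 0.11),
    "gpt-5-nano": (0.05, 0.40),   # identifier assumed; see "GPT 5 Nano" row
    "gpt-4o-mini": (0.15, 0.60),
}

def monthly_cost(model, input_tokens_per_day, output_tokens_per_day, days=30):
    inp, out = PRICES[model]
    return days * (input_tokens_per_day * inp + output_tokens_per_day * out) / 1_000_000

# Example: an agent consuming 2M input and 200K output tokens per day.
for m in PRICES:
    print(f"{m}: ${monthly_cost(m, 2_000_000, 200_000):.2f}/month")
```

At that traffic profile, the Nano-class pricing stays in single-digit dollars per month, which is what makes 24/7 operation plausible.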

gpt-oss-20b — The High-Volume Utility Choice

This is the cheapest entry point at $0.03/M input tokens. It is best suited for simple, high-frequency automation tasks that do not require complex reasoning or vision, such as basic message routing or simple tool triggers.

gpt-oss-120b — The Mid-Range Reasoning Alternative

At $0.04/M input, this model provides a significant step up in logic from the 20b version. Use this if your Hermes workflows involve multi-step tool chains that the smaller OSS model struggles to sequence correctly.

GPT 5 Nano — The All-Rounder for Long-Running Agents

This model balances a massive 400K context window with a low $0.05/M input cost. It is the most efficient choice for agents that need vision for image processing and deep cross-session memory, and it supports output bursts of up to 128K tokens.

gpt-oss-safeguard-20b — The Compliance-Focused Variant

This model is nearly identical to gpt-oss-20b; prefer the standard 20b unless your deployment explicitly requires the higher safety alignment and output filtering, which comes at a premium of $0.08/M input.

GPT 4.1 Nano — The Maximum Context Specialist

With a 1.0M context window, this is the only choice for agents managing massive, persistent message histories across multiple platforms without pruning. Its $0.10/M input cost is reasonable for the scale of data it can hold in active memory.
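Pruning is what the smaller windows force: once the history exceeds the token budget, the oldest messages get dropped. A minimal sketch of that trade-off, using a rough 4-characters-per-token heuristic (an assumption; production code should use a real tokenizer such as tiktoken):

```python
def estimate_tokens(text):
    # Crude heuristic: ~4 characters per token. Swap in a real
    # tokenizer for anything beyond a back-of-envelope check.
    return max(1, len(text) // 4)

def prune_history(messages, max_tokens):
    """Drop the oldest messages until the history fits the budget,
    always keeping the first (system) message."""
    system, rest = messages[0], list(messages[1:])
    budget = max_tokens - estimate_tokens(system["content"])
    while rest and sum(estimate_tokens(m["content"]) for m in rest) > budget:
        rest.pop(0)  # discard the oldest non-system message
    return [system] + rest

history = [{"role": "system", "content": "You are Hermes."}] + [
    {"role": "user", "content": "x" * 400} for _ in range(10)
]
pruned = prune_history(history, max_tokens=500)
print(len(pruned))
```

With a 1.0M-token window, this loop simply never fires for most deployments, which is the whole appeal of the Maximum Context Specialist.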

GPT 4o Mini — The Reliable Tool-Calling Standard

While pricier than the Nano series at $0.15/M input, its tool-calling precision is the most consistent in the industry. Choose this if your Hermes instance relies on complex MCP tools where any hallucination in JSON parameters breaks the workflow.
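Whichever model you pick, one way to contain the blast radius of a hallucinated parameter is to validate the tool call's JSON arguments against the tool's schema before executing anything. A minimal sketch; the tool name and schema here are illustrative, not Hermes internals:

```python
import json

# Illustrative tool schema, loosely in the function-calling style:
# required keys plus expected Python types per parameter.
SEND_MESSAGE_SCHEMA = {
    "required": ["channel", "text"],
    "properties": {"channel": str, "text": str},
}

def validate_tool_args(raw_json, schema):
    """Return parsed args if they satisfy the schema, else raise ValueError."""
    try:
        args = json.loads(raw_json)
    except json.JSONDecodeError as e:
        raise ValueError(f"arguments are not valid JSON: {e}")
    for key in schema["required"]:
        if key not in args:
            raise ValueError(f"missing required parameter: {key}")
    for key, expected in schema["properties"].items():
        if key in args and not isinstance(args[key], expected):
            raise ValueError(f"{key} should be {expected.__name__}")
    return args

args = validate_tool_args('{"channel": "#general", "text": "hi"}', SEND_MESSAGE_SCHEMA)
print(args["channel"])
```

Rejecting a bad call and re-prompting the model is almost always cheaper than letting a malformed parameter reach a live messaging platform.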

GPT-5.4 Nano — The Search-Integrated Intelligence

This is the best option for agents serving as research assistants. For $0.20/M input, you get native web_search capabilities and reasoning, making it more capable at autonomous information gathering than the 5 Nano.

GPT 5 Mini — The Premium Autonomous Engine

The most expensive option at $0.25/M input, but it offers the highest reasoning scores in the family. It is designed for complex, high-stakes autonomy where the agent must make nuanced decisions across its 47 built-in tools.

Setup in Hermes Agent

To configure these in Hermes, run hermes model and select Custom endpoint. Use the base URL https://api.openai.com/v1 and the model identifier from the list above. Ensure your API key's usage tier provides enough quota to cover the large context windows of the Nano series.

Running through haimaker.ai

Rather than standing up a per-provider account, you can point Hermes at haimaker.ai and get access to OpenAI alongside every other frontier model through one API key:

  • Base URL: https://api.haimaker.ai/v1
  • Model: openai/gpt-oss-20b
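Because these are ordinary OpenAI-compatible settings, you can exercise the same endpoint from any client. This sketch only builds the chat-completions request (no network call); the HAIMAKER_API_KEY environment variable name is an assumption for illustration:

```python
import json
import os

# The same settings Hermes stores, expressed as a raw chat-completions
# request. Any OpenAI-compatible client can send this payload.
base_url = "https://api.haimaker.ai/v1"
endpoint = f"{base_url}/chat/completions"
payload = {
    "model": "openai/gpt-oss-20b",
    "messages": [{"role": "user", "content": "ping"}],
}
headers = {
    # Key read from the environment; never hard-code it.
    "Authorization": f"Bearer {os.environ.get('HAIMAKER_API_KEY', '')}",
    "Content-Type": "application/json",
}
print(endpoint)
print(json.dumps(payload))
```

POSTing that payload to the endpoint with those headers is exactly what Hermes does under the hood once the custom endpoint is saved.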

Direct provider setup

Hermes makes custom endpoints easy. Run:

hermes model

Choose Custom endpoint from the menu. Enter the base URL and model identifier when prompted:

  • Base URL: https://api.openai.com/v1
  • Model: gpt-oss-20b

Hermes stores the selection and uses it for all subsequent agent runs. You can also set HERMES_STREAM_READ_TIMEOUT and related env vars if you’re hitting slow providers.
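For reference, this is how a client typically consumes a timeout variable like that; the 120-second fallback is an assumed default for illustration, and the real Hermes default may differ:

```python
import os

def stream_read_timeout(default=120.0):
    """Read HERMES_STREAM_READ_TIMEOUT from the environment,
    falling back to a default when unset or malformed."""
    raw = os.environ.get("HERMES_STREAM_READ_TIMEOUT")
    try:
        return float(raw) if raw is not None else default
    except ValueError:
        return default

# Example: raise the timeout for a slow provider before launching Hermes.
os.environ["HERMES_STREAM_READ_TIMEOUT"] = "300"
print(stream_read_timeout())
```

In practice you would export the variable in your shell or service file so it is set before the agent process starts.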

Bottom line

For most Hermes users, GPT 5 Nano provides the perfect balance of context size and vision at a price that allows for 24/7 autonomous operation.



See our Hermes local-LLM setup guide.