Current as of April 2026. The gpt-oss-20b is OpenAI’s high-efficiency reasoning model designed for high-frequency agentic tasks, priced at a low $0.03 per million input tokens. It bridges the gap between small-scale models and heavy-duty reasoning engines for Hermes Agent deployments.

Specs

ProviderOpenAI
Input cost$0.03 / M tokens
Output cost$0.11 / M tokens
Context window131K tokens
Max outputN/A tokens
ParametersN/A
Featuresfunction_calling, reasoning

What it’s good at

Reliable Tool Orchestration

Native function calling is exceptionally stable, allowing Hermes to trigger 47+ built-in tools and MCP servers without syntax errors. It maintains high accuracy when mapping user intent to specific shell commands or platform actions.

Cost-Effective Reasoning

At $0.11 per million output tokens, it provides a reasoning-capable logic layer for autonomous loops without the massive overhead of larger GPT-4 models. This makes it ideal for 24/7 monitoring across Discord and Slack.

Where it falls short

Context Constraints

The 131K context window is adequate but can become a bottleneck for Hermes instances with massive cross-session memory and long-running platform logs. It lacks the deep-context retrieval stability found in the 1M+ token models.

Logical Depth Limits

While it features a reasoning mode, the 20B parameter scale means it can struggle with highly abstract or multi-layered logic compared to the O1 series. You might see failures in complex, multi-step autonomous planning.

Best use cases with Hermes Agent

  • Multi-Platform Community Management — It efficiently triages messages from Telegram and Discord to execute moderation tools or post updates via Hermes’ persistent identity.
  • Local Infrastructure Automation — The model’s low latency and high tool reliability make it perfect for running shell commands and Docker management via MCP.

Not ideal for

  • Large-Scale Document Analysis — The 131K window is too small for agents that need to ingest and reason over thousands of pages of documentation simultaneously.
  • Highly Ambiguous Reasoning — If your Hermes workflows require deep philosophical or extremely nuanced decision-making, the 20B architecture may oversimplify the output.

Hermes Agent setup

Point your Hermes configuration to the OpenAI provider using the model ID openai/gpt-oss-20b and ensure the reasoning flag is enabled in your API call. Keep your system prompt focused on the persistent identity to maximize the 131K context utilization.

Hermes makes custom endpoints easy. Run:

hermes model

Choose Custom endpoint from the menu. Enter the base URL and model identifier when prompted:

  • Base URL: https://api.haimaker.ai/v1
  • Model: openai/gpt-oss-20b

Hermes stores the selection and uses it for all subsequent agent runs across whatever platforms you have wired up (Telegram, Discord, Slack, etc.). Tune HERMES_STREAM_READ_TIMEOUT and related env vars if you’re hitting slow providers.

How it compares

  • vs GPT-4o-mini — gpt-oss-20b offers superior reasoning for tool-use logic at a similar price point, though 4o-mini has a slightly different token cost structure.
  • vs Claude 3 Haiku — gpt-oss-20b provides more consistent function calling for Hermes’ 47 tools, whereas Haiku sometimes requires more prompt engineering for complex MCP interactions.

Bottom line

For Hermes users who need a fast, reliable, and cheap autonomous agent for platform automation and tool execution, gpt-oss-20b is the current price-to-performance leader.

TRY GPT-OSS-20B IN HERMES


For more, see our Hermes local-LLM setup guide.