Current as of April 2026. gpt-oss-120b is a highly efficient model for Hermes Agent users who need massive context and reliable tool-calling without the flagship price tag. At $0.04 per million input tokens, it is built for long-running autonomous loops that monitor platforms like Slack and Discord 24/7.

Specs

ProviderOpenAI
Input cost$0.04 / M tokens
Output cost$0.19 / M tokens
Context window131K tokens
Max outputN/A tokens
ParametersN/A
Featuresfunction_calling, reasoning

What it’s good at

Tool-Use Reliability

It executes Hermes’ 47 built-in tools and MCP protocols with high precision, rarely failing on the JSON syntax required for complex shell commands.

Massive Context for Memory

The 131K context window allows Hermes to maintain a persistent identity and remember user preferences across weeks of multi-platform interactions.

Where it falls short

Reasoning Latency

The internal reasoning steps can cause a 2-3 second delay, which is noticeable when users expect instant replies on Telegram or WhatsApp.

Proprietary Constraints

Unlike Llama-based models, you cannot fine-tune this for specific persona quirks, leaving your agent’s personality feeling somewhat generic.

Best use cases with Hermes Agent

  • Cross-Platform Automation — It excels at monitoring a Slack channel and autonomously triggering deployments on Modal or Docker based on the conversation history.
  • Long-Term Autonomous Monitoring — The low $0.19 output cost makes it sustainable to keep an agent running indefinitely to manage persistent cross-session memory.

Not ideal for

  • Local-Only Privacy — Since this is an OpenAI-hosted model, it is not suitable for users running Hermes on isolated Mac local or private Docker setups.
  • High-Speed Chatbots — The reasoning overhead makes it poorly suited for rapid-fire messaging where sub-second response times are the priority.

Hermes Agent setup

Point your provider to OpenAI and use the model ID openai/gpt-oss-120b. Ensure your API quota is sufficient for high-frequency tool polling if you are running autonomous loops.

Hermes makes custom endpoints easy. Run:

hermes model

Choose Custom endpoint from the menu. Enter the base URL and model identifier when prompted:

  • Base URL: https://api.haimaker.ai/v1
  • Model: openai/gpt-oss-120b

Hermes stores the selection and uses it for all subsequent agent runs across whatever platforms you have wired up (Telegram, Discord, Slack, etc.). Tune HERMES_STREAM_READ_TIMEOUT and related env vars if you’re hitting slow providers.

How it compares

  • vs gpt-4o-mini — gpt-oss-120b provides significantly better reasoning for complex MCP tool chains despite being in a similar low-cost tier.
  • vs Llama 3.1 70B — While Llama is better for local privacy, gpt-oss-120b offers a larger 131K context window and more stable function calling for autonomous tasks.

Bottom line

This is the best value-to-performance model for Hermes Agent users who prioritize reliable tool execution and long-term memory over local hosting.

TRY GPT-OSS-120B IN HERMES


For more, see our Hermes local-LLM setup guide.