Current as of April 2026. gpt-oss-120b is a highly efficient model for Hermes Agent users who need massive context and reliable tool-calling without the flagship price tag. At $0.04 per million input tokens, it is built for long-running autonomous loops that monitor platforms like Slack and Discord 24/7.
Specs
| Provider | OpenAI |
| Input cost | $0.04 / M tokens |
| Output cost | $0.19 / M tokens |
| Context window | 131K tokens |
| Max output | N/A tokens |
| Parameters | N/A |
| Features | function_calling, reasoning |
What it’s good at
Tool-Use Reliability
It executes Hermes’ 47 built-in tools and MCP protocols with high precision, rarely failing on the JSON syntax required for complex shell commands.
Massive Context for Memory
The 131K context window allows Hermes to maintain a persistent identity and remember user preferences across weeks of multi-platform interactions.
Where it falls short
Reasoning Latency
The internal reasoning steps can cause a 2-3 second delay, which is noticeable when users expect instant replies on Telegram or WhatsApp.
Proprietary Constraints
Unlike Llama-based models, you cannot fine-tune this for specific persona quirks, leaving your agent’s personality feeling somewhat generic.
Best use cases with Hermes Agent
- Cross-Platform Automation — It excels at monitoring a Slack channel and autonomously triggering deployments on Modal or Docker based on the conversation history.
- Long-Term Autonomous Monitoring — The low $0.19 output cost makes it sustainable to keep an agent running indefinitely to manage persistent cross-session memory.
Not ideal for
- Local-Only Privacy — Since this is an OpenAI-hosted model, it is not suitable for users running Hermes on isolated Mac local or private Docker setups.
- High-Speed Chatbots — The reasoning overhead makes it poorly suited for rapid-fire messaging where sub-second response times are the priority.
Hermes Agent setup
Point your provider to OpenAI and use the model ID openai/gpt-oss-120b. Ensure your API quota is sufficient for high-frequency tool polling if you are running autonomous loops.
Hermes makes custom endpoints easy. Run:
hermes model
Choose Custom endpoint from the menu. Enter the base URL and model identifier when prompted:
- Base URL:
https://api.haimaker.ai/v1 - Model:
openai/gpt-oss-120b
Hermes stores the selection and uses it for all subsequent agent runs across whatever platforms you have wired up (Telegram, Discord, Slack, etc.). Tune HERMES_STREAM_READ_TIMEOUT and related env vars if you’re hitting slow providers.
How it compares
- vs gpt-4o-mini — gpt-oss-120b provides significantly better reasoning for complex MCP tool chains despite being in a similar low-cost tier.
- vs Llama 3.1 70B — While Llama is better for local privacy, gpt-oss-120b offers a larger 131K context window and more stable function calling for autonomous tasks.
Bottom line
This is the best value-to-performance model for Hermes Agent users who prioritize reliable tool execution and long-term memory over local hosting.
For more, see our Hermes local-LLM setup guide.