What is the context window size?

It supports up to 131K tokens, which is sufficient for maintaining long-term agent memory and extensive tool execution history.

How much does it cost?

Input is priced at $0.08 per million tokens and output is $0.3 per million tokens, making it highly competitive for high-volume agents.

Does it support native tool use?

Yes, it has native function calling and reasoning features that work seamlessly with Hermes Agent's 47 built-in tools and MCP support.

gpt-oss-safeguard-20b for Hermes Agent: Pricing, Setup, and What It's Good At

Current as of April 2026. gpt-oss-safeguard-20b is a specialized OpenAI model that brings high-end reasoning to a budget price point of $0.08 per million input tokens. It excels in autonomous loops where tool-use reliability and MCP protocol adherence are more important than sheer generation speed.

Specs


Provider	OpenAI
Input cost	$0.08 / M tokens
Output cost	$0.30 / M tokens
Context window	131K tokens
Max output	66K tokens
Parameters	N/A
Features	function_calling, reasoning

What it’s good at

Tool Call Precision

The model handles complex MCP tool calls with high precision, rarely hallucinating parameters even when managing 40+ built-in tools in Hermes.

Contextual Persistence

With a 131K context window, it maintains a coherent identity across long Discord threads and multi-session Slack interactions without losing the thread.

Where it falls short

Response Latency

The internal reasoning overhead causes noticeable delays in response time compared to faster models like GPT-4o-mini, which can lag in busy Telegram channels.

Safeguard Sensitivity

The ‘safeguard’ tuning can lead to false-positive refusals when executing certain shell commands via MCP if the intent is misinterpreted as risky.

Best use cases with Hermes Agent

Cross-platform Automation — It is ideal for monitoring Slack for specific triggers and executing complex shell scripts or posting updates to Discord with high reliability.
Long-term Memory Management — The 131K context allows the agent to remember user preferences and previous tool outputs across 15+ messaging platforms over weeks of interaction.

Not ideal for

High-Velocity Chat — Latency makes it frustrating for high-velocity Telegram groups where users expect instant replies to every message.
Unfiltered Personas — The safeguard layer restricts its ability to adopt edgy or highly informal personas required for some community management roles.

Hermes Agent setup

Use the standard OpenAI provider settings in your config. Ensure you set the max_tokens to accommodate the 66K output limit if your agent generates long diagnostic reports.

Hermes makes custom endpoints easy. Run:

hermes model

Choose Custom endpoint from the menu. Enter the base URL and model identifier when prompted:

Base URL: https://api.haimaker.ai/v1
Model: openai/gpt-oss-safeguard-20b

Hermes stores the selection and uses it for all subsequent agent runs across whatever platforms you have wired up (Telegram, Discord, Slack, etc.). Tune HERMES_STREAM_READ_TIMEOUT and related env vars if you’re hitting slow providers.

How it compares

vs Claude 3 Haiku — Haiku is faster and cheaper for simple tasks, but gpt-oss-safeguard-20b handles multi-step tool reasoning with fewer failures in autonomous loops.
vs GPT-4o-mini — Mini is more versatile for general chat, yet this 20b model feels more stable for strict MCP protocol execution during long-running shell tasks.

Bottom line

Choose this model if you need a reliable, low-cost autonomous agent that prioritizes tool-call accuracy and logical reasoning over raw speed or creative flair.

TRY GPT-OSS-SAFEGUARD-20B IN HERMES

For more, see our Hermes local-LLM setup guide.