What is the context window size?

o3-mini supports a 200,000 token context window with a maximum output of 100,000 tokens.

How much does it cost to run?

Input tokens cost $1.10 per million and output tokens (including reasoning tokens) cost $4.40 per million.

Does it support tool calling?

Yes, it has native support for function calling and is highly effective at using Hermes' 47+ built-in tools.

O3 Mini for Hermes Agent: Pricing, Setup, and What It's Good At

Current as of April 2026. OpenAI’s o3-mini is a reasoning-focused model designed to handle complex logic at a fraction of the cost of flagship models. For Hermes Agent users, it provides a stable brain for orchestrating multi-platform tasks and managing 47+ built-in tools without the hallucinations common in smaller models.

Specs


Provider	OpenAI
Input cost	$1.10 / M tokens
Output cost	$4.40 / M tokens
Context window	200K tokens
Max output	100K tokens
Parameters	N/A
Features	function_calling, reasoning

What it’s good at

Reasoning-Backed Tool Use

The model uses its internal thought process to validate tool parameters before execution, significantly reducing errors when interacting with MCP servers or shell commands.

Large Context for Long Sessions

A 200K context window allows Hermes to maintain a deep memory of long Slack threads or complex cross-platform workflows without losing the original intent.

Cost-to-Intelligence Ratio

At $1.10 per million input tokens, it delivers reasoning capabilities that rival much more expensive models, making autonomous runs affordable.

Where it falls short

Thinking Latency

The internal reasoning phase introduces a delay that can make real-time messaging on platforms like WhatsApp or Telegram feel slow to the end user.

Token Overhead

Reasoning tokens are billed at the output rate of $4.40 per million, which can lead to unexpected costs if the model over-thinks simple tasks.

Best use cases with Hermes Agent

Multi-Platform Orchestration — It excels at logic-heavy tasks like monitoring a Discord channel to trigger specific shell scripts or Modal deployments based on complex criteria.
MCP Protocol Management — The reasoning architecture ensures that complex Model Context Protocol requests are formatted correctly, which is vital for Hermes’ tool-heavy ecosystem.

Not ideal for

Simple Chatbot Interactivity — Using a reasoning model for basic ‘hello’ responses on Telegram is a waste of both time and money due to the thinking delay.
High-Volume Trivial Tasks — For simple data entry or basic notification relaying, GPT-4o-mini is significantly cheaper and faster.

Hermes Agent setup

Configure the max_completion_tokens carefully to ensure the model has enough room for both internal reasoning and the final tool-call output.

Hermes makes custom endpoints easy. Run:

hermes model

Choose Custom endpoint from the menu. Enter the base URL and model identifier when prompted:

Base URL: https://api.haimaker.ai/v1
Model: openai/o3-mini

Hermes stores the selection and uses it for all subsequent agent runs across whatever platforms you have wired up (Telegram, Discord, Slack, etc.). Tune HERMES_STREAM_READ_TIMEOUT and related env vars if you’re hitting slow providers.

How it compares

vs GPT-4o — GPT-4o is faster for conversational tasks but o3-mini is far more reliable for complex, multi-step autonomous tool chains.
vs Claude 3.5 Sonnet — Sonnet offers better prose for messaging, but o3-mini’s reasoning tokens give it an edge in following strict logic for shell and SSH operations.

Bottom line

O3-mini is the best choice for Hermes users who need a reliable, logic-driven agent for complex automation across platforms and don’t mind a few seconds of latency.

TRY O3 MINI IN HERMES

For more, see our Hermes local-LLM setup guide.