What is the exact pricing for GPT 5 Nano?

Input tokens are priced at $0.05 per million and output tokens are $0.4 per million.

What are the token limits for this model?

It features a 400K token context window and a 128K token maximum output limit.

Does it support vision for platform screenshots?

Yes, it has native vision support, allowing Hermes to interpret images or screenshots sent via platforms like Telegram or Slack.

GPT 5 Nano for Hermes Agent: Pricing, Setup, and What It's Good At

Current as of April 2026. GPT 5 Nano is OpenAI’s aggressive play for the high-context agent market, offering a massive 400K context window at a fraction of the cost of flagship models. For Hermes Agent, this means maintaining deep, persistent memory across 15+ messaging platforms without hitting the usual token walls.

Specs


Provider	OpenAI
Input cost	$0.05 / M tokens
Output cost	$0.40 / M tokens
Context window	400K tokens
Max output	128K tokens
Parameters	N/A
Features	function_calling, vision, reasoning

What it’s good at

Massive 400K Context Window

Hermes can ingest months of chat history from Slack and Discord simultaneously, allowing for a truly persistent identity that doesn’t forget previous user interactions.

Aggressive Pricing

At $0.05 per million input tokens, you can run autonomous loops for days using all 47 built-in tools without worrying about a massive API bill.

Reliable MCP Integration

The model handles the Model Context Protocol (MCP) with high precision, making it excellent at coordinating tasks between local shell commands and remote messaging APIs.

Where it falls short

Proprietary Constraints

Unlike Llama-based models, you cannot run this locally on Mac or Docker; you are entirely dependent on OpenAI’s API availability and privacy policies.

Nano-Scale Reasoning

While efficient, the reasoning capabilities can stumble on complex, multi-step tool chains compared to the larger GPT-4o or o1 models.

Best use cases with Hermes Agent

Cross-Platform Community Management — The 400K context allows Hermes to monitor Telegram, Discord, and WhatsApp at once while keeping the conversation threads organized in its memory.
Autonomous Research Agents — The low $0.4 per million output cost makes it feasible to have Hermes browse the web and write long-form summaries using its built-in tools.

Not ideal for

Air-Gapped Local Automation — Hermes users requiring total data privacy on local hardware cannot use this model since it requires an active internet connection to OpenAI’s servers.
High-Stakes Logic Chains — For extremely complex tool-use logic where a single failure breaks a mission-critical workflow, the ‘Nano’ architecture lacks the depth of larger reasoning models.

Hermes Agent setup

Configure your environment variables with your OpenAI API key and set the model ID to openai/gpt-5-nano. Ensure your rate limits are high enough, as Hermes’s autonomous loops can trigger multiple tool calls per second.

Hermes makes custom endpoints easy. Run:

hermes model

Choose Custom endpoint from the menu. Enter the base URL and model identifier when prompted:

Base URL: https://api.haimaker.ai/v1
Model: openai/gpt-5-nano

Hermes stores the selection and uses it for all subsequent agent runs across whatever platforms you have wired up (Telegram, Discord, Slack, etc.). Tune HERMES_STREAM_READ_TIMEOUT and related env vars if you’re hitting slow providers.

How it compares

vs Claude 3 Haiku — GPT 5 Nano offers double the context window (400K vs 200K) and significantly lower input costs ($0.05 vs $0.25 per million tokens).
vs Gemini 1.5 Flash — While Gemini has a larger 1M context window, GPT 5 Nano tends to be more reliable for the specific tool-calling syntax used by Hermes’s 47 built-in tools.

Bottom line

GPT 5 Nano is the current price-to-performance leader for Hermes Agent users who need massive memory and multi-platform autonomy on a budget.

TRY GPT 5 NANO IN HERMES

For more, see our Hermes local-LLM setup guide.