Current as of April 2026. GPT-5.4 Nano is the budget-friendly powerhouse for Hermes users who need a massive 400K context window for persistent memory without flagship costs. It balances extremely low input pricing at $0.20 per million tokens with reliable performance across Hermes' 47 built-in agent tools.
Specs
| Spec | Value |
| --- | --- |
| Provider | OpenAI |
| Input cost | $0.20 / M tokens |
| Output cost | $1.25 / M tokens |
| Context window | 400K tokens |
| Max output | 128K tokens |
| Parameters | N/A |
| Features | function_calling, vision, reasoning, web_search |
What it’s good at
Massive Context Window
The 400K token limit allows Hermes to maintain months of cross-platform message history from Telegram, Discord, and Slack without losing its identity.
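As a rough sanity check on what "months of history" means in practice, here is a quick back-of-the-envelope estimate. The per-message token count and headroom figures are illustrative assumptions, not numbers from this page:

```python
CONTEXT_WINDOW = 400_000       # GPT-5.4 Nano context window, in tokens
AVG_TOKENS_PER_MESSAGE = 50    # illustrative assumption; varies by channel

# Leave headroom for the system prompt and the model's reply.
headroom = 16_000
usable = CONTEXT_WINDOW - headroom
print(usable // AVG_TOKENS_PER_MESSAGE)  # roughly 7,680 messages in context
```

At typical chat-message lengths, that is thousands of Telegram, Discord, and Slack messages held in a single context.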
Aggressive Input Pricing
At $0.20 per million tokens, this model is significantly cheaper than GPT-4o for heavy ingestion of logs and persistent memory data.
Where it falls short
Output Cost Ratio
The output cost of $1.25 per million tokens is over six times the input cost, which can lead to unexpected bills for agents that generate long-form reports.
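The asymmetry is easy to see with a quick cost estimate. Prices come from the specs table above; the token counts are illustrative:

```python
INPUT_PRICE = 0.20 / 1_000_000    # dollars per input token
OUTPUT_PRICE = 1.25 / 1_000_000   # dollars per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in dollars for one request at GPT-5.4 Nano pricing."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# A report-writing agent: modest prompt, long generated output.
cost = request_cost(input_tokens=5_000, output_tokens=50_000)
print(round(cost, 4))  # 0.0635 — the output side dominates at a 6.25x ratio
```

For agents that mostly read and rarely write, the ratio works in your favor; for report generators, it is the line item to watch.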
Reasoning Depth
In complex MCP tool chains involving more than five sequential steps, it occasionally loses the thread compared to the larger o-series models.
Best use cases with Hermes Agent
- Cross-Platform Message Routing — It handles incoming data from 15+ messaging platforms efficiently while maintaining a consistent persona across different channels.
- Persistent Memory Retrieval — The 400K context allows Hermes to search through thousands of historical interactions to find specific user preferences or past task results.
Not ideal for
- Complex Multi-Step Logic — If your agent needs to perform advanced reasoning across multiple MCP tools, you will see better reliability from GPT-4o or Claude 3.5 Sonnet.
- High-Volume Output Tasks — The $1.25 output price makes it less economical for agents that generate massive text files versus those that just execute commands.
Hermes Agent setup
Use the standard OpenAI provider configuration with your API key, and set `max_tokens` high enough to take advantage of the 128K output ceiling for long-running autonomous tasks.
Hermes makes custom endpoints easy. Run:
```
hermes model
```
Choose Custom endpoint from the menu. Enter the base URL and model identifier when prompted:
- Base URL: `https://api.haimaker.ai/v1`
- Model: `openai/gpt-5.4-nano`
Hermes stores the selection and uses it for all subsequent agent runs across whatever platforms you have wired up (Telegram, Discord, Slack, etc.). Tune HERMES_STREAM_READ_TIMEOUT and related env vars if you’re hitting slow providers.
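Under the hood this is a standard OpenAI-compatible chat endpoint, so any client that accepts a custom base URL will work. A minimal sketch of the request body, assuming the Chat Completions schema (the `max_tokens` value and message content are illustrative):

```python
import json

BASE_URL = "https://api.haimaker.ai/v1"  # custom endpoint from the setup above

# Body of a POST to f"{BASE_URL}/chat/completions", following the
# OpenAI-compatible Chat Completions request shape.
payload = {
    "model": "openai/gpt-5.4-nano",
    "messages": [
        {"role": "user", "content": "Summarize the last 24h of Slack activity."},
    ],
    "max_tokens": 8_192,  # raise toward the 128K ceiling for long reports
}
print(json.dumps(payload, indent=2))
```

Hermes builds this request for you once the custom endpoint is saved; the sketch is only to show what the provider receives.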
How it compares
- vs Claude 3 Haiku — Haiku is faster for short bursts, but GPT-5.4 Nano’s 400K context window dwarfs Haiku’s 200K, making it better for agents with long-term memory needs.
- vs Gemini 1.5 Flash — Gemini offers a larger 1M context window, but GPT-5.4 Nano provides more consistent reliability with Hermes’ 47 built-in tools and MCP protocol handling.
Bottom line
This is the best budget option for autonomous agents that require massive memory capacity and reliable tool use across multiple messaging platforms.
For more, see our Hermes local-LLM setup guide.