What are the token limits for GPT 5.4?

The model features a 1.1 million token context window and supports a maximum output of 128,000 tokens per request.

How much does it cost to run?

Input tokens are priced at $2.5 per million and output tokens are $15 per million.

Does it support Hermes vision tools?

Yes, it fully supports vision, allowing the agent to process images sent via Discord or Telegram to inform its tool-use decisions.

GPT 5.4 for Hermes Agent: Pricing, Setup, and What It's Good At

Current as of April 2026. GPT 5.4 is the powerhouse choice for Hermes Agent users who prioritize massive context and flawless tool execution over budget. It handles the 1.1M token window with a level of reasoning stability that keeps long-running autonomous loops from degrading during multi-platform sessions.

Specs


Provider	OpenAI
Input cost	$2.50 / M tokens
Output cost	$15 / M tokens
Context window	1.1M tokens
Max output	128K tokens
Parameters	N/A
Features	function_calling, vision, reasoning

What it’s good at

Reliable Tool Orchestration

It manages the 47 built-in Hermes tools and custom MCP servers without the syntax errors or parameter hallucinations common in smaller models.

Massive Context Retention

With a 1.1M token window, the agent maintains a persistent identity and deep memory of past interactions across Telegram, Slack, and Discord.

Robust Multi-Platform Reasoning

The model effectively synthesizes information from different messaging channels, allowing it to coordinate complex tasks between Slack and WhatsApp effortlessly.

Where it falls short

Premium Pricing

At $15 per million output tokens, running this model for high-volume background monitoring can quickly become cost-prohibitive for hobbyist setups.

Response Latency

The reasoning overhead introduces a noticeable delay that makes real-time chat interactions feel sluggish compared to faster, smaller-parameter models.

Best use cases with Hermes Agent

Cross-Platform Automation — It excels at monitoring one platform like Slack and executing multi-step shell or SSH commands based on that specific context.
Long-Term Autonomous Research — The 1.1M context ensures the learning loop remains closed and historical data is never lost during multi-week autonomous tasks.

Not ideal for

Basic Webhook Forwarding — Using a model with $2.5/M input costs just to move text from one API to another is a waste of resources.
Latency-Sensitive Chatbots — Users on platforms like WhatsApp might find the reasoning time frustrating for simple, direct queries.

Hermes Agent setup

Set your MAX_TOKENS environment variable high to take advantage of the 128K output limit and ensure your API key is at least Tier 4 to avoid rate limits during intensive tool-use loops.

Hermes makes custom endpoints easy. Run:

hermes model

Choose Custom endpoint from the menu. Enter the base URL and model identifier when prompted:

Base URL: https://api.haimaker.ai/v1
Model: openai/gpt-5.4

Hermes stores the selection and uses it for all subsequent agent runs across whatever platforms you have wired up (Telegram, Discord, Slack, etc.). Tune HERMES_STREAM_READ_TIMEOUT and related env vars if you’re hitting slow providers.

How it compares

vs Claude 3.5 Sonnet — GPT 5.4 is cheaper on input ($2.5 vs $3 per million) and offers a context window over five times larger than Sonnet’s 200K limit.
vs Gemini 1.5 Pro — While Gemini offers a similar context size, GPT 5.4 provides more consistent reliability when handling the complex MCP protocol required by Hermes.

Bottom line

If your Hermes Agent needs to remember every interaction and never fail a tool call, GPT 5.4 is the only serious choice despite the premium price tag.

TRY GPT 5.4 IN HERMES

For more, see our Hermes local-LLM setup guide.