What are the token limits for GPT-5 Chat?

GPT-5 Chat offers a 128K input context window and a 16K max output limit per individual request.

How much does it cost to run on Hermes?

Input is priced at $1.25 per million tokens and output is significantly higher at $10 per million tokens.

Does it support vision-based tools?

Yes, it includes native vision support which Hermes uses for analyzing UI states or images sent via messaging platforms like Telegram.

GPT-5 Chat for Hermes Agent: Pricing, Setup, and What It's Good At

Current as of April 2026. GPT-5 Chat is the premium choice for Hermes Agent deployments requiring extreme reliability across 47+ tools and multi-platform messaging. It excels at maintaining a consistent identity through long-running autonomous loops where cheaper models often drift or hallucinate tool parameters.

Specs


Provider	OpenAI
Input cost	$1.25 / M tokens
Output cost	$10 / M tokens
Context window	128K tokens
Max output	16K tokens
Parameters	N/A
Features	vision, web_search

What it’s good at

Tool-Use Precision

It handles complex MCP protocol calls with fewer failures than GPT-4o, making it ideal for chaining shell commands and database lookups in a single run.

Memory Retention

The model utilizes the 128K context window effectively to maintain persistent persona and cross-session memory without losing the thread of the conversation.

Where it falls short

Prohibitive Output Pricing

At $10 per million tokens, output is 2x more expensive than GPT-4o and 3.3x more than Claude 3.5 Sonnet, which adds up quickly in autonomous loops.

Response Latency

There is a noticeable delay in response time compared to smaller models, which can make real-time Discord or Telegram interactions feel sluggish.

Best use cases with Hermes Agent

Cross-Platform Automation — It can monitor Slack, process complex logic, and post formatted updates to Discord without losing context or mixing up platform-specific formatting.
Long-Running Autonomous Tasks — The high reasoning capabilities ensure the closed learning loop in Hermes stays focused on the objective over several hours of operation.

Not ideal for

Simple Notification Relays — Using a $10/1M output model to push basic alerts is a waste of resources when GPT-4o-mini handles these tasks for a fraction of the cost.
High-Velocity Chat — The processing overhead makes it less suitable for fast-paced messaging environments where sub-second response times are expected by users.

Hermes Agent setup

Map the vision features to Hermes screenshot tools and keep temperature low, around 0.3, to maximize tool-call accuracy during long autonomous runs.

Hermes makes custom endpoints easy. Run:

hermes model

Choose Custom endpoint from the menu. Enter the base URL and model identifier when prompted:

Base URL: https://api.haimaker.ai/v1
Model: openai/gpt-5-chat

Hermes stores the selection and uses it for all subsequent agent runs across whatever platforms you have wired up (Telegram, Discord, Slack, etc.). Tune HERMES_STREAM_READ_TIMEOUT and related env vars if you’re hitting slow providers.

How it compares

vs Claude 3.5 Sonnet — Claude is faster and significantly cheaper for output at $3/1M, but GPT-5 handles the Hermes tool-calling schema with higher consistency in multi-step workflows.
vs GPT-4o — GPT-4o is better for simple chat bots at $5/1M output, but GPT-5 is necessary for complex reasoning involving the full 47-tool suite.

Bottom line

GPT-5 Chat is the most reliable engine for autonomous Hermes agents if you can justify the $10/1M output cost for high-stakes automation.

TRY GPT-5 CHAT IN HERMES

For more, see our Hermes local-LLM setup guide.