Current as of April 2026. GPT-4 is a legacy powerhouse for Hermes Agent that offers high reliability for tool-calling but carries a premium price and a restrictive context window. It serves as a stable choice for complex multi-platform automation where execution precision is more critical than speed or cost.
Specs

| Spec | Value |
| --- | --- |
| Provider | OpenAI |
| Input cost | $30 / M tokens |
| Output cost | $60 / M tokens |
| Context window | 8K tokens |
| Max output | 4K tokens |
| Parameters | N/A |
| Features | `function_calling` |
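At these rates, a quick back-of-envelope estimate shows how the cost of a single agent turn adds up. A minimal sketch using the rates from the table above (the 6K/1K token split is an illustrative assumption, not a measured Hermes workload):

```python
# Rough per-request cost estimate for GPT-4 at $30/M input, $60/M output.
INPUT_RATE = 30 / 1_000_000   # dollars per input token
OUTPUT_RATE = 60 / 1_000_000  # dollars per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of a single chat completion."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# A typical agent turn: a 6K-token prompt (near the 8K window) and a 1K-token reply.
print(round(request_cost(6_000, 1_000), 2))  # 0.24
```

At roughly $0.24 per turn, a long autonomous run of a few hundred turns lands in the tens of dollars.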
What it’s good at
Precise Tool Execution
This model handles the 47 built-in Hermes tools with high accuracy, rarely failing to format parameters for shell commands or messaging platform APIs.
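Reliable tool-calling ultimately means the model emits well-formed JSON arguments that validate against a tool's schema. A minimal sketch of what this looks like in the OpenAI function-calling format (the `run_shell` tool name and schema here are hypothetical illustrations, not Hermes' actual built-in tool definitions):

```python
import json

# Hypothetical shell tool schema in the OpenAI function-calling format.
run_shell_tool = {
    "type": "function",
    "function": {
        "name": "run_shell",
        "description": "Execute a shell command and return its output.",
        "parameters": {
            "type": "object",
            "properties": {
                "command": {"type": "string", "description": "Command to run"},
                "timeout_s": {"type": "integer", "description": "Kill after N seconds"},
            },
            "required": ["command"],
        },
    },
}

# A well-formed call the model is expected to produce: valid JSON whose keys
# match the schema. GPT-4 rarely breaks this parse-and-validate step.
model_arguments = '{"command": "df -h", "timeout_s": 30}'
args = json.loads(model_arguments)
assert "command" in args
```

A model that mangles this JSON forces the agent to retry or abort the tool call, which is where cheaper models tend to lose reliability.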
Identity Stability
It maintains a consistent persona across platforms like Telegram and Discord, using Hermes' closed learning loop to keep that identity stable over time.
Where it falls short
Prohibitive Pricing
At $30 per million input and $60 per million output tokens, it is significantly more expensive than modern frontier models.
Restrictive Context Window
The 8K token limit is a severe bottleneck for Hermes’ persistent memory, leading to frequent truncation during long autonomous runs.
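With only 8K tokens to work with, the agent has to trim history aggressively. A minimal sketch of the usual workaround, dropping the oldest messages to stay under budget (the 4-characters-per-token estimate is a rough heuristic and the function is illustrative, not Hermes' actual memory accounting):

```python
def estimate_tokens(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token for English text.
    return max(1, len(text) // 4)

def trim_history(messages: list[str], budget: int = 8_000, reserve: int = 4_000) -> list[str]:
    """Keep the newest messages whose combined estimate fits in budget - reserve,
    leaving room for the model's (up to 4K-token) reply."""
    kept, used = [], 0
    for msg in reversed(messages):
        cost = estimate_tokens(msg)
        if used + cost > budget - reserve:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))

history = ["old tool log " * 500, "recent question?"]
print(trim_history(history, budget=2_000, reserve=1_000))  # ['recent question?']
```

With an 8K window and 4K reserved for output, a single verbose tool log can evict most of the conversation, which is exactly the truncation problem described above.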
Best use cases with Hermes Agent
- Critical Shell Automation — The high reliability in tool-calling makes it safer for running sensitive terminal commands on local or SSH environments where errors are costly.
- Multi-Platform Logical Bridging — It excels at reasoning through complex instructions that require moving data between platforms like Slack, WhatsApp, and Discord.
Not ideal for
- Continuous Channel Monitoring — The $30-per-million-token input cost makes it prohibitively expensive to leave Hermes watching high-volume chat channels for triggers.
- Long-Session Memory Retention — The 8K context window fills up quickly when tracking multiple cross-session interactions or extensive tool logs.
Hermes Agent setup
Set the provider to OpenAI and the model ID to `openai/gpt-4`; ensure your account has sufficient credits to cover the high per-token costs.
Hermes makes custom endpoints easy. Run:

```shell
hermes model
```
Choose Custom endpoint from the menu. Enter the base URL and model identifier when prompted:
- Base URL: `https://api.haimaker.ai/v1`
- Model: `openai/gpt-4`
Hermes stores the selection and uses it for all subsequent agent runs across whatever platforms you have wired up (Telegram, Discord, Slack, etc.). Tune `HERMES_STREAM_READ_TIMEOUT` and related env vars if you’re hitting slow providers.
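One way to set such variables is from a small launcher script before starting the agent. A minimal sketch (the variable name comes from the Hermes docs above, but the value of 120 and the assumption that it is measured in seconds are mine; check your Hermes version's documentation):

```python
import os

# Hypothetical defaults for a slow provider; applied only if not already set.
defaults = {"HERMES_STREAM_READ_TIMEOUT": "120"}
for key, value in defaults.items():
    os.environ.setdefault(key, value)

timeout_s = int(os.environ["HERMES_STREAM_READ_TIMEOUT"])
print(timeout_s)
```

Using `setdefault` means an operator's explicit shell export still wins over the script's fallback value.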
How it compares
- vs GPT-4o — GPT-4o is faster, cheaper at $5/$15 per 1M tokens, and offers a 128K context window compared to GPT-4’s 8K limit.
- vs Claude 3.5 Sonnet — Sonnet provides superior tool-use reasoning and a 200K context window for a fraction of the cost at $3/$15 per 1M tokens.
Bottom line
GPT-4 remains a reliable workhorse for tool-heavy automation, but its 8K context and high pricing make it difficult to justify over modern models like GPT-4o.
For more, see our Hermes local-LLM setup guide.