What is the exact context window and output limit?

Grok 4 supports a 256K token context window for inputs and a matching 256K token limit for outputs.

How much does it cost to run Hermes on Grok 4?

It costs $3.00 per 1 million input tokens and $15.00 per 1 million output tokens.

Does it support Hermes' MCP tools?

Yes, it has native function_calling support which integrates directly with Hermes' MCP protocol and 47 built-in tools.

Grok 4 for Hermes Agent: Pricing, Setup, and What It's Good At

Current as of April 2026. Grok 4 is xAI’s heavy-duty model for agents that need to live in massive contexts without losing their minds. At $3 per million input tokens, it competes directly with GPT-4o but offers a significantly larger output window for long autonomous runs.

Specs


Provider	xAI
Input cost	$3.00 / M tokens
Output cost	$15 / M tokens
Context window	256K tokens
Max output	256K tokens
Parameters	N/A
Features	function_calling, web_search

What it’s good at

Massive Output Capacity

The 256K output limit is a rarity that allows Hermes to generate massive execution logs and complex multi-step plans without hitting a ceiling.

Reliable Function Calling

It handles Hermes’ 47 built-in tools with high precision, rarely hallucinating arguments even when switching between Slack and Discord contexts.

Real-time Web Integration

Native web_search features allow the agent to verify external data before executing shell commands or posting updates to messaging platforms.

Where it falls short

Premium Output Pricing

At $15 per million tokens, output is expensive for high-frequency bots that post hundreds of messages a day.

Verbose Responses

The model tends to be chatty, which can inflate token usage during simple autonomous tasks like monitoring a folder or checking a single API.

Best use cases with Hermes Agent

Cross-Platform History Analysis — The 256K context window is perfect for ingesting weeks of Telegram and Slack history to maintain a persistent identity across sessions.
Complex MCP Tool Chaining — It excels at reasoning through long sequences of tool calls required for complex automation like server migrations via SSH.

Not ideal for

Simple Notification Bots — Using a $3/$15 model for basic ‘if-this-then-that’ Discord alerts is a waste of money when cheaper models exist.
Low-Latency Triage — While fast, the overhead of the large model can be overkill for agents that just need to categorize incoming messages.

Hermes Agent setup

Configure the xAI provider with your API key and ensure the model ID is set to xai/grok-4. Set your max_tokens to at least 128K to allow the agent enough room for deep reasoning during autonomous loops.

Hermes makes custom endpoints easy. Run:

hermes model

Choose Custom endpoint from the menu. Enter the base URL and model identifier when prompted:

Base URL: https://api.x.ai/v1
Model: xai/grok-4

Hermes stores the selection and uses it for all subsequent agent runs across whatever platforms you have wired up (Telegram, Discord, Slack, etc.). Tune HERMES_STREAM_READ_TIMEOUT and related env vars if you’re hitting slow providers.

How it compares

vs GPT-4o — Both cost $3/$15 per million tokens, but Grok 4 offers a much larger 256K output window compared to GPT-4o’s 4K or 16K limits.
vs Claude 3.5 Sonnet — Sonnet has slightly better tool-use logic, but Grok 4’s massive context handling is superior for agents managing months of persistent memory.
vs Llama 3.1 405B — Llama is often cheaper on third-party providers, but Grok 4’s native web search and 256K output give it the edge for autonomous web-based research.

Bottom line

Grok 4 is the current heavyweight champion for Hermes users who need an agent with a massive memory and the ability to generate long, complex autonomous logs without truncation.

TRY GROK 4 IN HERMES

For more, see our Hermes local-LLM setup guide.