Current as of April 2026. Grok models are the current price-to-performance leaders for high-volume autonomous agents. For Hermes users running persistent workflows across Telegram or Slack, the massive context windows and aggressive pricing of the xAI family allow for extensive tool-use history and cross-session memory without the massive overhead of other providers.
The quick answer
| Model | Input / Output | Context | Best For |
|---|---|---|---|
| Grok 4.1 Fast | $0.20 / $0.50 | 2M | The Context King for Long-Running Agents |
| Grok 4 Fast | $0.20 / $0.50 | 2M | The Redundant Baseline |
| Grok Code Fast | $0.20 / $1.50 | 256K | The High-Volume Output Specialist |
| Grok 3 Mini | $0.30 / $0.50 | 131K | The Reliable Tool-Call Specialist |
| Grok 3 Mini Fast | $0.60 / $4.00 | 131K | The Reliable Tool-Call Specialist |
| Grok 2 | $2.00 / $10 | 131K | The Proven Legacy Workhorse |
| Grok 2 Vision | $2.00 / $10 | 33K | The Proven Legacy Workhorse |
| Grok 4.20 | $2.00 / $6.00 | 2M | The Heavy-Duty Reasoning Engine |
Start with Grok 4.1 Fast unless you have a specific reason to pick another. It offers a massive 2M token context window at a rock-bottom price of $0.20 per million input tokens. This is the most economical way to keep months of agent interaction history in-context for persistent Hermes sessions.
Grok 4.1 Fast — The Context King for Long-Running Agents
This is the best choice for Hermes agents that need to track long-running conversations across multiple platforms. With a 2M token context window and pricing at $0.2/M input and $0.5/M output, it allows the agent to ingest massive amounts of data from tools and MCP servers without hitting memory limits or breaking the bank.
Grok 4 Fast — The Redundant Baseline
Grok 4 Fast is nearly identical to 4.1 Fast in both pricing ($0.2/$0.5) and context (2M). Prefer 4.1 Fast for its newer optimizations; use this model only as a fallback if you encounter specific version-related regressions in tool-calling reliability or API availability.
Grok Code Fast — The High-Volume Output Specialist
Despite the name, this model is valuable for Hermes agents that need to generate massive text outputs, like long-form reports or extensive log summaries, thanks to its 256K max output cap. While output is more expensive at $1.5/M, the 256K context and reasoning capabilities handle complex tool chains well.
Grok 3 Mini — The Reliable Tool-Call Specialist
At $0.3/M input and $0.5/M output, this is slightly more expensive on the input side than the 4-series Fast models but offers highly reliable function calling for Hermes’ 47+ tools. The 131K context is more than enough for daily agent tasks that don’t require massive document ingestion.
Grok 3 Mini Fast — The Reliable Tool-Call Specialist
At $0.3/M input and $0.5/M output, this is slightly more expensive on the input side than the 4-series Fast models but offers highly reliable function calling for Hermes’ 47+ tools. The 131K context is more than enough for daily agent tasks that don’t require massive document ingestion.
Grok 2 — The Proven Legacy Workhorse
Grok 2 is significantly more expensive at $2/M input and $10/M output. Its only use case in Hermes is for users who have highly specific, legacy system prompts tuned to its specific logic patterns; otherwise, the 4-series offers more context for a fraction of the cost.
Grok 2 Vision — The Proven Legacy Workhorse
Grok 2 is significantly more expensive at $2/M input and $10/M output. Its only use case in Hermes is for users who have highly specific, legacy system prompts tuned to its specific logic patterns; otherwise, the 4-series offers more context for a fraction of the cost.
Grok 4.20 — The Heavy-Duty Reasoning Engine
When ‘Fast’ models fail to navigate complex multi-step reasoning in Hermes, 4.20 is the solution. It costs $2/M input and $6/M output but maintains the 2M context window, making it the most powerful option for agents that need to synthesize data from multiple MCP tools simultaneously.
Setup in Hermes Agent
To integrate Grok with Hermes, run hermes model and select ‘Custom endpoint’. Use https://api.x.ai/v1 as the base URL and enter your xAI API key. Ensure you specify the exact model identifier, such as xai/grok-4.1-fast, to match the billing tier you want.
Running through haimaker.ai
Rather than standing up a per-provider account, you can point Hermes at haimaker.ai and get access to Grok alongside every other frontier model through one API key:
- Base URL:
https://api.haimaker.ai/v1 - Model:
xai/grok-4.1-fast
Direct provider setup
Hermes makes custom endpoints easy. Run:
hermes model
Choose Custom endpoint from the menu. Enter the base URL and model identifier when prompted:
- Base URL:
https://api.x.ai/v1 - Model:
xai/grok-4.1-fast
Hermes stores the selection and uses it for all subsequent agent runs. You can also set HERMES_STREAM_READ_TIMEOUT and related env vars if you’re hitting slow providers.
Bottom line
For the majority of Hermes Agent deployments, Grok 4.1 Fast provides the best balance of a massive 2M context window and extremely low $0.2/M input pricing, making it the top choice for autonomous, multi-platform agents.
RUN GROK IN HERMES WITH HAIMAKER
See our Hermes local-LLM setup guide.