Current as of April 2026. Gemini 2.5 Pro is the context king for Hermes Agent, offering a massive 1M token window that allows for months of conversation history without truncation. It handles multi-platform reasoning effectively, particularly when parsing images from Discord or Slack via its vision features.

Specs

  • Provider: Google
  • Input cost: $1.25 / M tokens
  • Output cost: $10 / M tokens
  • Context window: 1.0M tokens
  • Max output: 8K tokens
  • Parameters: N/A
  • Features: function_calling, vision

What it’s good at

Massive Context Window

The 1M token context window lets Hermes maintain a persistent memory bank without aggressive pruning or RAG. This is vital for agents managing long-term identities across 15+ messaging platforms.

Native Multimodal Support

Excellent vision capabilities allow Hermes to analyze screenshots or files from messaging platforms before executing shell commands or tools. It bridges the gap between visual stimuli and autonomous action.

Where it falls short

Output Constraints

The 8K output token limit can restrict Hermes if it needs to generate extensive logs or complex multi-step plans in a single turn. You may need to break down large tasks into smaller tool-call loops.
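One way to work around the output cap is to pack a long plan into multiple turns, each kept under a per-turn budget. Below is a minimal sketch of that idea; the `batch_steps` helper is hypothetical, and token counts are approximated as word counts (a real agent would use the model's tokenizer):

```python
def batch_steps(steps, max_tokens_per_turn=7000):
    """Greedily pack plan steps into turns whose rough token count
    stays under the per-turn budget. Word count stands in for a real
    tokenizer here -- an intentional simplification."""
    batches, current, used = [], [], 0
    for step in steps:
        cost = len(step.split())  # crude token estimate
        if current and used + cost > max_tokens_per_turn:
            batches.append(current)   # close out the current turn
            current, used = [], 0
        current.append(step)
        used += cost
    if current:
        batches.append(current)
    return batches
```

Each resulting batch can then be emitted in its own tool-call loop iteration, keeping every response comfortably under the 8K ceiling.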

Provider Rate Limiting

Google’s API can be aggressive with rate limits on lower tiers, which can stall Hermes during intensive autonomous runs. High-frequency tool use requires monitoring your quota closely.
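A common mitigation is to wrap model calls in exponential backoff with jitter so a 429 stalls one request rather than the whole run. This is a generic sketch, not Hermes's built-in behavior; the `RateLimitError` name is a placeholder for whatever rate-limit exception your client raises:

```python
import random
import time

class RateLimitError(Exception):
    """Placeholder for the 429-style error raised by your Gemini client."""

def call_with_backoff(call, max_retries=5, base_delay=1.0):
    """Retry `call` with exponential backoff plus jitter on rate-limit errors."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error to the agent loop
            # 1s, 2s, 4s, ... plus up to 250ms of jitter
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.25))
```

Pairing this with a quota dashboard check before long autonomous runs avoids most mid-run stalls.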

Best use cases with Hermes Agent

  • Long-term Platform Monitoring — It is perfect for keeping the entire history of a Telegram channel in the active prompt for context-aware automation without losing the agent’s persistent identity.
  • Visual Task Automation — Ideal for workflows where Hermes monitors visual content on Slack to trigger specific MCP tools or local shell scripts based on image data.

Not ideal for

  • High-Volume Simple Tasks — It is less cost-efficient than Gemini 1.5 Flash for high-frequency, low-complexity automation where the 1M context window is not required.
  • Latency-Sensitive Loops — The Pro model's processing time is higher than smaller models', which can hurt the agent's responsiveness during live Discord interactions.

Hermes Agent setup

Obtain a Google AI Studio API key and set your model ID to google/gemini-2.5-pro. Ensure your project has high enough rate limits to prevent Hermes from stalling during autonomous tool-use loops.

Hermes makes custom endpoints easy. Run:

hermes model

Choose Custom endpoint from the menu. Enter the base URL and model identifier when prompted:

  • Base URL: https://generativelanguage.googleapis.com/v1beta
  • Model: google/gemini-2.5-pro

Hermes stores the selection and uses it for all subsequent agent runs across whatever platforms you have wired up (Telegram, Discord, Slack, etc.). Tune HERMES_STREAM_READ_TIMEOUT and related env vars if you’re hitting slow providers.

How it compares

  • vs GPT-4o — Gemini 2.5 Pro is cheaper at $10 vs $15 per 1M output tokens and offers a context window nearly 8x larger than GPT-4o’s 128K limit.
  • vs Claude 3.5 Sonnet — While Claude 3.5 Sonnet provides slightly better tool-calling precision for MCP protocols, it cannot match Gemini’s 1M token capacity for massive persistent memory.
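To make the output-cost comparison concrete, here is the arithmetic for a hypothetical monthly output volume, using the $10 and $15 per-1M-token figures quoted above (the 2M-token volume is purely illustrative):

```python
OUT_TOK = 2_000_000  # hypothetical monthly output volume

gemini_out = OUT_TOK / 1e6 * 10.0  # Gemini 2.5 Pro: $10 / M output tokens
gpt4o_out = OUT_TOK / 1e6 * 15.0   # GPT-4o: $15 / M output tokens

savings = gpt4o_out - gemini_out   # $10 saved on this volume
```

At higher volumes the gap scales linearly, so agents that stream verbose logs or long plans see the largest absolute savings.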

Bottom line

Gemini 2.5 Pro is the best choice for Hermes users who prioritize massive persistent memory and multimodal reasoning over raw tool-calling speed.



For more, see our Hermes local-LLM setup guide.