Current as of April 2026. Gemini 3.1 Pro is a heavy-hitter for Hermes Agent deployments that require massive state retention across its 1.0M token context window. It is built for developers who need their agent to remember months of Discord conversations while juggling 47+ tools simultaneously.
## Specs

| Spec | Value |
| --- | --- |
| Provider | Google |
| Input cost | $2.00 / M tokens |
| Output cost | $12.00 / M tokens |
| Context window | 1.0M tokens |
| Max output | 66K tokens |
| Parameters | N/A |
| Features | function_calling, vision, reasoning |
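At these rates, a back-of-the-envelope cost check is straightforward. A quick sketch using the listed per-million prices (the token counts are illustrative, not measured Hermes workloads):

```python
# Rough cost estimate for a single call at Gemini 3.1 Pro's listed rates.
INPUT_PER_M = 2.00    # USD per million input tokens
OUTPUT_PER_M = 12.00  # USD per million output tokens

def run_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed per-million rates."""
    return (input_tokens / 1_000_000) * INPUT_PER_M + (output_tokens / 1_000_000) * OUTPUT_PER_M

# Example: a memory-heavy call with 800k tokens of context and a 4k-token reply.
print(f"${run_cost(800_000, 4_000):.2f}")  # → $1.65
```

Note how input dominates at this scale: a near-full context window costs more per call than the reply, which is why pruning strategy matters as much as output verbosity.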
## What it’s good at

### Massive Context Retention
The 1M token context window allows Hermes to maintain a truly persistent identity and memory without aggressive pruning of session history.

### Native Multimodal Support
Vision capabilities mean your agent can accurately process screenshots or files sent in Slack, Discord, or Telegram and act on them via tools.

### Robust Tool Orchestration
Its native function calling is reliable enough to handle complex MCP tool chains across multiple messaging platforms without losing the reasoning thread.
## Where it falls short

### Expensive Output Tokens
At $12 per million output tokens, long autonomous loops or verbose agent responses become significantly more expensive than competitors.

### Aggressive Safety Filters
Google’s internal safety layers can occasionally trigger on benign cross-platform data, causing the agent to stall or refuse a legitimate tool call.

### Context Latency
While it handles 1M tokens, the time-to-first-token increases noticeably as the Hermes memory buffer fills up past the 500k mark.
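One workaround for the latency cliff is to cap the history you send rather than filling the full window. A minimal sketch, assuming session history is available as an oldest-first list of message dicts with a `"content"` string (a common shape, not necessarily Hermes internals) and using a crude 4-characters-per-token estimate:

```python
def estimate_tokens(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token for English text.
    return max(1, len(text) // 4)

def prune_history(messages: list[dict], budget: int = 500_000) -> list[dict]:
    """Drop the oldest messages until the estimated token count fits the budget.

    Assumes `messages` is oldest-first and each dict has a "content" string;
    both are assumptions for illustration, not guaranteed Hermes behavior.
    """
    total = sum(estimate_tokens(m["content"]) for m in messages)
    pruned = list(messages)
    while pruned and total > budget:
        total -= estimate_tokens(pruned[0]["content"])
        pruned.pop(0)  # discard the oldest message first
    return pruned
```

Keeping the buffer under the 500k mark trades some long-tail memory for consistently fast time-to-first-token.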
## Best use cases with Hermes Agent
- Cross-Platform Community Management — It can monitor 10+ channels simultaneously and maintain a coherent cross-session memory of every user interaction over several weeks.
- Complex MCP Orchestration — The reasoning engine handles a large number of available tool schemas and long-running autonomous tasks without getting confused by previous tool outputs.
## Not ideal for
- Low-Latency Text Bots — The $2/$12 pricing and architecture are inefficient for simple, single-task bots that do not require multimodal input or deep context.
- High-Volume Transactional Agents — The output costs make it cost-prohibitive for agents that generate thousands of small, repetitive messages per hour.
## Hermes Agent setup
Obtain an API key from Google AI Studio and ensure your Hermes tool definitions strictly follow the OpenAPI-style schema Gemini requires for native function calling.
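Gemini’s native function calling expects declarations in an OpenAPI-style subset of JSON Schema. A sketch of that shape — the `send_discord_message` tool and its parameters are hypothetical, invented here for illustration:

```python
# Hypothetical tool declaration in the OpenAPI-style shape Gemini's
# function-calling API expects. The tool name and parameters are
# illustrative, not part of Hermes.
send_message_tool = {
    "name": "send_discord_message",
    "description": "Post a message to a Discord channel the agent monitors.",
    "parameters": {
        "type": "object",
        "properties": {
            "channel_id": {"type": "string", "description": "Target channel ID."},
            "content": {"type": "string", "description": "Message body to post."},
        },
        "required": ["channel_id", "content"],
    },
}
```

Loose schemas (missing `required`, untyped properties) are a common cause of silently skipped tool calls, so it is worth validating each definition against this shape before wiring it into the agent.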
Hermes makes custom endpoints easy. Run:

```
hermes model
```

Choose Custom endpoint from the menu, then enter the base URL and model identifier when prompted:

- Base URL: `https://generativelanguage.googleapis.com/v1beta`
- Model: `google/gemini-3.1-pro-preview`
Hermes stores the selection and uses it for all subsequent agent runs across whatever platforms you have wired up (Telegram, Discord, Slack, etc.). Tune `HERMES_STREAM_READ_TIMEOUT` and related env vars if you’re hitting slow providers.
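To sanity-check the endpoint outside Hermes, you can assemble the request the configuration above implies. This sketch assumes the public Gemini REST API’s `models/{id}:generateContent` path and that the `google/` provider prefix is stripped for the native API — both assumptions about how Hermes maps the custom endpoint, not confirmed internals:

```python
import json

# Values from the custom-endpoint configuration above.
BASE_URL = "https://generativelanguage.googleapis.com/v1beta"
MODEL = "google/gemini-3.1-pro-preview"

def generate_url(base: str, model: str) -> str:
    # Assumption: the native API omits the "google/" provider prefix.
    model_id = model.split("/", 1)[-1]
    return f"{base}/models/{model_id}:generateContent"

# Minimal generateContent payload for a smoke test.
payload = json.dumps({"contents": [{"parts": [{"text": "ping"}]}]})

print(generate_url(BASE_URL, MODEL))
# → https://generativelanguage.googleapis.com/v1beta/models/gemini-3.1-pro-preview:generateContent
```

POSTing that payload to the printed URL with your API key in the `x-goog-api-key` header should return a candidate response if the key and model identifier are valid.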
## How it compares
- vs Claude 3.5 Sonnet — Claude offers sharper reasoning for complex tool selection but lacks the 1M token context headroom and generous 66K output limit.
- vs GPT-4o — GPT-4o provides better reliability in autonomous loops for some users, but its 128k context window feels cramped compared to Gemini’s million-token ceiling.
## Bottom line
If your Hermes Agent needs to be a long-lived autonomous entity with effectively unbounded memory and multimodal awareness, Gemini 3.1 Pro is the best choice despite the higher output pricing.
For more, see our Hermes local-LLM setup guide.