What is the exact context limit for this model?

The context window is strictly 4,000 tokens for both input and output combined, which is extremely restrictive for autonomous agents.

How much does it cost to run Hermes with this model?

Input tokens cost $1.00 per million and output tokens cost $2.00 per million, making it more expensive than many newer, more capable models.

GPT-3.5 Turbo (older v0613) for Hermes Agent: Pricing, Setup, and What It's Good At

Current as of April 2026. GPT-3.5 Turbo 0613 is a legacy workhorse that pioneered formal function calling, but its 4K context window is a massive bottleneck for modern Hermes Agent workflows. It remains fast and predictable for basic tool triggers across platforms like Slack or Telegram, though it lacks the depth for complex reasoning.

Specs


Provider	OpenAI
Input cost	$1.00 / M tokens
Output cost	$2.00 / M tokens
Context window	4K tokens
Max output	4K tokens
Parameters	N/A
Features	function_calling

What it’s good at

Reliable Function Calling

This specific 0613 version was the first to specialize in structured tool outputs, ensuring Hermes tools trigger without frequent syntax errors.

High Throughput

It processes simple automation tasks almost instantly, providing the low latency required for responsive chat-based agents.

Where it falls short

Tiny Context Window

With only 4K tokens, Hermes will lose the history of long conversations or complex MCP tool definitions very quickly.

Poor Reasoning

It struggles with multi-step logic and often fails when a task requires coordinating between three or more tools in a single run.

Best use cases with Hermes Agent

Simple Notification Routing — It is perfect for monitoring a Slack channel and posting a filtered summary to Discord without needing deep context or memory.
Basic Shell Operations — The model handles straightforward commands like file listing or process monitoring reliably when the output doesn’t exceed a few hundred tokens.

Not ideal for

Persistent Memory Loops — The 4K limit means the Hermes closed learning loop will overwrite critical session data within minutes of active multi-platform use.
Complex MCP Integrations — Modern MCP servers often have verbose schemas that consume the entire context window before the agent even begins its reasoning step.

Hermes Agent setup

Supply your OpenAI API key and explicitly set the model ID to gpt-3.5-turbo-0613 to prevent the system from defaulting to newer versions with different steering behaviors.

Hermes makes custom endpoints easy. Run:

hermes model

Choose Custom endpoint from the menu. Enter the base URL and model identifier when prompted:

Base URL: https://api.haimaker.ai/v1
Model: openai/gpt-3.5-turbo-0613

Hermes stores the selection and uses it for all subsequent agent runs across whatever platforms you have wired up (Telegram, Discord, Slack, etc.). Tune HERMES_STREAM_READ_TIMEOUT and related env vars if you’re hitting slow providers.

How it compares

vs GPT-4o-mini — GPT-4o-mini is vastly superior with a 128K context window and cheaper pricing at $0.15 per million input tokens compared to 0613’s $1.00.
vs Claude 3 Haiku — Haiku provides much better reasoning for complex multi-platform logic and handles a 200K context window for similar low-latency performance.

Bottom line

Only use this legacy model if you have specific dependencies on the 0613 behavior; for all other Hermes Agent automation, GPT-4o-mini is a more efficient and cost-effective choice.

TRY GPT-3.5 TURBO (OLDER V0613) IN HERMES

For more, see our Hermes local-LLM setup guide.