Current as of April 2026. GPT-3.5 Turbo 0613 is a legacy workhorse that pioneered formal function calling, but its 4K context window is a massive bottleneck for modern Hermes Agent workflows. It remains fast and predictable for basic tool triggers across platforms like Slack or Telegram, though it lacks the depth for complex reasoning.
Specs
| Provider | OpenAI |
| Input cost | $1.00 / M tokens |
| Output cost | $2.00 / M tokens |
| Context window | 4K tokens |
| Max output | 4K tokens |
| Parameters | N/A |
| Features | function_calling |
What it’s good at
Reliable Function Calling
This specific 0613 version was the first to specialize in structured tool outputs, ensuring Hermes tools trigger without frequent syntax errors.
High Throughput
It processes simple automation tasks almost instantly, providing the low latency required for responsive chat-based agents.
Where it falls short
Tiny Context Window
With only 4K tokens, Hermes will lose the history of long conversations or complex MCP tool definitions very quickly.
Poor Reasoning
It struggles with multi-step logic and often fails when a task requires coordinating between three or more tools in a single run.
Best use cases with Hermes Agent
- Simple Notification Routing — It is perfect for monitoring a Slack channel and posting a filtered summary to Discord without needing deep context or memory.
- Basic Shell Operations — The model handles straightforward commands like file listing or process monitoring reliably when the output doesn’t exceed a few hundred tokens.
Not ideal for
- Persistent Memory Loops — The 4K limit means the Hermes closed learning loop will overwrite critical session data within minutes of active multi-platform use.
- Complex MCP Integrations — Modern MCP servers often have verbose schemas that consume the entire context window before the agent even begins its reasoning step.
Hermes Agent setup
Supply your OpenAI API key and explicitly set the model ID to gpt-3.5-turbo-0613 to prevent the system from defaulting to newer versions with different steering behaviors.
Hermes makes custom endpoints easy. Run:
hermes model
Choose Custom endpoint from the menu. Enter the base URL and model identifier when prompted:
- Base URL:
https://api.haimaker.ai/v1 - Model:
openai/gpt-3.5-turbo-0613
Hermes stores the selection and uses it for all subsequent agent runs across whatever platforms you have wired up (Telegram, Discord, Slack, etc.). Tune HERMES_STREAM_READ_TIMEOUT and related env vars if you’re hitting slow providers.
How it compares
- vs GPT-4o-mini — GPT-4o-mini is vastly superior with a 128K context window and cheaper pricing at $0.15 per million input tokens compared to 0613’s $1.00.
- vs Claude 3 Haiku — Haiku provides much better reasoning for complex multi-platform logic and handles a 200K context window for similar low-latency performance.
Bottom line
Only use this legacy model if you have specific dependencies on the 0613 behavior; for all other Hermes Agent automation, GPT-4o-mini is a more efficient and cost-effective choice.
TRY GPT-3.5 TURBO (OLDER V0613) IN HERMES
For more, see our Hermes local-LLM setup guide.