Current as of April 2026. GPT-3.5 Turbo Instruct is a completion-style model optimized for direct instruction following rather than conversational chat. It provides high-speed execution for developers who need deterministic tool-triggering without the overhead of chat-tuned personas.
Specs
| Provider | OpenAI |
| Input cost | $1.50 / M tokens |
| Output cost | $2.00 / M tokens |
| Context window | 4K tokens |
| Max output | 4K tokens |
| Parameters | N/A |
| Features | Standard chat |
What it’s good at
Low Latency Execution
It processes simple tool-use commands faster than many modern chat models by skipping conversational filler. This is ideal for Hermes tasks like immediate shell command execution or quick platform-to-platform routing.
Strict Instruction Adherence
The instruct-tuning makes it less prone to deviating from system prompts in short-burst tasks. It follows the exact formatting required for Hermes’ 47 built-in tools when context remains narrow.
Where it falls short
Critically Small Context
The 4,000-token window is a massive liability for autonomous agents. Hermes will lose its cross-session memory and tool history almost immediately during complex runs.
Poor Price-to-Performance Ratio
At $1.50 per million input tokens, it is significantly more expensive than GPT-4o-mini while being vastly less intelligent. It lacks the reasoning depth needed for complex MCP protocol handling.
Best use cases with Hermes Agent
- Simple Message Routing — Moving data between a monitoring tool and a Telegram channel requires minimal context and benefits from the model’s high speed.
- One-Off Shell Commands — It handles direct ‘run this’ instructions efficiently without trying to turn the interaction into a long-form conversation.
Not ideal for
- Persistent Identity Management — The 4K context window cannot sustain a consistent persona or memory loop across multiple messaging platforms over time.
- Complex MCP Tool Chains — It lacks the reasoning capability to manage multiple tool dependencies or resolve errors in long autonomous loops.
Hermes Agent setup
You must use the completion API endpoint instead of the chat endpoint. Manual prompt engineering is required to ensure Hermes’ tool-use syntax is correctly formatted in the absence of a system message role.
Hermes makes custom endpoints easy. Run:
hermes model
Choose Custom endpoint from the menu. Enter the base URL and model identifier when prompted:
- Base URL:
https://api.haimaker.ai/v1 - Model:
openai/gpt-3.5-turbo-instruct
Hermes stores the selection and uses it for all subsequent agent runs across whatever platforms you have wired up (Telegram, Discord, Slack, etc.). Tune HERMES_STREAM_READ_TIMEOUT and related env vars if you’re hitting slow providers.
How it compares
- vs GPT-4o-mini — GPT-4o-mini is 10x cheaper at $0.15/M input and provides a 128K context window, making it superior for almost every Hermes use case.
- vs Claude 3 Haiku — Haiku offers better multi-platform reasoning and a 200K context window for $0.25/M input, far outclassing this model’s 4K limit.
Bottom line
This is a legacy model that only makes sense for high-speed, single-turn instructions where context doesn’t matter. For autonomous agents, the 4K window is a dealbreaker.
TRY GPT-3.5 TURBO INSTRUCT IN HERMES
For more, see our Hermes local-LLM setup guide.