What is the maximum output length?

The model supports a maximum output of 16K tokens within a 128K token context window.

How much does it cost to run?

Input tokens are $2.5 per million and output tokens are $10 per million.

GPT-4o Audio for Hermes Agent: Pricing, Setup, and What It's Good At

Current as of April 2026. GPT-4o-audio-preview is a specialized variant for Hermes users who need native voice processing without the latency of separate STT/TTS pipelines. It brings OpenAI’s top-tier tool-use reliability to audio-centric workflows across platforms like WhatsApp and Telegram.

Specs


Provider	OpenAI
Input cost	$2.50 / M tokens
Output cost	$10 / M tokens
Context window	128K tokens
Max output	16K tokens
Parameters	N/A
Features	function_calling

What it’s good at

Native Audio Reasoning

It processes tone and inflection directly, which is vital for Hermes agents that need to interpret the emotional context of voice memos.

Tool-Use Stability

It inherits the robust function-calling capabilities of the GPT-4 family, ensuring Hermes can reliably trigger its 47 built-in tools during autonomous runs.

Where it falls short

Premium Pricing

At $10 per million output tokens, it is significantly more expensive than standard models for agents that primarily process text.

Preview Limitations

As a preview model, it may face more frequent rate limits or API instability during long-running autonomous sessions compared to the stable GPT-4o branch.

Best use cases with Hermes Agent

Voice-First Messaging — Perfect for Hermes instances running on WhatsApp where users interact via voice notes rather than typing.
Accessible Automation — Enables hands-free control of shell commands and platform monitoring through direct audio input and output.

Not ideal for

Text-Only Workflows — You are paying a massive premium for audio capabilities that go unused if your agent only monitors Slack or Discord text.
High-Volume Background Tasks — The $10/M output cost makes it prohibitively expensive for persistent, high-frequency autonomous logging or monitoring.

Hermes Agent setup

Set the model ID to openai/gpt-4o-audio-preview and ensure your API key has audio modality permissions enabled. Configure Hermes to pass audio buffers directly to the model to minimize latency in voice-to-tool execution.

Hermes makes custom endpoints easy. Run:

hermes model

Choose Custom endpoint from the menu. Enter the base URL and model identifier when prompted:

Base URL: https://api.haimaker.ai/v1
Model: openai/gpt-4o-audio-preview

Hermes stores the selection and uses it for all subsequent agent runs across whatever platforms you have wired up (Telegram, Discord, Slack, etc.). Tune HERMES_STREAM_READ_TIMEOUT and related env vars if you’re hitting slow providers.

How it compares

vs GPT-4o-mini — Mini is vastly cheaper at $0.60/M output for standard tool-use but lacks the native audio reasoning required for processing voice notes directly.
vs Claude 3.5 Sonnet — Sonnet provides superior reasoning for complex MCP tool chains but requires a separate Whisper pipeline for audio, which increases total latency.

Bottom line

This is the go-to model for Hermes users building voice-activated autonomous agents, provided the budget supports the $10/M output cost.

TRY GPT-4O AUDIO IN HERMES

For more, see our Hermes local-LLM setup guide.