Current as of March 2026. Llama 4 Maverick is Meta’s aggressive push into the 1M context window market, priced at a competitive $0.15 per million input tokens. It bridges the gap between the speed of Llama 3 and the massive context requirements of agentic workflows in OpenClaw.

Specs

Provider: Meta (Llama)
Input cost: $0.15 / M tokens
Output cost: $0.60 / M tokens
Context window: 1.0M tokens
Max output: 16K tokens
Parameters: N/A
Features: function_calling, vision

What it’s good at

Massive 1M Context Window

Handling 1,000,000 tokens for just $0.15 input makes it significantly cheaper than Claude 3.5 Sonnet for long-document analysis.
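To put numbers on that claim, here is the arithmetic for one full 1M-token input pass, using Maverick's $0.15/M rate from the spec table and Sonnet's $3/M input rate from the comparison section:

```python
# Cost of a single 1,000,000-token input pass at each model's input price.
maverick_rate = 0.15  # $ per million input tokens (Llama 4 Maverick)
sonnet_rate = 3.00    # $ per million input tokens (Claude 3.5 Sonnet)

tokens = 1_000_000
maverick_cost = tokens / 1_000_000 * maverick_rate
sonnet_cost = tokens / 1_000_000 * sonnet_rate

print(f"Maverick: ${maverick_cost:.2f}")                # Maverick: $0.15
print(f"Sonnet:   ${sonnet_cost:.2f}")                  # Sonnet:   $3.00
print(f"Ratio:    {sonnet_cost / maverick_cost:.0f}x")  # Ratio:    20x
```

At list prices, a full-context input pass on Maverick costs one twentieth of the same pass on Sonnet.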

Native Vision Support

The vision capabilities are robust enough to handle complex UI layouts, which is essential for visual agents running in OpenClaw.

Where it falls short

Proprietary License

Unlike previous Llama models, Maverick is proprietary, which limits self-hosting flexibility and long-term ownership.

High Output Latency

When the context window is near capacity, generation throughput drops noticeably, so filling the 16K max output on a long prompt can be slow in tokens per second.

Best use cases with OpenClaw

  • Multi-File Code Audits — The 1M context window allows you to feed an entire repository into OpenClaw for refactoring tasks without hitting limits.
  • Visual Web Scraping — Combining vision and function calling lets Maverick interact with DOM elements based on visual cues rather than just raw HTML.
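The vision-plus-function-calling combination described above maps onto a standard OpenAI-style chat request. A minimal payload sketch (the `click_element` tool and the screenshot URL are hypothetical illustrations, not part of OpenClaw or the Haimaker API):

```python
# Hedged sketch: an OpenAI-compatible chat payload combining a screenshot
# (vision) with a function-calling tool. Tool name and URL are made up.
payload = {
    "model": "llama-4-maverick",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Click the login button in this screenshot."},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/screenshot.png"}},
        ],
    }],
    "tools": [{
        "type": "function",
        "function": {
            "name": "click_element",  # hypothetical tool exposed by the agent
            "description": "Click a DOM element identified by CSS selector.",
            "parameters": {
                "type": "object",
                "properties": {"selector": {"type": "string"}},
                "required": ["selector"],
            },
        },
    }],
}
```

The model can ground the `selector` argument in what it sees in the image rather than in raw HTML, which is the point of the visual-scraping use case.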

Not ideal for

  • Low-Latency Chatbots — The overhead of the Maverick architecture makes it overkill and too slow for simple Q&A compared to Llama 3.1 8B.
  • Strict Privacy Requirements — Since it is proprietary and often requires third-party APIs like Haimaker, you lose the air-gapped security of local Llama 3.3 runs.

OpenClaw setup

Configure your provider in OpenClaw using the Haimaker endpoint at api.haimaker.ai/v1. Set generous timeouts: generating up to the 16K output maximum on a long prompt can take a while.

{
  "models": {
    "mode": "merge",
    "providers": {
      "meta-llama": {
        "baseUrl": "https://api.haimaker.ai/v1",
        "apiKey": "YOUR-META-(LLAMA)-API-KEY",
        "api": "openai-completions",
        "models": [
          {
            "id": "llama-4-maverick",
            "name": "Llama 4 Maverick",
            "cost": {
              "input": 0.15,
              "output": 0.6
            },
            "contextWindow": 1048576,
            "maxTokens": 16384
          }
        ]
      }
    }
  }
}
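As a quick sanity check, the numeric fields in the config above line up with the spec table: 1048576 is exactly 1,024² tokens (the 1M context window) and 16384 is 16K. A small snippet that parses the config and verifies those fields:

```python
import json

# The provider config from above, embedded verbatim for validation.
config = json.loads("""
{
  "models": {
    "mode": "merge",
    "providers": {
      "meta-llama": {
        "baseUrl": "https://api.haimaker.ai/v1",
        "apiKey": "YOUR-META-(LLAMA)-API-KEY",
        "api": "openai-completions",
        "models": [
          {
            "id": "llama-4-maverick",
            "name": "Llama 4 Maverick",
            "cost": {"input": 0.15, "output": 0.6},
            "contextWindow": 1048576,
            "maxTokens": 16384
          }
        ]
      }
    }
  }
}
""")

model = config["models"]["providers"]["meta-llama"]["models"][0]
assert model["contextWindow"] == 1024 ** 2  # 1M-token context window
assert model["maxTokens"] == 16 * 1024      # 16K max output
print("config OK:", model["id"])            # config OK: llama-4-maverick
```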

How it compares

  • vs Claude 3.5 Sonnet — Maverick is cheaper at $0.15/$0.60 compared to Sonnet’s $3/$15, though Sonnet still leads in complex reasoning.
  • vs GPT-4o-mini — Both share the $0.15 input price, but Maverick offers a 1M context window versus GPT-4o-mini’s 128K limit.

Bottom line

Maverick is the go-to model for developers who need massive context and vision on a budget, even if it means sacrificing the open-weight philosophy.



For setup instructions, see our API key guide. For all available models, see the complete models guide.