Current as of March 2026. UI-TARS 1.5 7B is a specialized vision-language model for UI automation. It doesn’t try to be a general-purpose assistant — it’s trained to look at a screenshot and tell you where to click. For that specific job, it outperforms general models at a tenth of the price.

Specs

Provider          ByteDance
Input cost        $0.10 / M tokens
Output cost       $0.20 / M tokens
Context window    131K tokens
Max output        2K tokens
Parameters        7B
Features          Standard chat

What it’s good at

Spatial Accuracy for UI Elements

It produces accurate bounding boxes and click coordinates for buttons, form fields, and other UI elements. Generalist 7B models make far more spatial errors on this task.

Cost for High-Frequency Screen Polling

$0.10/M input is cheap for vision tasks. If your agent needs to process dozens of screenshots per session, the cost stays manageable.
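Back-of-envelope math makes the point concrete. Tokens per screenshot vary with resolution and the provider's image tokenizer, so the ~1,500-token figure below is an assumption, not a measurement:

```python
# Rough per-session cost at the listed input price.
INPUT_PRICE_PER_M = 0.10          # $ per million input tokens (from the spec table)
TOKENS_PER_SCREENSHOT = 1_500     # assumed average -- measure your own screenshots
SCREENSHOTS_PER_SESSION = 50

session_cost = (SCREENSHOTS_PER_SESSION * TOKENS_PER_SCREENSHOT
                * INPUT_PRICE_PER_M / 1_000_000)
print(f"${session_cost:.4f} per session")  # $0.0075 per session
```

Under those assumptions, even a thousand sessions a day stays under $10 of input cost.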

Where it falls short

2K Output Cap

This is a hard limit. You can ask it where to click — you can’t ask it to write a paragraph about what it sees. Plan around it.
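One way to plan around the cap is to clamp max_tokens at request-build time. A sketch assuming the OpenAI-compatible Chat Completions schema (the field names follow that schema; the model id matches the config shown later on this page):

```python
# Hard output limit from the spec table.
MAX_OUTPUT_TOKENS = 2048

def build_request(screenshot_b64: str, instruction: str, max_tokens: int = 256) -> dict:
    """Build a chat request that can never exceed the model's output cap."""
    return {
        "model": "bytedance/ui-tars-1.5-7b",
        "max_tokens": min(max_tokens, MAX_OUTPUT_TOKENS),  # clamp to the hard cap
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": instruction},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{screenshot_b64}"}},
            ],
        }],
    }

req = build_request("...", "Where should I click to open Settings?", max_tokens=4096)
print(req["max_tokens"])  # 2048
```

Clamping client-side means an over-eager caller gets a short answer instead of an API error.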

No General Knowledge

It’s purpose-trained. Ask it to help debug code or explain a concept and the responses degrade quickly compared to a general model.

Best use cases with OpenClaw

  • Automated QA Testing — Verify that UI elements are visible and positioned correctly across different screen resolutions.
  • Visual Web Scraping — Navigate dynamic pages by reading the interface visually rather than relying on CSS selectors that break when the site updates.
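Both use cases reduce to the same loop: capture a screenshot, ask the model where to act, dispatch the action. A minimal sketch with the model and browser stubbed out (the helper names are hypothetical, not OpenClaw APIs):

```python
from typing import Callable

def run_step(capture: Callable[[], bytes],
             ask_model: Callable[[bytes, str], tuple[int, int]],
             click: Callable[[int, int], None],
             instruction: str) -> tuple[int, int]:
    """One screenshot -> model -> click iteration of a visual agent loop."""
    shot = capture()                      # grab the current screen
    x, y = ask_model(shot, instruction)   # UI-TARS returns coordinates
    click(x, y)                           # dispatch to the browser/OS driver
    return x, y

# Stubbed run: the "model" always points at (100, 200).
clicks: list[tuple[int, int]] = []
run_step(lambda: b"png-bytes",
         lambda shot, text: (100, 200),
         lambda x, y: clicks.append((x, y)),
         "Click the Login button")
print(clicks)  # [(100, 200)]
```

Keeping the three stages as injected callables makes the loop testable without a live browser or API key.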

Not ideal for

  • Multi-step Reasoning — 7B parameters and UI-specific training means complex logical chains fall apart.
  • Anything Requiring Long Output — 2K tokens is not enough for documentation, code, or prose.

Run it through Haimaker

Skip juggling API keys. One Haimaker key gives you access to every model on the platform. Tell OpenClaw:

Add Haimaker as a custom provider to my OpenClaw config. Use these details:

- Provider name: haimaker
- Base URL: https://api.haimaker.ai/v1
- API key: [PASTE YOUR HAIMAKER API KEY HERE]
- API type: openai-completions

Add the auto-router model:
- haimaker/auto (reasoning: false, context: 128000, max tokens: 32000)

Create an alias "auto" for easy switching. Apply the config when done.

Or skip model selection entirely — Haimaker’s auto-router picks the best model for each task so you don’t have to.

OpenClaw setup

Configure your provider to point to api.haimaker.ai/v1 and set the model identifier to bytedance/ui-tars-1.5-7b. Ensure your screenshots are pre-processed to fit within the 131K context window to avoid truncation.
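Image-token accounting is provider-specific, so a simple, format-agnostic precaution is to cap the screenshot's longest side before encoding it (the 1920 px ceiling below is an assumed default, not a documented limit):

```python
def fit_within(width: int, height: int, max_side: int = 1920) -> tuple[int, int]:
    """Scale dimensions down proportionally so the longest side <= max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height          # already small enough, leave untouched
    scale = max_side / longest
    return round(width * scale), round(height * scale)

print(fit_within(3840, 2160))  # (1920, 1080)
```

Resize the actual image to the returned dimensions with whatever imaging library you already use; this function only computes the target size.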

{
  "models": {
    "mode": "merge",
    "providers": {
      "bytedance": {
        "baseUrl": "https://api.haimaker.ai/v1",
        "apiKey": "YOUR-HAIMAKER-API-KEY",
        "api": "openai-completions",
        "models": [
          {
            "id": "ui-tars-1.5-7b",
            "name": "UI-TARS 1.5 7B",
            "cost": {
              "input": 0.1,
              "output": 0.2
            },
            "contextWindow": 131072,
            "maxTokens": 2048
          }
        ]
      }
    }
  }
}

How it compares

  • vs GPT-4o-mini — 4o-mini has far better general reasoning, but UI-TARS is more accurate at UI spatial tasks and about a third cheaper on input ($0.10/M vs $0.15/M).
  • vs Claude 3.5 Sonnet — Sonnet handles complex workflows, but UI-TARS is about 30x cheaper for pure vision input tasks. Use Sonnet for the reasoning, UI-TARS for the clicking.

Bottom line

A purpose-built tool for agents that need to interact with UIs. Don’t try to use it as a general assistant — it’s not. For UI navigation tasks specifically, it delivers high accuracy at low cost.



For setup instructions, see our API key guide. For all available models, see the complete models guide.