Trinity Mini

Name: Trinity Mini
Brand: Arcee AI
SKU: arcee-ai/trinity-mini
Price: 0.0450 USD
Availability: InStock

Chat, other

Arcee AI | Function Calling, Reasoning | Released Dec 2025 · Updated May 2026

Trinity Mini (arcee-ai/trinity-mini) is an afmoe 26.1B-parameter model from Arcee AI with a 131,072-token context window and 131,072 max output tokens, priced at $0.04/1M input and $0.15/1M output tokens. Available via the haimaker.ai OpenAI-compatible API.

Parameters: 26.1B
Context Window: 131K tokens
Max Output: 131K tokens
Input Price: $0.04 / 1M tokens
Output Price: $0.15 / 1M tokens

Overview

Model Card

Trinity Mini

Trinity Mini is an Arcee AI 26B MoE model with 3B active parameters. It is the medium-sized model in our new Trinity family, a series of open-weight models for enterprise and tinkerers alike.

This model is tuned for reasoning, but in testing, it uses a similar total token count to competitive instruction-tuned models.

Trinity Mini is trained on 10T tokens gathered and curated through a key partnership with Datology, building upon the excellent dataset we used on AFM-4.5B with additional math and code.

Training was performed on a cluster of 512 H200 GPUs powered by Prime Intellect using HSDP parallelism.

More details, including key architecture decisions, can be found on our blog.

Try it out now at chat.arcee.ai

Model Details

Model Architecture: AfmoeForCausalLM
Parameters: 26B, 3B active
Experts: 128 total, 8 active, 1 shared
Context length: 128k
Training Tokens: 10T
License: OpenMDW-1.1
Recommended settings:
temperature: 0.15
top_k: 50
top_p: 0.75
min_p: 0.06
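
As a rough illustration, the recommended settings above can be collected into one request body for an OpenAI-compatible chat endpoint. This is a sketch, not an official client: build_payload is a hypothetical helper, and whether a given server honors the non-standard top_k and min_p fields is an assumption to check against that provider's documentation.

```python
import json

# Recommended sampling settings, copied from the model card.
RECOMMENDED_SAMPLING = {
    "temperature": 0.15,
    "top_k": 50,
    "top_p": 0.75,
    "min_p": 0.06,
}


def build_payload(prompt, model="arcee-ai/trinity-mini", **overrides):
    """Return a chat-completions request body (hypothetical helper).

    Keyword overrides replace the recommended defaults.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    payload.update(RECOMMENDED_SAMPLING)
    payload.update(overrides)
    return payload


if __name__ == "__main__":
    print(json.dumps(build_payload("Who are you?"), indent=2))
```

Keeping the defaults in one dictionary makes it easy to override a single value per call, for example build_payload("Hi", temperature=0.5).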

Benchmarks

Running our model

Transformers

Use the main transformers branch

git clone https://github.com/huggingface/transformers.git
cd transformers
# pip
pip install '.[torch]'
# uv
uv pip install '.[torch]'

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
model_id = "arcee-ai/Trinity-Mini"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
messages = [
    {"role": "user", "content": "Who are you?"},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)
outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.5,
    top_k=50,
    top_p=0.95
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

If you are using a released version of transformers, pass trust_remote_code=True:

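from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "arcee-ai/Trinity-Mini"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True  # allow the custom model code shipped in the model repo to run
)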

vLLM

Supported in vLLM release 0.11.1 and later

# pip
pip install "vllm>=0.11.1"

Serving the model with suggested settings:

vllm serve arcee-ai/Trinity-Mini \
  --dtype bfloat16 \
  --enable-auto-tool-choice \
  --reasoning-parser deepseek_r1 \
  --tool-call-parser hermes

llama.cpp

Supported in llama.cpp release b7061

Download the latest llama.cpp release

llama-server -hf arcee-ai/Trinity-Mini-GGUF:q4_k_m \
  --temp 0.15 \
  --top-k 50 \
  --top-p 0.75 \
  --min-p 0.06

LM Studio

Supported in the latest LM Studio runtime

Update to the latest available version, then verify your runtime:

Click "Power User" at the bottom left

Click the green "Developer" icon at the top left

Select "LM Runtimes" at the top

Refresh the list of runtimes and verify that the latest is installed

Then go to Model Search, search for arcee-ai/Trinity-Mini-GGUF, download your preferred size, and load it in the chat.

API

Trinity Mini is available today on OpenRouter:

https://openrouter.ai/arcee-ai/trinity-mini

curl -X POST "https://openrouter.ai/api/v1/chat/completions" \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "arcee-ai/trinity-mini",
    "messages": [
      {
        "role": "user",
        "content": "What are some fun things to do in New York?"
      }
    ]
  }'

License

Trinity-Mini is released under the OpenMDW-1.1 license.

Features & Capabilities

Mode	chat
Context Window	131,072 tokens
Max Output	131,072 tokens
Function Calling	Supported
Vision	Not supported
Reasoning	Supported
Web Search	Not supported
URL Context	Not supported
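
Since function calling is listed as supported, the following sketch shows one way a client might declare a tool and execute a tool call returned by the model, using the OpenAI-compatible "tools" format. The get_weather function, its placeholder data, and the hand-written example call are all hypothetical; no network request is made here.

```python
import json


def get_weather(city):
    """Hypothetical local function the model may ask to call."""
    return {"city": city, "forecast": "sunny"}  # placeholder data, not a real lookup


# Tool schema in the OpenAI-compatible "tools" format,
# to be passed as the tools argument of a chat-completions request.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Return a weather forecast for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

# Maps tool names to the Python callables that implement them.
REGISTRY = {"get_weather": get_weather}


def run_tool_call(tool_call):
    """Execute one tool call (given as a dict) and return a "tool" role message."""
    name = tool_call["function"]["name"]
    # Arguments arrive as a JSON-encoded string, not as a dict.
    args = json.loads(tool_call["function"]["arguments"])
    result = REGISTRY[name](**args)
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": json.dumps(result),
    }


if __name__ == "__main__":
    # Hand-written example of the shape a tool call in a response might take.
    example = {
        "id": "call_0",
        "type": "function",
        "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
    }
    print(run_tool_call(example))
```

The returned message would be appended to the conversation and sent back so the model can produce its final answer.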

Technical Details

Architecture	AfmoeForCausalLM
Model Type	afmoe
Base Model	arcee-ai/Trinity-Mini-Base
Languages	en, es, fr, de, it, pt, ru, ar, hi, ko, zh
Library	transformers

API Usage

from openai import OpenAI

client = OpenAI(
    base_url="https://api.haimaker.ai/v1",
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="arcee-ai/trinity-mini",
    messages=[
        {"role": "user", "content": "Hello, how are you?"}
    ],
)

print(response.choices[0].message.content)

Frequently Asked Questions

What is the context window of Trinity Mini?

Trinity Mini (arcee-ai/trinity-mini) has a 131,072-token context window and supports up to 131,072 output tokens per request.
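
Because both limits are 131,072 tokens, the room left for output depends on prompt length. The sketch below computes a safe max_tokens value; it assumes, as is typical but not stated on this page, that prompt and output tokens share one context window. max_output_budget is a hypothetical helper name.

```python
CONTEXT_WINDOW = 131_072  # tokens, from the listing
MAX_OUTPUT = 131_072      # tokens, from the listing


def max_output_budget(prompt_tokens, requested=None):
    """Largest max_tokens value that fits alongside a prompt.

    Assumes prompt and output tokens share one context window.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    remaining = max(CONTEXT_WINDOW - prompt_tokens, 0)
    budget = min(remaining, MAX_OUTPUT)
    if requested is not None:
        budget = min(budget, requested)
    return budget


if __name__ == "__main__":
    print(max_output_budget(100_000))       # 31072
    print(max_output_budget(1_000, 4_096))  # 4096
```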

How much does Trinity Mini cost?

Trinity Mini is priced at $0.04 per 1M input tokens and $0.15 per 1M output tokens when accessed via the haimaker.ai OpenAI-compatible API.
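
To make the pricing concrete, here is a small cost estimate based on the listed per-million-token rates. estimate_cost is a hypothetical helper, and the token counts in the example are made up for illustration; actual billing may round or meter differently.

```python
INPUT_USD_PER_M = 0.04   # USD per 1M input tokens, from the listing
OUTPUT_USD_PER_M = 0.15  # USD per 1M output tokens, from the listing


def estimate_cost(input_tokens, output_tokens):
    """Estimated cost in USD for the given token counts."""
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # 2M input tokens cost $0.08 and 500K output tokens cost $0.075.
    print(f"${estimate_cost(2_000_000, 500_000):.4f}")  # $0.1550
```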

What features does Trinity Mini support?

Trinity Mini supports function calling and reasoning.

How do I use Trinity Mini via API?

Send requests to https://api.haimaker.ai/v1/chat/completions with model "arcee-ai/trinity-mini" using any OpenAI-compatible SDK. Authentication uses a Bearer API key from https://app.haimaker.ai.
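
For readers who prefer not to use an SDK, the request described above can be sketched with only the Python standard library. build_request is a hypothetical helper and HAIMAKER_API_KEY is an assumed environment variable name; the response is assumed to follow the standard OpenAI chat-completions shape.

```python
import json
import os
import urllib.request

API_URL = "https://api.haimaker.ai/v1/chat/completions"


def build_request(prompt, api_key):
    """Build (but do not send) a chat-completions POST request."""
    body = json.dumps({
        "model": "arcee-ai/trinity-mini",
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # HAIMAKER_API_KEY is an assumed variable name, not one defined by the provider.
    key = os.environ.get("HAIMAKER_API_KEY")
    if key:
        with urllib.request.urlopen(build_request("Hello, how are you?", key)) as resp:
            reply = json.load(resp)
        print(reply["choices"][0]["message"]["content"])
    else:
        print("Set HAIMAKER_API_KEY to send the request.")
```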