Trinity Mini

arcee-ai/trinity-mini
Chat · apache-2.0 · Arcee AI · Function Calling · Reasoning
Released Dec 2025 · Updated Dec 2025

Trinity Mini (arcee-ai/trinity-mini) is an afmoe model from Arcee AI with a 131,072-token context window and a maximum output of 131,072 tokens, priced at $0.04 per 1M input tokens and $0.15 per 1M output tokens. It is available via the haimaker.ai OpenAI-compatible API.

  • Context Window: 131K tokens
  • Max Output: 131K tokens
  • Input Price: $0.04 / 1M tokens
  • Output Price: $0.15 / 1M tokens
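
As a rough illustration of the listed prices, the sketch below estimates the cost of a single request. The helper name `estimate_cost_usd` and the example token counts are hypothetical; only the two per-million rates come from this page.

```python
# Listed prices in USD per 1M tokens.
INPUT_USD_PER_MTOK = 0.04
OUTPUT_USD_PER_MTOK = 0.15


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


if __name__ == "__main__":
    # Example: a 10,000-token prompt with a 2,000-token reply.
    print(f"${estimate_cost_usd(10_000, 2_000):.6f}")
```

Actual billing may differ from this estimate, for example if reasoning tokens are counted as output.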

Overview

Model Card

Trinity Mini

Trinity Mini is an Arcee AI 26B MoE model with 3B active parameters. It is the medium-sized model in our new Trinity family, a series of open-weight models for enterprise and tinkerers alike.

This model is tuned for reasoning, but in testing, it uses a similar total token count to competitive instruction-tuned models.

Trinity Mini is trained on 10T tokens gathered and curated through a key partnership with Datology, building upon the excellent dataset we used on AFM-4.5B with additional math and code.

Training was performed on a cluster of 512 H200 GPUs powered by Prime Intellect using HSDP parallelism.

More details, including key architecture decisions, can be found on our blog.

Try it out now at chat.arcee.ai

Model Details

  • Model Architecture: AfmoeForCausalLM
  • Parameters: 26B, 3B active
  • Experts: 128 total, 8 active, 1 shared
  • Context length: 128k
  • Training Tokens: 10T
  • License: Apache 2.0
  • Recommended settings:
      • temperature: 0.15
      • top_k: 50
      • top_p: 0.75
      • min_p: 0.06
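
One way to carry the recommended settings into an OpenAI-style request is sketched below. The split is an assumption: `temperature` and `top_p` are standard chat-completion fields, while `top_k` and `min_p` are not, so the hypothetical helper `split_sampling_params` places them in a separate dict that the OpenAI Python SDK can send through its `extra_body` argument. Servers such as vLLM accept these as extra sampling parameters; whether a given hosted endpoint honors them is not stated on this page.

```python
# Recommended sampling settings from the model card.
RECOMMENDED = {"temperature": 0.15, "top_k": 50, "top_p": 0.75, "min_p": 0.06}

# Sampling fields that belong to the standard chat-completions schema.
STANDARD_FIELDS = {"temperature", "top_p"}


def split_sampling_params(settings: dict) -> tuple[dict, dict]:
    """Split settings into (standard kwargs, extra_body) for an OpenAI-style client."""
    standard = {k: v for k, v in settings.items() if k in STANDARD_FIELDS}
    extra = {k: v for k, v in settings.items() if k not in STANDARD_FIELDS}
    return standard, extra


standard, extra_body = split_sampling_params(RECOMMENDED)
print(standard)    # fields passed as normal keyword arguments
print(extra_body)  # fields passed via extra_body=...
```

With the OpenAI SDK, the result would be used as `client.chat.completions.create(..., **standard, extra_body=extra_body)`.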

Benchmarks

Powered by Datology

Running our model

Transformers

Use the main transformers branch:

git clone https://github.com/huggingface/transformers.git
cd transformers

Install with pip:

pip install '.[torch]'

Or with uv:

uv pip install '.[torch]'

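Then load the model and generate a response:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the tokenizer and the model in bfloat16, placed automatically on available devices
model_id = "arcee-ai/Trinity-Mini"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

messages = [
    {"role": "user", "content": "Who are you?"},
]

# Apply the chat template and move the token IDs to the model's device
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# Sampling values as given in this example; the recommended settings
# are listed under Model Details
outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.5,
    top_k=50,
    top_p=0.95
)

# Decode the full sequence (prompt plus generated tokens)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)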

If you are using a released version of transformers instead, pass trust_remote_code=True:

model_id = "arcee-ai/Trinity-Mini"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

vLLM

Supported in vLLM release 0.11.1 and later

# pip
pip install "vllm>=0.11.1"

Serving the model with suggested settings:

vllm serve arcee-ai/Trinity-Mini \
  --dtype bfloat16 \
  --enable-auto-tool-choice \
  --reasoning-parser deepseek_r1 \
  --tool-call-parser hermes
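
With a reasoning parser enabled, the server is expected to return the model's reasoning separately from the final answer. The sketch below separates the two from a chat-completion message represented as a plain dict. The field names are assumptions: vLLM has exposed the reasoning text as `reasoning_content`, and some versions may use `reasoning`, so the hypothetical helper `split_reasoning` checks both.

```python
def split_reasoning(message: dict) -> tuple[str, str]:
    """Return (reasoning, answer) from a chat-completion message dict.

    Assumption: the reasoning text lives under "reasoning_content" or
    "reasoning"; either may be missing or None.
    """
    reasoning = message.get("reasoning_content") or message.get("reasoning") or ""
    answer = message.get("content") or ""
    return reasoning.strip(), answer.strip()


# Mock message shaped like an OpenAI-compatible response, not real model output.
mock = {
    "role": "assistant",
    "reasoning_content": "The user asks for 2 + 2. That is 4.",
    "content": "2 + 2 = 4",
}
reasoning, answer = split_reasoning(mock)
print("reasoning:", reasoning)
print("answer:", answer)
```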

llama.cpp

Supported in llama.cpp release b7061

Download the latest llama.cpp release

llama-server -hf arcee-ai/Trinity-Mini-GGUF:q4_k_m \
  --temp 0.15 \
  --top-k 50 \
  --top-p 0.75 \
  --min-p 0.06

LM Studio

Supported in the latest LM Studio runtime

Update to the latest available version, then verify your runtime:

  • Click "Power User" at the bottom left
  • Click the green "Developer" icon at the top left
  • Select "LM Runtimes" at the top
  • Refresh the list of runtimes and verify that the latest is installed
  • Then, go to Model Search and search for arcee-ai/Trinity-Mini-GGUF, download your preferred size, and load it in the chat

API

Trinity Mini is available today on OpenRouter:

https://openrouter.ai/arcee-ai/trinity-mini

curl -X POST "https://openrouter.ai/api/v1/chat/completions" \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "arcee-ai/trinity-mini",
    "messages": [
      {
        "role": "user",
        "content": "What are some fun things to do in New York?"
      }
    ]
  }'

License

Trinity-Mini is released under the Apache-2.0 license.

Features & Capabilities

  • Mode: chat
  • Context Window: 131,072 tokens
  • Max Output: 131,072 tokens
  • Function Calling: Supported
  • Vision: -
  • Reasoning: Supported
  • Web Search: -
  • URL Context: -

Technical Details

  • Architecture: AfmoeForCausalLM
  • Model Type: afmoe
  • Base Model: arcee-ai/Trinity-Mini-Base
  • Languages: en, es, fr, de, it, pt, ru, ar, hi, ko, zh
  • Library: transformers

API Usage

from openai import OpenAI

client = OpenAI(
    base_url="https://api.haimaker.ai/v1",
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="arcee-ai/trinity-mini",
    messages=[
        {"role": "user", "content": "Hello, how are you?"}
    ],
)

print(response.choices[0].message.content)
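
Since this page lists function calling as supported, a tool-use round trip might look like the sketch below. It assumes the endpoint accepts the standard OpenAI `tools` field and returns `tool_calls` in the usual shape; the `get_weather` tool, its schema, and the `run_tool_call` dispatcher are hypothetical. The dispatcher is exercised against a mock tool call so the example runs without network access.

```python
import json

# Hypothetical tool schema in the OpenAI function-calling format.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Return the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]


def get_weather(city: str) -> dict:
    """Stand-in implementation; a real tool would query a weather service."""
    return {"city": city, "forecast": "unknown"}


# Maps tool names to local Python callables.
REGISTRY = {"get_weather": get_weather}


def run_tool_call(tool_call: dict) -> dict:
    """Execute one tool call and build the "tool" message to send back."""
    fn = tool_call["function"]
    args = json.loads(fn["arguments"])  # arguments arrive as a JSON string
    result = REGISTRY[fn["name"]](**args)
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": json.dumps(result),
    }


# Mock tool call in the shape an OpenAI-compatible API returns.
mock_call = {
    "id": "call_0",
    "type": "function",
    "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
}
print(run_tool_call(mock_call))
```

In a live request, `TOOLS` would be passed as `tools=TOOLS` to `client.chat.completions.create`, and each returned tool message appended to `messages` before the follow-up call.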

Frequently Asked Questions

What is the context window of Trinity Mini?

Trinity Mini (arcee-ai/trinity-mini) has a 131,072-token context window and supports up to 131,072 output tokens per request.

How much does Trinity Mini cost?

Trinity Mini is priced at $0.04 per 1M input tokens and $0.15 per 1M output tokens when accessed via the haimaker.ai OpenAI-compatible API.

What features does Trinity Mini support?

Trinity Mini supports function calling and reasoning.

How do I use Trinity Mini via API?

Send requests to https://api.haimaker.ai/v1/chat/completions with model "arcee-ai/trinity-mini" using any OpenAI-compatible SDK. Authentication uses a Bearer API key from https://app.haimaker.ai.
