Trinity Mini
arcee-ai/trinity-mini
Trinity Mini (arcee-ai/trinity-mini) is an afmoe 26.1B-parameter model from Arcee AI with a 131,072-token context window and 131,072 max output tokens, priced at $0.04/1M input and $0.15/1M output tokens. Available via the haimaker.ai OpenAI-compatible API.
Overview
Model Card
Trinity Mini
Trinity Mini is an Arcee AI 26B MoE model with 3B active parameters. It is the medium-sized model in our new Trinity family, a series of open-weight models for enterprise and tinkerers alike.
This model is tuned for reasoning, but in testing, it uses a similar total token count to competitive instruction-tuned models.
Trinity Mini is trained on 10T tokens gathered and curated through a key partnership with Datology, building upon the excellent dataset we used on AFM-4.5B with additional math and code.
Training was performed on a cluster of 512 H200 GPUs powered by Prime Intellect using HSDP parallelism.
More details, including key architecture decisions, can be found on our blog.
Try it out now at chat.arcee.ai
Model Details
- Model Architecture: AfmoeForCausalLM
- Parameters: 26B, 3B active
- Experts: 128 total, 8 active, 1 shared
- Context length: 128k
- Training Tokens: 10T
- License: OpenMDW-1.1
- Recommended settings:
  - temperature: 0.15
  - top_k: 50
  - top_p: 0.75
  - min_p: 0.06
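To give a sense of what the recommended settings do, the toy sketch below applies temperature scaling, then top-k, top-p, and min-p filtering to a small made-up logit vector. The helper name `filter_probs` and the order in which the filters run are assumptions for illustration only; transformers, vLLM, and llama.cpp each use their own implementation and ordering.

```python
import math

def filter_probs(logits, temperature=0.15, top_k=50, top_p=0.75, min_p=0.06):
    """Return {token_index: probability} for tokens that survive all filters.

    Hypothetical helper: the filter order (top-k, top-p, min-p) is an
    assumption, not the behaviour of any specific inference library.
    """
    # Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = sorted(((e / total, i) for i, e in enumerate(exps)), reverse=True)

    # top-k: keep only the k most likely tokens.
    probs = probs[:top_k]

    # top-p: keep the smallest prefix whose cumulative mass reaches top_p.
    kept, cumulative = [], 0.0
    for p, i in probs:
        kept.append((p, i))
        cumulative += p
        if cumulative >= top_p:
            break

    # min-p: drop tokens below min_p times the most likely token's probability.
    threshold = min_p * kept[0][0]
    kept = [(p, i) for p, i in kept if p >= threshold]

    # Renormalise the survivors so they sum to 1.
    mass = sum(p for p, _ in kept)
    return {i: p / mass for p, i in kept}

# Made-up logits for five candidate tokens.
survivors = filter_probs([2.0, 1.9, 1.0, 0.5, -1.0])
print(survivors)
```

With a temperature as low as 0.15 the distribution sharpens considerably, so in this example only the two tokens with the closest logits remain candidates for sampling.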
Benchmarks
Running our model
Transformers
Use the main transformers branch
git clone https://github.com/huggingface/transformers.git
cd transformers
# pip
pip install '.[torch]'
# uv
uv pip install '.[torch]'
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "arcee-ai/Trinity-Mini"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

messages = [
    {"role": "user", "content": "Who are you?"},
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.5,
    top_k=50,
    top_p=0.95
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
If you are using a released version of transformers, pass "trust_remote_code=True":
model_id = "arcee-ai/Trinity-Mini"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)
vLLM
Supported in vLLM release 0.11.1
# pip
pip install "vllm>=0.11.1"
Serving the model with suggested settings:
vllm serve arcee-ai/Trinity-Mini \
  --dtype bfloat16 \
  --enable-auto-tool-choice \
  --reasoning-parser deepseek_r1 \
  --tool-call-parser hermes
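Once the server is up, requests can go through its OpenAI-compatible endpoint. The sketch below is a minimal client using only the standard library. It assumes vLLM's default address `http://localhost:8000/v1`, and it passes `top_k` and `min_p` as extra fields in the request body, which vLLM accepts on top of the OpenAI schema. The helper names `build_payload` and `chat` are hypothetical.

```python
import json
import urllib.request

# Assumption: vLLM's default host and port; adjust to your deployment.
BASE_URL = "http://localhost:8000/v1"

def build_payload(prompt):
    """Chat request carrying the sampling settings recommended in the model card."""
    return {
        "model": "arcee-ai/Trinity-Mini",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.15,
        "top_p": 0.75,
        # top_k and min_p are not part of the OpenAI schema; vLLM reads them
        # as extra sampling parameters in the request body.
        "top_k": 50,
        "min_p": 0.06,
    }

def chat(prompt):
    """POST the payload to the running server and return the assistant message."""
    request = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]
```

Calling `chat("Who are you?")` against a running server returns the assistant message as a dict; its `content` key holds the final answer.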
llama.cpp
Supported in llama.cpp release b7061
Download the latest llama.cpp release
llama-server -hf arcee-ai/Trinity-Mini-GGUF:q4_k_m \
  --temp 0.15 \
  --top-k 50 \
  --top-p 0.75 \
  --min-p 0.06
LM Studio
Supported in latest LM Studio runtime
Update to latest available, then verify your runtime by:
Then go to Model Search, search for arcee-ai/Trinity-Mini-GGUF, download your preferred size, and load it in the chat.
API
Trinity Mini is available today on OpenRouter:
https://openrouter.ai/arcee-ai/trinity-mini
curl -X POST "https://openrouter.ai/api/v1/chat/completions" \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "arcee-ai/trinity-mini",
    "messages": [
      {
        "role": "user",
        "content": "What are some fun things to do in New York?"
      }
    ]
  }'
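The same request can be sketched in Python with only the standard library. The helper names `build_request` and `ask` are hypothetical, and the endpoint path `/api/v1/chat/completions` follows OpenRouter's public API; the key is read from the `OPENROUTER_API_KEY` environment variable, as in the curl call.

```python
import json
import os
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(prompt, api_key):
    """Build (but do not send) a chat completion request for Trinity Mini."""
    payload = {
        "model": "arcee-ai/trinity-mini",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def ask(prompt):
    """Send the request and return the text of the first choice."""
    api_key = os.environ["OPENROUTER_API_KEY"]
    with urllib.request.urlopen(build_request(prompt, api_key)) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any network call is made.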
License
Trinity-Mini is released under the OpenMDW-1.1 license.
Features & Capabilities
| Mode | chat |
| Context Window | 131,072 tokens |
| Max Output | 131,072 tokens |
| Function Calling | Supported |
| Vision | Not supported |
| Reasoning | Supported |
| Web Search | Not supported |
| URL Context | Not supported |
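Since function calling is listed as supported, here is a minimal sketch of the client side of a tool-calling round trip in the OpenAI chat-completions format. The `get_weather` tool, the `REGISTRY` mapping, and the `run_tool_call` helper are all hypothetical, and the example tool call uses made-up values rather than real model output.

```python
import json

# Hypothetical tool, described in the OpenAI chat-completions "tools" format.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Return the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

def get_weather(city):
    """Stand-in implementation; a real tool would query a weather service."""
    return {"city": city, "forecast": "sunny"}

# Maps tool names the model may request to local Python callables.
REGISTRY = {"get_weather": get_weather}

def run_tool_call(tool_call):
    """Execute one tool call dict and build the follow-up "tool" message."""
    function = tool_call["function"]
    arguments = json.loads(function["arguments"])  # arguments arrive as a JSON string
    result = REGISTRY[function["name"]](**arguments)
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": json.dumps(result),
    }

# Shape of a tool call inside an assistant message (example values only).
example_call = {
    "id": "call_0",
    "type": "function",
    "function": {"name": "get_weather", "arguments": '{"city": "New York"}'},
}
tool_message = run_tool_call(example_call)
print(tool_message)
```

In practice `TOOLS` would be passed as the `tools` argument of a chat completion request, and the resulting tool message appended to the conversation before asking the model for its final answer.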
Technical Details
| Architecture | AfmoeForCausalLM |
| Model Type | afmoe |
| Base Model | arcee-ai/Trinity-Mini-Base |
| Languages | en, es, fr, de, it, pt, ru, ar, hi, ko, zh |
| Library | transformers |
API Usage
from openai import OpenAI

client = OpenAI(
    base_url="https://api.haimaker.ai/v1",
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="arcee-ai/trinity-mini",
    messages=[
        {"role": "user", "content": "Hello, how are you?"}
    ],
)
print(response.choices[0].message.content)
Frequently Asked Questions
What is the context window of Trinity Mini?
Trinity Mini (arcee-ai/trinity-mini) has a 131,072-token context window and supports up to 131,072 output tokens per request.
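As a rough planning aid, the sketch below computes how many output tokens remain for a given prompt size. It assumes that prompt and output tokens share the same 131,072-token window, which is how most serving stacks behave but is not stated explicitly above; `max_new_tokens` is a hypothetical helper name.

```python
CONTEXT_WINDOW = 131_072  # tokens, as listed above (128 * 1024)
MAX_OUTPUT = 131_072      # listed per-request output cap

def max_new_tokens(prompt_tokens):
    """Largest completion budget, assuming prompt and output share the window."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    remaining = CONTEXT_WINDOW - prompt_tokens
    # Never exceed the output cap, and never return a negative budget.
    return max(0, min(MAX_OUTPUT, remaining))

print(max_new_tokens(1_000))
```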
How much does Trinity Mini cost?
Trinity Mini is priced at $0.04 per 1M input tokens and $0.15 per 1M output tokens when accessed via the haimaker.ai OpenAI-compatible API.
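For budgeting, the listed prices translate into a simple per-request estimate. The sketch below uses `Decimal` to avoid floating-point rounding on small currency amounts; `estimate_cost` is a hypothetical helper and the token counts in the example are made up.

```python
from decimal import Decimal

INPUT_PRICE_PER_M = Decimal("0.04")   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = Decimal("0.15")  # USD per 1M output tokens
MILLION = Decimal(1_000_000)

def estimate_cost(input_tokens, output_tokens):
    """Estimated USD cost of one request at the listed prices."""
    total = INPUT_PRICE_PER_M * input_tokens + OUTPUT_PRICE_PER_M * output_tokens
    return total / MILLION

# Example: a 20,000-token prompt with a 2,000-token reply.
print(estimate_cost(20_000, 2_000))
```

Because this is a reasoning model, any reasoning tokens it generates are normally billed as output tokens, so real usage figures from the API response are a better input than a guess at the visible answer length.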
What features does Trinity Mini support?
Trinity Mini supports function calling and reasoning. It does not support vision, web search, or URL context.
How do I use Trinity Mini via API?
Send requests to https://api.haimaker.ai/v1/chat/completions with model "arcee-ai/trinity-mini" using any OpenAI-compatible SDK. Authentication uses a Bearer API key from https://app.haimaker.ai.