arcee-ai/trinity-mini
Trinity Mini (arcee-ai/trinity-mini) is an afmoe model from Arcee AI with a 131,072-token context window and 131,072 max output tokens, priced at $0.04/1M input and $0.15/1M output tokens. Available via the haimaker.ai OpenAI-compatible API.
Trinity Mini is an Arcee AI 26B MoE model with 3B active parameters. It is the medium-sized model in our new Trinity family, a series of open-weight models for enterprise and tinkerers alike.
This model is tuned for reasoning, but in testing, it uses a similar total token count to competitive instruction-tuned models.
Trinity Mini is trained on 10T tokens gathered and curated through a key partnership with Datology, building upon the excellent dataset we used on AFM-4.5B with additional math and code.
Training was performed on a cluster of 512 H200 GPUs powered by Prime Intellect using HSDP parallelism.
More details, including key architecture decisions, can be found on our blog.
Try it out now at chat.arcee.ai
Use the main transformers branch
git clone https://github.com/huggingface/transformers.git
cd transformers
# pip
pip install '.[torch]'
# uv
uv pip install '.[torch]'
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
model_id = "arcee-ai/Trinity-Mini"
# Load the tokenizer and the model in bfloat16, placing weights automatically
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Render the conversation with the model's chat template and tokenize it
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)
# Sample up to 256 new tokens
outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.5,
    top_k=50,
    top_p=0.95
)
# Decode the full sequence (prompt plus generated tokens)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
If you are using a released version of transformers, pass trust_remote_code=True:
model_id = "arcee-ai/Trinity-Mini"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)
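Because the model is tuned for reasoning, the decoded `response` may contain a reasoning trace ahead of the final answer. The sketch below separates the two. It assumes the trace is wrapped in `<think>...</think>` tags, which is the convention the `deepseek_r1` reasoning parser in the vLLM settings on this page looks for; that tag format is an assumption here, so check it against real output. If the tags are registered as special tokens, decoding with `skip_special_tokens=True` may strip them. `split_reasoning` is a hypothetical helper, not part of transformers.

```python
import re

# Assumption: reasoning is wrapped in <think>...</think>. Verify this
# against actual model output before relying on it.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty if no block is found."""
    match = THINK_BLOCK.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Keep only the text after the closing tag, which also drops any
    # echoed prompt that precedes the reasoning block.
    answer = text[match.end():].strip()
    return reasoning, answer


sample = "<think>The user is greeting me.</think>Hello! I am Trinity Mini."
reasoning, answer = split_reasoning(sample)
print("reasoning:", reasoning)
print("answer:", answer)
```

In the generation example, this would be called as `split_reasoning(response)`.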
Supported in vLLM release 0.11.1 and later
# pip
pip install "vllm>=0.11.1"
Serving the model with suggested settings:
vllm serve arcee-ai/Trinity-Mini \
    --dtype bfloat16 \
    --enable-auto-tool-choice \
    --reasoning-parser deepseek_r1 \
    --tool-call-parser hermes
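With `--enable-auto-tool-choice` set, the server can return tool calls through its OpenAI-compatible chat endpoint. The following is a minimal sketch of such a request using only the standard library. It assumes the server is listening on vLLM's default address `http://localhost:8000`; the `get_weather` tool is hypothetical and exists only for illustration.

```python
import json
import urllib.request

# Assumption: vLLM's default address; adjust if you pass --host or --port.
BASE_URL = "http://localhost:8000/v1"

# Hypothetical tool definition, for illustration only.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def build_request(prompt: str) -> dict:
    """Build an OpenAI-style chat completion payload that offers one tool."""
    return {
        "model": "arcee-ai/Trinity-Mini",
        "messages": [{"role": "user", "content": prompt}],
        "tools": [WEATHER_TOOL],
        "tool_choice": "auto",
    }


def send(payload: dict) -> dict:
    """POST the payload to the running server and return the parsed JSON."""
    request = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


payload = build_request("What is the weather in Paris?")
print(json.dumps(payload, indent=2))
```

Once the server is running, `send(payload)["choices"][0]["message"]` should carry either a `tool_calls` list or plain `content`, depending on whether the model chose to call the tool.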
Supported in llama.cpp release b7061
Download the latest llama.cpp release
llama-server -hf arcee-ai/Trinity-Mini-GGUF:q4_k_m \
    --temp 0.15 \
    --top-k 50 \
    --top-p 0.75 \
    --min-p 0.06
Supported in the latest LM Studio runtime
Update to the latest available version, then verify your runtime.
Then go to Model Search, search for arcee-ai/Trinity-Mini-GGUF, download your preferred size, and load it in the chat.
Trinity Mini is available today on OpenRouter:
https://openrouter.ai/arcee-ai/trinity-mini
curl -X POST "https://openrouter.ai/api/v1/chat/completions" \
    -H "Authorization: Bearer $OPENROUTER_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "arcee-ai/trinity-mini",
        "messages": [
            {
                "role": "user",
                "content": "What are some fun things to do in New York?"
            }
        ]
    }'
Trinity-Mini is released under the Apache-2.0 license.
| Property | Value |
| --- | --- |
| Mode | chat |
| Context Window | 131,072 tokens |
| Max Output | 131,072 tokens |
| Function Calling | Supported |
| Vision | - |
| Reasoning | Supported |
| Web Search | - |
| URL Context | - |
| Architecture | AfmoeForCausalLM |
| Model Type | afmoe |
| Base Model | arcee-ai/Trinity-Mini-Base |
| Languages | en, es, fr, de, it, pt, ru, ar, hi, ko, zh |
| Library | transformers |
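The context window and max output figures in the table are the same number, so in practice the room left for output depends on the prompt length. The sketch below computes an output budget under the assumption that prompt and output tokens share one 131,072-token window; the page does not state how the limits interact, so treat that as an assumption. `max_new_tokens_allowed` is a hypothetical helper name.

```python
CONTEXT_WINDOW = 131_072  # tokens, from the table
MAX_OUTPUT = 131_072      # tokens, from the table


def max_new_tokens_allowed(prompt_tokens: int) -> int:
    """Largest output budget that fits.

    Assumption: prompt and output tokens share a single context window.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    remaining = CONTEXT_WINDOW - prompt_tokens
    return max(0, min(MAX_OUTPUT, remaining))


print(max_new_tokens_allowed(100_000))  # 31072
```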
from openai import OpenAI
client = OpenAI(
    base_url="https://api.haimaker.ai/v1",
    api_key="YOUR_API_KEY",
)
response = client.chat.completions.create(
    model="arcee-ai/trinity-mini",
    messages=[
        {"role": "user", "content": "Hello, how are you?"}
    ],
)
print(response.choices[0].message.content)
Trinity Mini (arcee-ai/trinity-mini) has a 131,072-token context window and supports up to 131,072 output tokens per request.
Trinity Mini is priced at $0.04 per 1M input tokens and $0.15 per 1M output tokens when accessed via the haimaker.ai OpenAI-compatible API.
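As a rough illustration of these rates, the sketch below estimates the cost of a workload from its token counts. The rates come from this page; the function name and the sample token counts are made up for the example.

```python
INPUT_USD_PER_MILLION = 0.04   # rate listed on this page
OUTPUT_USD_PER_MILLION = 0.15  # rate listed on this page


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost in USD for the given token counts."""
    total = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total / 1_000_000


# Hypothetical workload: 2M input tokens and 500k output tokens.
cost = estimate_cost_usd(2_000_000, 500_000)
print(f"${cost:.4f}")  # $0.1550
```

With the OpenAI SDK, the per-request counts are available as `response.usage.prompt_tokens` and `response.usage.completion_tokens`.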
Trinity Mini supports function calling and reasoning.
Send requests to https://api.haimaker.ai/v1/chat/completions with model "arcee-ai/trinity-mini" using any OpenAI-compatible SDK. Authentication uses a Bearer API key from https://app.haimaker.ai.