Ministral 3 3B Reasoning 2512


mistralai/ministral-3b-2512 · Chat · apache-2.0 · Mistral AI · Function Calling · Vision · Released Oct 2025 · Updated Jul 2026

Ministral 3 3B Reasoning 2512 (mistralai/ministral-3b-2512) is a 4.3B-parameter model from Mistral AI (model type mistral3) with a 131,072-token context window and up to 131,072 output tokens. It is priced at $0.10 per 1M input tokens and $0.10 per 1M output tokens and is available via the haimaker.ai OpenAI-compatible API.

Parameters: 4.3B
Context Window: 131K tokens
Max Output: 131K tokens
Input Price: $0.10 / 1M tokens
Output Price: $0.10 / 1M tokens

Overview

The smallest model in the Ministral 3 family, Ministral 3 3B is a powerful, efficient tiny language model with vision capabilities.

Model Card

This model is the reasoning post-trained version, making it well suited to math, coding, and STEM-related use cases.

The Ministral 3 family is designed for edge deployment, capable of running on a wide range of hardware. Ministral 3 3B can even be deployed locally, fitting in 16GB of VRAM in BF16, and less than 8GB of RAM/VRAM when quantized.

Learn more in our blog post and paper.

Key Features

Ministral 3 3B consists of two main architectural components:

3.4B Language Model
0.4B Vision Encoder
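As a rough sanity check on the memory figures quoted earlier, the weight footprint can be estimated from the two component sizes listed here. This is a back-of-the-envelope sketch only: it counts weights and ignores the KV cache, activations, and runtime overhead, and the 4-bit case is an illustrative assumption rather than a quantization format named in this card.

```python
# Parameter counts taken from the component list above.
LANGUAGE_MODEL_PARAMS = 3.4e9
VISION_ENCODER_PARAMS = 0.4e9


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes). Weights only."""
    return num_params * bytes_per_param / 1e9


total_params = LANGUAGE_MODEL_PARAMS + VISION_ENCODER_PARAMS

# BF16 stores each parameter in 2 bytes.
bf16_gb = weight_memory_gb(total_params, 2.0)
# Assumed 4-bit quantization: half a byte per parameter (illustrative only).
int4_gb = weight_memory_gb(total_params, 0.5)

print(f"BF16 weights: ~{bf16_gb:.1f} GB")
print(f"4-bit weights (assumed): ~{int4_gb:.1f} GB")
```

The gap between these weight-only numbers and the VRAM figures in the card is where the KV cache and runtime overhead would sit; longer contexts need more of that headroom.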

The Ministral 3 3B Reasoning model offers the following capabilities:

Vision: Enables the model to analyze images and provide insights based on visual content, in addition to text.
Multilingual: Supports dozens of languages, including English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, Arabic.
System Prompt: Maintains strong adherence and support for system prompts.
Agentic: Offers best-in-class agentic capabilities with native function calling and JSON output.
Reasoning: Excels at complex, multi-step reasoning and dynamic problem-solving.
Edge-Optimized: Delivers best-in-class performance at a small scale, deployable anywhere.
Apache 2.0 License: Open-source license allowing usage and modification for both commercial and non-commercial purposes.
Large Context Window: Supports a 256k context window (the hosted haimaker.ai endpoint described on this page lists 131,072 tokens).

Use Cases

Ideal for lightweight, real-time applications on edge or low-resource devices, such as:

Image captioning
Text classification
Real-time efficient translation
Data extraction
Short content generation
Fine-tuning and specialization
And more...

The model brings advanced AI capabilities to edge and distributed environments, including embedded systems.

Recommended Settings

We recommend deploying with the following best practices:

System Prompt: Use our provided system prompt, and append it to your custom system prompt to define a clear environment and use case, including guidance on how to effectively leverage tools in agentic systems.

Multi-turn Traces: We highly recommend keeping the reasoning traces in context.

Sampling Parameters: Use a temperature of 0.7 for most environments. Other temperatures may suit other use cases, and developers are encouraged to experiment with alternative settings.

Tools: Keep the set of tools well-defined and limit their number to the minimum required for the use case. Avoid overloading the model with an excessive number of tools.

Vision: When deploying with vision capabilities, we recommend maintaining an aspect ratio close to 1:1 (width-to-height) for images. Avoid overly thin or wide images, and crop them as needed to ensure optimal performance.
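The cropping advice in the vision recommendation can be sketched as a small helper that computes a centered crop box. This is only an illustration: the function name and the `max_ratio=1.5` threshold are assumptions (the card says only "close to 1:1"), and the returned `(left, top, right, bottom)` tuple follows the box convention used by Pillow's `Image.crop`.

```python
def center_crop_box(
    width: int, height: int, max_ratio: float = 1.5
) -> tuple[int, int, int, int]:
    """Return (left, top, right, bottom) for a centered crop whose long side
    is at most `max_ratio` times its short side.

    Images already within the limit get a box covering the whole image.
    `max_ratio` is an assumed threshold, not a value from the model card.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")

    short_side = min(width, height)
    long_side = max(width, height)
    target_long = min(long_side, int(short_side * max_ratio))

    if width >= height:
        # Too wide (or fine): trim equally from the left and right edges.
        left = (width - target_long) // 2
        return (left, 0, left + target_long, height)

    # Too tall: trim equally from the top and bottom edges.
    top = (height - target_long) // 2
    return (0, top, width, top + target_long)


print(center_crop_box(3000, 500))  # (1125, 0, 1875, 500)
```

A box like this could be passed to an image library before encoding the image for the request; whether a centered crop keeps the relevant content depends on the image.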

Ministral 3 Family

| Model Name | Type | Precision | Link |
|--------------------------------|--------------------|-----------|------------------------------------------------------------------------------------------|
| Ministral 3 3B Base 2512 | Base pre-trained | BF16 | Hugging Face |
| Ministral 3 3B Instruct 2512 | Instruct post-trained | FP8 | Hugging Face |
| Ministral 3 3B Reasoning 2512 | Reasoning capable | BF16 | Hugging Face |
| Ministral 3 8B Base 2512 | Base pre-trained | BF16 | Hugging Face |
| Ministral 3 8B Instruct 2512 | Instruct post-trained | FP8 | Hugging Face |
| Ministral 3 8B Reasoning 2512 | Reasoning capable | BF16 | Hugging Face |
| Ministral 3 14B Base 2512 | Base pre-trained | BF16 | Hugging Face |
| Ministral 3 14B Instruct 2512 | Instruct post-trained | FP8 | Hugging Face |
| Ministral 3 14B Reasoning 2512 | Reasoning capable | BF16 | Hugging Face |

Other formats available here.

Benchmark Results

We compare Ministral 3 to similarly sized models.

Reasoning

| Model | AIME25 | AIME24 | GPQA Diamond | LiveCodeBench |
|---------------------------|-------------|-------------|--------------|---------------|
| Ministral 3 14B | 0.850| 0.898| 0.712 | 0.646 |
| Qwen3-14B (Thinking) | 0.737 | 0.837 | 0.663 | 0.593 |
| | | | | |
| Ministral 3 8B | 0.787 | 0.860| 0.668 | 0.616 |
| Qwen3-VL-8B-Thinking | 0.798| 0.860| 0.671 | 0.580 |
| | | | | |
| Ministral 3 3B | 0.721| 0.775| 0.534 | 0.548 |
| Qwen3-VL-4B-Thinking | 0.697 | 0.729 | 0.601 | 0.513 |

Instruct

| Model | Arena Hard | WildBench | MATH Maj@1 | MM MTBench |
|---------------------------|-------------|------------|-------------|------------------|
| Ministral 3 14B | 0.551| 68.5| 0.904| 8.49 |
| Qwen3 14B (Non-Thinking) | 0.427 | 65.1 | 0.870 | NOT MULTIMODAL |
| Gemma3-12B-Instruct | 0.436 | 63.2 | 0.854 | 6.70 |
| | | | | |
| Ministral 3 8B | 0.509 | 66.8| 0.876 | 8.08 |
| Qwen3-VL-8B-Instruct | 0.528| 66.3 | 0.946| 8.00 |
| | | | | |
| Ministral 3 3B | 0.305 | 56.8| 0.830 | 7.83 |
| Qwen3-VL-4B-Instruct | 0.438| 56.8| 0.900| 8.01 |
| Qwen3-VL-2B-Instruct | 0.163 | 42.2 | 0.786 | 6.36 |
| Gemma3-4B-Instruct | 0.318 | 49.1 | 0.759 | 5.23 |

Base

| Model | Multilingual MMLU | MATH CoT 2-Shot | AGIEval 5-shot | MMLU Redux 5-shot | MMLU 5-shot | TriviaQA 5-shot |
|---------------------|-------------------|-----------------|----------------|-------------------|-------------|-----------------|
| Ministral 3 14B | 0.742 | 0.676 | 0.648 | 0.820 | 0.794 | 0.749 |
| Qwen3 14B Base | 0.754 | 0.620 | 0.661 | 0.837 | 0.804| 0.703 |
| Gemma 3 12B Base | 0.690 | 0.487 | 0.587 | 0.766 | 0.745 | 0.788 |
| | | | | | | |
| Ministral 3 8B | 0.706 | 0.626 | 0.591 | 0.793 | 0.761| 0.681 |
| Qwen 3 8B Base | 0.700 | 0.576 | 0.596 | 0.794 | 0.760 | 0.639 |
| | | | | | | |
| Ministral 3 3B | 0.652 | 0.601 | 0.511 | 0.735 | 0.707 | 0.592 |
| Qwen 3 4B Base | 0.677 | 0.405 | 0.570 | 0.759 | 0.713| 0.530 |
| Gemma 3 4B Base | 0.516 | 0.294 | 0.430 | 0.626 | 0.589 | 0.640 |

Usage

The model can be used with the following frameworks:

vllm: See here

transformers: See here

vLLM

We recommend using this model with vLLM.

Installation

Make sure to install vllm >= 0.12.0:

pip install vllm --upgrade

Doing so should automatically install mistral_common >= 1.8.6.

To check:

python -c "import mistral_common; print(mistral_common.__version__)"

You can also use a ready-to-go Docker image, available on Docker Hub.

Serve

Due to their size, Ministral-3-3B-Reasoning-2512 and Ministral-3-8B-Reasoning-2512 can each run on a single H200 GPU.

A simple launch command is:


vllm serve mistralai/Ministral-3-3B-Reasoning-2512 \
  --tokenizer_mode mistral --config_format mistral --load_format mistral \
  --enable-auto-tool-choice --tool-call-parser mistral \
  --reasoning-parser mistral

Key parameter notes:

enable-auto-tool-choice: Required when enabling tool usage.
tool-call-parser mistral: Required when enabling tool usage.
reasoning-parser mistral: Required when enabling reasoning.

Additional flags:

You can set --max-model-len to save memory. It defaults to 262144, which is larger than most scenarios need.
You can set --max-num-batched-tokens to balance throughput and latency: higher values increase throughput but also increase latency.
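As a sketch of how the optional flags combine with the launch command shown earlier, the helper below assembles the argument list in Python and prints it as a shell command. The function name is hypothetical, and the values 32768 and 8192 are illustrative assumptions, not recommendations from this card; only the flags themselves come from the text above.

```python
import shlex
from typing import Optional


def build_serve_command(
    model: str,
    max_model_len: Optional[int] = None,
    max_num_batched_tokens: Optional[int] = None,
) -> list[str]:
    """Assemble a `vllm serve` argument list using the flags from this card."""
    cmd = [
        "vllm", "serve", model,
        "--tokenizer_mode", "mistral",
        "--config_format", "mistral",
        "--load_format", "mistral",
        "--enable-auto-tool-choice",
        "--tool-call-parser", "mistral",
        "--reasoning-parser", "mistral",
    ]
    # Optional flags are appended only when a value is given.
    if max_model_len is not None:
        cmd += ["--max-model-len", str(max_model_len)]
    if max_num_batched_tokens is not None:
        cmd += ["--max-num-batched-tokens", str(max_num_batched_tokens)]
    return cmd


# Illustrative values only; tune them for your hardware and workload.
command = build_serve_command(
    "mistralai/Ministral-3-3B-Reasoning-2512",
    max_model_len=32768,
    max_num_batched_tokens=8192,
)
print(shlex.join(command))
```

The printed line can be pasted into a shell, or the list can be handed to `subprocess.run` to launch the server from a script.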

Recommended Sampling Settings:

We recommend starting with a Temperature of 0.7 for most use cases. Feel free to experiment with different settings to best suit your specific needs.

Usage of the model

Here we assume that the model mistralai/Ministral-3-3B-Reasoning-2512 is being served and is reachable at localhost on port 8000, the default for vLLM.

Vision Reasoning

Let's see if the Ministral 3 model knows when to pick a fight!

from typing import Any
from openai import OpenAI
from huggingface_hub import hf_hub_download
# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"
TEMP = 0.7
TOP_P = 0.95
MAX_TOK = 262144
client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)
models = client.models.list()
model = models.data[0].id
def load_system_prompt(repo_id: str, filename: str) -> dict[str, Any]:
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(file_path, "r") as file:
        system_prompt = file.read()
    index_begin_think = system_prompt.find("[THINK]")
    index_end_think = system_prompt.find("[/THINK]")
    return {
        "role": "system",
        "content": [
            {"type": "text", "text": system_prompt[:index_begin_think]},
            {
                "type": "thinking",
                "thinking": system_prompt[
                    index_begin_think + len("[THINK]") : index_end_think
                ],
                "closed": True,
            },
            {
                "type": "text",
                "text": system_prompt[index_end_think + len("[/THINK]") :],
            },
        ],
    }
SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")
image_url = "https://static.wikia.nocookie.net/essentialsdocs/images/7/70/Battle.png/revision/latest?cb=20220523172438"
messages = [
    SYSTEM_PROMPT,
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What action do you think I should take in this situation? List all the possible actions and explain why you think they are good or bad.",
            },
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    },
]
stream = client.chat.completions.create(
    model=model,
    messages=messages,
    stream=True,
    temperature=TEMP,
    top_p=TOP_P,
    max_tokens=MAX_TOK,
)
print("client: Start streaming chat completions...:\n")
printed_reasoning_content = False
answer = []
for chunk in stream:
    reasoning_content = None
    content = None
    # Check the content is reasoning_content or content
    if hasattr(chunk.choices[0].delta, "reasoning_content"):
        reasoning_content = chunk.choices[0].delta.reasoning_content
    if hasattr(chunk.choices[0].delta, "content"):
        content = chunk.choices[0].delta.content
    if reasoning_content is not None:
        if not printed_reasoning_content:
            printed_reasoning_content = True
            print("Start reasoning:\n", end="", flush=True)
        print(reasoning_content, end="", flush=True)
    elif content is not None:
        # Extract and print the content
        if not reasoning_content and printed_reasoning_content:
            answer.append(content)
        print(content, end="", flush=True)
if answer:
    print("\n\n=============\nAnswer\n=============\n")
    print("".join(answer))
else:
    print("\n\n=============\nNo Answer\n=============\n")
    print(
        "No answer was generated by the model, probably because the maximum number of tokens was reached."
    )
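The system-prompt handling in the script above splits the prompt file around its [THINK] ... [/THINK] markers into text, thinking, and text chunks. The sketch below reproduces that splitting on an inline string so it can be tried without a server or a download. The demo prompt is made up for illustration (it is not the contents of SYSTEM_PROMPT.txt), and the plain-text fallback for a missing marker is an addition that the original script does not have.

```python
from typing import Any


def split_system_prompt(system_prompt: str) -> dict[str, Any]:
    """Split a prompt containing one [THINK]...[/THINK] block into chunks."""
    begin_tag, end_tag = "[THINK]", "[/THINK]"
    begin = system_prompt.find(begin_tag)
    end = system_prompt.find(end_tag)

    # Added guard (not in the original script): without a well-formed
    # block, return the whole prompt as a single text chunk.
    if begin == -1 or end == -1 or end < begin:
        return {
            "role": "system",
            "content": [{"type": "text", "text": system_prompt}],
        }

    return {
        "role": "system",
        "content": [
            {"type": "text", "text": system_prompt[:begin]},
            {
                "type": "thinking",
                "thinking": system_prompt[begin + len(begin_tag) : end],
                "closed": True,
            },
            {"type": "text", "text": system_prompt[end + len(end_tag) :]},
        ],
    }


# Hypothetical prompt text, used only to show the resulting chunks.
demo_prompt = "You are helpful. [THINK]draft reasoning here[/THINK] Then answer."
message = split_system_prompt(demo_prompt)
for chunk in message["content"]:
    print(chunk["type"], "->", repr(chunk.get("text", chunk.get("thinking"))))
```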

Transformers

You can also use Ministral 3 3B Reasoning 2512 with Transformers!
Make sure to install Transformers from its first v5 release candidate or from "main":

pip install transformers==5.0.0rc0

To make the best use of our model with Transformers make sure to have installed mistral-common >= 1.8.6 to use our tokenizer.

pip install mistral-common --upgrade

Then load our tokenizer along with the model and generate:

import torch
from transformers import Mistral3ForConditionalGeneration, MistralCommonBackend
model_id = "mistralai/Ministral-3-3B-Reasoning-2512"
tokenizer = MistralCommonBackend.from_pretrained(model_id)
model = Mistral3ForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
image_url = "https://static.wikia.nocookie.net/essentialsdocs/images/7/70/Battle.png/revision/latest?cb=20220523172438"
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What action do you think I should take in this situation? List all the possible actions and explain why you think they are good or bad.",
            },
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    },
]
tokenized = tokenizer.apply_chat_template(messages, return_tensors="pt", return_dict=True)
tokenized["input_ids"] = tokenized["input_ids"].to(device="cuda")
tokenized["pixel_values"] = tokenized["pixel_values"].to(dtype=torch.bfloat16, device="cuda")
image_sizes = [tokenized["pixel_values"].shape[-2:]]
output = model.generate(
    **tokenized,
    image_sizes=image_sizes,
    max_new_tokens=8092,
)[0]
decoded_output = tokenizer.decode(output[len(tokenized["input_ids"][0]):])
print(decoded_output)

License

This model is licensed under the Apache 2.0 License.

You must not use this model in a manner that infringes, misappropriates, or otherwise violates any third party’s rights, including intellectual property rights.

Features & Capabilities

Mode	chat
Context Window	131,072 tokens
Max Output	131,072 tokens
Function Calling	Supported
Vision	Supported
Reasoning	Not supported
Web Search	Not supported
URL Context	Not supported

Technical Details

Architecture	Mistral3ForConditionalGeneration
Model Type	mistral3
Base Model	mistralai/Ministral-3-3B-Base-2512
Languages	en, fr, es, de, it, pt, nl, zh, ja, ko, ar
Library	vllm

API Usage

from openai import OpenAI

client = OpenAI(
    base_url="https://api.haimaker.ai/v1",
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="mistralai/ministral-3b-2512",
    messages=[
        {"role": "user", "content": "Hello, how are you?"}
    ],
)

print(response.choices[0].message.content)

Frequently Asked Questions

What is the context window of Ministral 3 3B Reasoning 2512?

Ministral 3 3B Reasoning 2512 (mistralai/ministral-3b-2512) has a 131,072-token context window and supports up to 131,072 output tokens per request.
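When choosing a `max_tokens` value it can help to budget against these limits. The sketch below assumes that prompt tokens and generated tokens share the same 131,072-token window, which is the usual behavior for OpenAI-compatible endpoints but is not stated explicitly on this page; the function name is hypothetical.

```python
CONTEXT_WINDOW = 131_072  # tokens, from the listing above
MAX_OUTPUT = 131_072      # tokens, from the listing above


def max_output_budget(prompt_tokens: int) -> int:
    """Largest output length that still fits, assuming prompt and output
    share the context window (an assumption, see the note above)."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    remaining = CONTEXT_WINDOW - prompt_tokens
    return max(0, min(MAX_OUTPUT, remaining))


print(max_output_budget(4_000))    # 127072
print(max_output_budget(200_000))  # 0: the prompt alone exceeds the window
```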

How much does Ministral 3 3B Reasoning 2512 cost?

Ministral 3 3B Reasoning 2512 is priced at $0.10 per 1M input tokens and $0.10 per 1M output tokens when accessed via the haimaker.ai OpenAI-compatible API.
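The pricing above translates into a simple per-request estimate. The helper below is a minimal sketch using the listed rates; the function name and the example token counts are illustrative.

```python
INPUT_USD_PER_MILLION = 0.10   # listed input price
OUTPUT_USD_PER_MILLION = 0.10  # listed output price


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate request cost in USD from token counts and the listed rates."""
    return (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    ) / 1_000_000


# Example: 250,000 input tokens and 50,000 output tokens.
print(f"${estimate_cost_usd(250_000, 50_000):.4f}")  # $0.0300
```

Reasoning models can emit long traces before the final answer, so output tokens may dominate the bill even for short prompts.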

What features does Ministral 3 3B Reasoning 2512 support?

Ministral 3 3B Reasoning 2512 supports function calling and vision.

How do I use Ministral 3 3B Reasoning 2512 via API?

Send requests to https://api.haimaker.ai/v1/chat/completions with model "mistralai/ministral-3b-2512" using any OpenAI-compatible SDK. Authentication uses a Bearer API key from https://app.haimaker.ai.
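For readers who prefer not to depend on an SDK, the same request can be sketched with the Python standard library, here including a tool definition since function calling is listed as supported. The endpoint, model id, and Bearer authentication come from this page; the `get_weather` tool is hypothetical, and the `tools` payload follows the common OpenAI-compatible schema, which this page does not spell out.

```python
import json
import urllib.request

API_URL = "https://api.haimaker.ai/v1/chat/completions"
MODEL = "mistralai/ministral-3b-2512"


def build_request(api_key: str, user_text: str) -> urllib.request.Request:
    """Build (but do not send) a chat completion request with one tool."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": user_text}],
        "temperature": 0.7,  # starting point recommended in the model card
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",  # hypothetical tool for illustration
                    "description": "Look up the current weather for a city.",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
            }
        ],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and return the parsed JSON body (needs a valid key)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


request = build_request("YOUR_API_KEY", "What is the weather in Paris?")
print(request.get_method(), request.full_url)
# To call the API with a real key: body = send(request)
```

Keeping the tool list this small is in line with the card's advice to limit tools to the minimum the use case needs.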