Mistral Large 3 675B Instruct 2512

Chat · apache-2.0 · Mistral AI · Function Calling · Vision · Released Nov 2025 · Updated Jul 2026

Mistral Large 3 675B Instruct 2512 (mistralai/mistral-large-2512) is an AI model from Mistral AI with a 262,144-token context window and 262,144 max output tokens, priced at $0.50/1M input and $1.50/1M output tokens. Available via the haimaker.ai OpenAI-compatible API.

Context Window	262K tokens
Max Output	262K tokens
Input Price	$0.50 / 1M tokens
Output Price	$1.50 / 1M tokens
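
As a rough illustration of how the listed prices translate into a per-request cost, here is a minimal sketch. The helper name `estimate_cost_usd` is hypothetical, and the calculation assumes billing is strictly linear in token counts with no minimums, caching discounts, or rounding rules.

```python
from decimal import Decimal

# Listed prices in USD per one million tokens (taken from the figures above).
INPUT_PRICE_PER_M = Decimal("0.50")
OUTPUT_PRICE_PER_M = Decimal("1.50")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost of one request in USD (hypothetical helper).

    Assumes purely linear per-token billing.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = INPUT_PRICE_PER_M * input_tokens / TOKENS_PER_M
    output_cost = OUTPUT_PRICE_PER_M * output_tokens / TOKENS_PER_M
    return input_cost + output_cost


if __name__ == "__main__":
    # Example: a 200,000-token prompt with a 2,000-token reply.
    print(estimate_cost_usd(200_000, 2_000))
```

Decimal arithmetic is used so that small per-request amounts do not pick up binary floating-point noise when summed over many requests.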

Overview

From our family of large models, Mistral Large 3 is a state-of-the-art, general-purpose, multimodal granular Mixture-of-Experts model with 41B active parameters and 675B total parameters, trained from the ground up on 3000 H200s.

Model Card

This model is the instruct post-trained version in FP8, fine-tuned for instruction tasks, making it ideal for chat, agentic, and instruction-based use cases.
Designed for reliability and long-context comprehension, it is engineered for production-grade assistants, retrieval-augmented systems, scientific workloads, and complex enterprise workflows.

Learn more in our blog post.

Mistral Large 3 is deployable on-premises in:

FP8 on a single node of B200s or H200s.

NVFP4 on a single node of H100s or A100s.

We provide a BF16 version if needed.
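
To see why each precision maps to a different class of node, a rough weight-only estimate can help. The sketch below is a back-of-the-envelope calculation, not a sizing guide. It assumes 8 GPUs per node and nominal per-GPU memory of 141 GB (H200) and 80 GB (H100), none of which come from the text above, and it ignores quantization scales, KV cache, and activations.

```python
# Rough weight-only memory estimate for a 675B-parameter checkpoint.
# Assumptions (not from the model card): 8 GPUs per node, nominal HBM sizes
# below, and no allowance for quantization scales, KV cache, or activations.
TOTAL_PARAMS = 675e9
GPUS_PER_NODE = 8

BITS_PER_PARAM = {"BF16": 16, "FP8": 8, "NVFP4": 4}
GPU_MEMORY_GB = {"H200": 141, "H100": 80}


def weight_gb(fmt: str) -> float:
    """Approximate size of the weights alone, in GB (10^9 bytes)."""
    return TOTAL_PARAMS * BITS_PER_PARAM[fmt] / 8 / 1e9


def fits_on_node(fmt: str, gpu: str) -> bool:
    """True if the weights alone fit in one node's combined GPU memory."""
    return weight_gb(fmt) < GPU_MEMORY_GB[gpu] * GPUS_PER_NODE


if __name__ == "__main__":
    for fmt in BITS_PER_PARAM:
        for gpu in GPU_MEMORY_GB:
            print(f"{fmt:5s} on 8x{gpu}: {weight_gb(fmt):7.1f} GB weights, "
                  f"fits={fits_on_node(fmt, gpu)}")
```

Under these assumptions the FP8 weights fit on an H200 node but not on an H100 node, while NVFP4 roughly halves the footprint. Real deployments need additional headroom for the KV cache.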

Key Features

Mistral Large 3 consists of two main architectural components:

A Granular MoE Language Model with 673B params and 39B active
A 2.5B Vision Encoder

The Mistral Large 3 Instruct model offers the following capabilities:

Vision: Enables the model to analyze images and provide insights based on visual content, in addition to text.
Multilingual: Supports dozens of languages, including English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, Arabic.
System Prompt: Maintains strong adherence and support for system prompts.
Agentic: Offers best-in-class agentic capabilities with native function calling and JSON outputting.
Frontier: Delivers best-in-class performance.
Apache 2.0 License: Open-source license allowing usage and modification for both commercial and non-commercial purposes.
Large Context Window: Supports a 256k (262,144-token) context window.

Use Cases

With powerful long-context performance and stable, consistent cross-domain behavior, Mistral Large 3 is well suited to:

Long Document Understanding
Powerful Daily-Driver AI Assistants
State-of-the-Art Agentic and Tool-Use Capabilities
Enterprise Knowledge Work
General Coding Assistant

And enterprise-grade use cases requiring frontier capabilities.

Recommended Settings

We recommend deploying Large 3 in a client-server configuration with the following best practices:

System Prompt: Define a clear environment and use case, including guidance on how to effectively leverage tools in agentic systems.
Sampling Parameters: Use a temperature below 0.1 for daily-driver and production environments. Higher temperatures may be explored for creative use cases; developers are encouraged to experiment with alternative settings.
Tools: Keep the set of tools well-defined and limit their number to the minimum required for the use case. Avoid overloading the model with an excessive number of tools.
Vision: When deploying with vision capabilities, we recommend maintaining an aspect ratio close to 1:1 (width-to-height) for images. Avoid overly thin or wide images; crop them as needed to ensure optimal performance.
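
The cropping advice for vision inputs can be sketched as a small pre-processing step. The helper below is hypothetical and only computes a centered crop box. The threshold `max_ratio=2.0` is an assumption chosen for illustration, since the guidance above says only "close to 1:1".

```python
def center_crop_box(width: int, height: int, max_ratio: float = 2.0):
    """Return (left, top, right, bottom) for a centered crop whose long side
    is at most `max_ratio` times its short side (hypothetical helper).

    Images already within the limit are returned uncropped.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if max_ratio < 1:
        raise ValueError("max_ratio must be at least 1")

    short_side = min(width, height)
    long_side = max(width, height)
    limit = int(short_side * max_ratio)  # longest allowed side after cropping

    if long_side <= limit:
        return (0, 0, width, height)

    offset = (long_side - limit) // 2  # trim equally from both ends
    if width >= height:
        return (offset, 0, offset + limit, height)
    return (0, offset, width, offset + limit)


if __name__ == "__main__":
    # A very wide 4000x500 banner is cut down to its central 1000x500 region.
    print(center_crop_box(4000, 500))
```

The returned tuple follows the (left, upper, right, lower) convention used by Pillow's `Image.crop`, so it can be passed to that method directly if Pillow is the imaging library in use.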

Known Issues / Limitations

Not a dedicated reasoning model: Dedicated reasoning models can outperform Mistral Large 3 in strict reasoning use cases.
Behind vision-first models in multimodal tasks: Mistral Large 3 can lag behind models optimized for vision tasks and use cases.
Complex deployment: Due to its large size and architecture, the model can be challenging to deploy efficiently with constrained resources or at scale.

Benchmark Results

We compare Mistral Large 3 to similarly sized models.

Usage

The model can be used with the following frameworks:

vllm: See the vLLM section below.

Note: We sadly didn't have enough time to add Mistral Large 3 to transformers, but we would be very happy for a community contribution by opening a PR to huggingface/transformers.

vLLM

We recommend using this model with vLLM.

Installation

Make sure to install vllm >= 0.12.0:

pip install vllm --upgrade

Doing so should automatically install mistral_common >= 1.8.6.

To check:

python -c "import mistral_common; print(mistral_common.__version__)"

You can also use a ready-to-go Docker image, available on Docker Hub.

Serve

The Mistral Large 3 Instruct FP8 format can be used on one 8xH200 node. We recommend this format if you plan to fine-tune, as it can be more precise than NVFP4 in some situations.

Simple

A simple launch command is:

vllm serve mistralai/Mistral-Large-3-675B-Instruct-2512 \
  --max-model-len 262144 --tensor-parallel-size 8 \
  --tokenizer-mode mistral --config-format mistral --load-format mistral \
  --enable-auto-tool-choice --tool-call-parser mistral

Key parameter notes:

--enable-auto-tool-choice: Required when enabling tool usage.
--tool-call-parser mistral: Required when enabling tool usage.

Additional flags:

You can lower --max-model-len to save memory. By default it is set to 262144, which is larger than most scenarios need.
You can set --max-num-batched-tokens to balance throughput and latency: a higher value gives higher throughput at the cost of higher latency.
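
When --max-model-len is lowered, client requests need to keep the prompt plus the requested output within that limit. The sketch below shows one way to cap `max_tokens` on the client side. The helper name is hypothetical, and it assumes the prompt's token count is already known (for example from a tokenizer) and that the model length covers prompt and output tokens together.

```python
def clamp_max_tokens(prompt_tokens: int, requested: int,
                     max_model_len: int = 262144) -> int:
    """Cap the requested output length to what the context window leaves
    after the prompt (hypothetical helper).

    Assumes max_model_len bounds prompt tokens plus generated tokens.
    """
    if prompt_tokens < 0 or requested <= 0:
        raise ValueError("prompt_tokens must be >= 0 and requested must be > 0")
    if prompt_tokens >= max_model_len:
        raise ValueError("prompt alone fills or exceeds the model length")
    remaining = max_model_len - prompt_tokens
    return min(requested, remaining)


if __name__ == "__main__":
    # With a server started using --max-model-len 16384 and a 10,000-token prompt:
    print(clamp_max_tokens(10_000, 8_192, max_model_len=16_384))
```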

Accelerated with speculative decoding

For maximum performance we recommend serving the checkpoint with its customized draft model Mistral-Large-3-675B-Instruct-2512-Eagle:

vllm serve mistralai/Mistral-Large-3-675B-Instruct-2512 \
  --tensor-parallel-size 8 \
  --load-format mistral \
  --tokenizer-mode mistral \
  --config-format mistral \
  --enable-auto-tool-choice \
  --tool-call-parser mistral \
  --limit-mm-per-prompt '{"image": 10}' \
  --speculative_config '{
    "model": "mistralai/Mistral-Large-3-675B-Instruct-2512-Eagle",
    "num_speculative_tokens": 3,
    "method": "eagle",
    "max_model_len": "16384"
  }'

For more information on the draft model, please have a look at Mistral-Large-3-675B-Instruct-2512-Eagle.
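
To build intuition for the `num_speculative_tokens` setting, the following sketch uses a simplified textbook model of speculative decoding. It assumes each draft token is accepted independently with the same probability, which is an idealization and not a measured property of the Eagle draft model. The acceptance rates used are illustrative, not benchmark results.

```python
def expected_tokens_per_step(acceptance_rate: float,
                             num_speculative_tokens: int) -> float:
    """Expected tokens emitted per verification pass of the main model.

    Simplified model: the draft proposes k tokens in a chain, each accepted
    independently with probability `acceptance_rate`; the main model always
    contributes one token itself. The expectation is sum_{i=0..k} a^i.
    """
    if not 0.0 <= acceptance_rate <= 1.0:
        raise ValueError("acceptance_rate must be between 0 and 1")
    if num_speculative_tokens < 0:
        raise ValueError("num_speculative_tokens must be non-negative")
    return sum(acceptance_rate ** i for i in range(num_speculative_tokens + 1))


if __name__ == "__main__":
    # Illustrative acceptance rates with 3 speculative tokens, as configured above.
    for rate in (0.5, 0.7, 0.9):
        print(rate, round(expected_tokens_per_step(rate, 3), 3))
```

Under this model, raising the number of speculative tokens gives diminishing returns unless the acceptance rate is high, which is one reason small values such as 3 are common.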

Usage of the model

Here we assume that the model mistralai/Mistral-Large-3-675B-Instruct-2512 is served and reachable at localhost on port 8000, which is the default for vLLM.
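
Before running the examples, it can be useful to confirm that the server is actually reachable. The sketch below queries the OpenAI-compatible model listing route using only the standard library. The helper names are hypothetical, and an empty list is returned if the server cannot be reached or answers with something other than JSON.

```python
import json
import urllib.error
import urllib.request


def models_url(base_url: str) -> str:
    """Build the model-listing URL from an OpenAI-compatible base URL."""
    return base_url.rstrip("/") + "/models"


def list_served_models(base_url: str = "http://localhost:8000/v1",
                       timeout: float = 5.0) -> list[str]:
    """Return the ids of the served models, or [] if the server is unreachable
    (hypothetical helper)."""
    try:
        with urllib.request.urlopen(models_url(base_url), timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        return []
    if not isinstance(payload, dict):
        return []
    return [entry["id"] for entry in payload.get("data", []) if "id" in entry]


if __name__ == "__main__":
    served = list_served_models()
    print(served if served else "No server reachable at localhost:8000")
```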

Vision Reasoning

Let's see if Mistral Large 3 knows when to pick a fight!

from datetime import datetime, timedelta
from openai import OpenAI
from huggingface_hub import hf_hub_download
# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"
TEMP = 0.15
MAX_TOK = 262144
client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)
models = client.models.list()
model = models.data[0].id
def load_system_prompt(repo_id: str, filename: str) -> str:
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(file_path, "r") as file:
        system_prompt = file.read()
    today = datetime.today().strftime("%Y-%m-%d")
    yesterday = (datetime.today() - timedelta(days=1)).strftime("%Y-%m-%d")
    model_name = repo_id.split("/")[-1]
    return system_prompt.format(name=model_name, today=today, yesterday=yesterday)
SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")
image_url = "https://static.wikia.nocookie.net/essentialsdocs/images/7/70/Battle.png/revision/latest?cb=20220523172438"
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What action do you think I should take in this situation? List all the possible actions and explain why you think they are good or bad.",
            },
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    },
]
response = client.chat.completions.create(
    model=model,
    messages=messages,
    temperature=TEMP,
    max_tokens=MAX_TOK,
)
print(response.choices[0].message.content)

Function Calling

Let's solve some equations thanks to our simple Python calculator tool.

import json
from openai import OpenAI
from huggingface_hub import hf_hub_download
# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"
TEMP = 0.15
MAX_TOK = 262144
client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)
models = client.models.list()
model = models.data[0].id
def load_system_prompt(repo_id: str, filename: str) -> str:
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(file_path, "r") as file:
        system_prompt = file.read()
    return system_prompt
SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")
image_url = "https://math-coaching.com/img/fiche/46/expressions-mathematiques.jpg"
def my_calculator(expression: str) -> str:
    # NOTE: eval() runs arbitrary Python code; acceptable for a local demo only.
    return str(eval(expression))
tools = [
    {
        "type": "function",
        "function": {
            "name": "my_calculator",
            "description": "A calculator that can evaluate a mathematical equation and compute its results.",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {
                        "type": "string",
                        "description": "The mathematical expression to evaluate.",
                    },
                },
                "required": ["expression"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "rewrite",
            "description": "Rewrite a given text for improved clarity",
            "parameters": {
                "type": "object",
                "properties": {
                    "text": {
                        "type": "string",
                        "description": "The input text to rewrite",
                    }
                },
            },
        },
    },
]
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Thanks to your calculator, compute the results for the equations that involve numbers displayed in the image.",
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": image_url,
                },
            },
        ],
    },
]
response = client.chat.completions.create(
    model=model,
    messages=messages,
    temperature=TEMP,
    max_tokens=MAX_TOK,
    tools=tools,
    tool_choice="auto",
)
tool_calls = response.choices[0].message.tool_calls
# Append the assistant turn, then one tool message per call so that every
# tool_call_id gets a matching result, even for tools not implemented here.
messages.append({"role": "assistant", "tool_calls": tool_calls})
for tool_call in tool_calls:
    function_name = tool_call.function.name
    function_args = tool_call.function.arguments
    if function_name == "my_calculator":
        result = my_calculator(**json.loads(function_args))
    else:
        result = f"Tool {function_name} is not implemented in this example."
    messages.append(
        {
            "role": "tool",
            "tool_call_id": tool_call.id,
            "name": function_name,
            "content": result,
        }
    )
response = client.chat.completions.create(
    model=model,
    messages=messages,
    temperature=TEMP,
    max_tokens=MAX_TOK,
)
print(response.choices[0].message.content)
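
The calculator tool in this example passes model-generated text to `eval`, which is fine for a local demo but risky anywhere else. As one possible alternative, the sketch below evaluates only basic arithmetic by walking the parsed syntax tree. The name `safe_calculator` is hypothetical, and exponentiation is deliberately left out because very large powers can be expensive to compute.

```python
import ast
import operator

# Only these operators are allowed; anything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _evaluate(node):
    """Recursively evaluate a whitelisted arithmetic syntax tree."""
    if isinstance(node, ast.Constant):
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(node.value, (int, float)) and not isinstance(node.value, bool):
            return node.value
    elif isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        return _BINARY_OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    elif isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_evaluate(node.operand))
    raise ValueError("unsupported expression")


def safe_calculator(expression: str) -> str:
    """Evaluate +, -, *, / on numeric literals and return the result as text."""
    tree = ast.parse(expression, mode="eval")
    return str(_evaluate(tree.body))


if __name__ == "__main__":
    print(safe_calculator("6 + 2 * 3"))
```

It has the same signature as `my_calculator`, so it could be swapped in without changing the tool schema. Invalid syntax still raises `SyntaxError` from `ast.parse`, which a caller may want to catch and report back to the model as the tool result.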

Text-Only Request

Mistral Large 3 can follow your instructions down to the letter.

from openai import OpenAI
from huggingface_hub import hf_hub_download
# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"
TEMP = 0.15
MAX_TOK = 262144
client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)
models = client.models.list()
model = models.data[0].id
def load_system_prompt(repo_id: str, filename: str) -> str:
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(file_path, "r") as file:
        system_prompt = file.read()
    return system_prompt
SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {
        "role": "user",
        "content": "Write me a sentence where every word starts with the next letter in the alphabet - start with 'a' and end with 'z'.",
    },
]
response = client.chat.completions.create(
    model=model,
    messages=messages,
    temperature=TEMP,
    max_tokens=MAX_TOK,
)
assistant_message = response.choices[0].message.content
print(assistant_message)

License

This model is licensed under the Apache 2.0 License.

You must not use this model in a manner that infringes, misappropriates, or otherwise violates any third party’s rights, including intellectual property rights.

Features & Capabilities

Mode	chat
Context Window	262,144 tokens
Max Output	262,144 tokens
Function Calling	Supported
Vision	Supported
Reasoning	Not supported
Web Search	Not supported
URL Context	Not supported

Technical Details

Base Model	mistralai/Mistral-Large-3-675B-Base-2512
Languages	en, fr, es, de, it, pt, nl, zh, ja, ko, ar
Library	vllm

API Usage

from openai import OpenAI

client = OpenAI(
    base_url="https://api.haimaker.ai/v1",
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="mistralai/mistral-large-2512",
    messages=[
        {"role": "user", "content": "Hello, how are you?"}
    ],
)

print(response.choices[0].message.content)

Frequently Asked Questions

What is the context window of Mistral Large 3 675B Instruct 2512?

Mistral Large 3 675B Instruct 2512 (mistralai/mistral-large-2512) has a 262,144-token context window and supports up to 262,144 output tokens per request.

How much does Mistral Large 3 675B Instruct 2512 cost?

Mistral Large 3 675B Instruct 2512 is priced at $0.50 per 1M input tokens and $1.50 per 1M output tokens when accessed via the haimaker.ai OpenAI-compatible API.

What features does Mistral Large 3 675B Instruct 2512 support?

Mistral Large 3 675B Instruct 2512 supports function calling and vision.

How do I use Mistral Large 3 675B Instruct 2512 via API?

Send requests to https://api.haimaker.ai/v1/chat/completions with model "mistralai/mistral-large-2512" using any OpenAI-compatible SDK. Authentication uses a Bearer API key from https://app.haimaker.ai.
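
For readers not using an SDK, the request described above can be sketched with the standard library alone. The helper `build_chat_request` is hypothetical. It only constructs the request, and the actual send is left commented out so the sketch runs without a real key. The body follows the usual OpenAI chat-completions shape, which is assumed here based on the endpoint being described as OpenAI-compatible.

```python
import json
import urllib.request

API_URL = "https://api.haimaker.ai/v1/chat/completions"
MODEL_ID = "mistralai/mistral-large-2512"


def build_chat_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a chat-completions request (hypothetical helper)."""
    body = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # Bearer API key authentication
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_chat_request("YOUR_API_KEY", "Hello, how are you?")
    # To send it, uncomment the lines below and supply a real key:
    # with urllib.request.urlopen(request) as resp:
    #     print(json.load(resp)["choices"][0]["message"]["content"])
    print(request.get_method(), request.full_url)
```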