Haimaker.ai

Inference Benchmarks

Performance metrics across hardware, software, and model configurations

Early Preview

We are certifying more internal benchmarks for publication. If you would like to provide hardware or have questions, email [email protected].

Total Benchmarks: 22
GPU Models Tested: 1
Frameworks: 1

Found 22 benchmark suites

| Date | Suite Name | GPU | Model | Output TPS | Input TPS | Energy Cost (kWh/MT) |
|---|---|---|---|---|---|---|
| 12/19/2025 | NVIDIA A100-SXM4-80GB (2x) - gpt-oss-120b Reasoning | NVIDIA A100-SXM4-80GB, 2x 160GB | gpt-oss-120b (openai) | 3,860.39 | 15,892.90 | 0.01 |
| 11/20/2025 | NVIDIA A100-PCIE-40GB (1x) - Mistral-Nemo-Instruct | NVIDIA A100-PCIE-40GB, 1x 40GB | Mistral-Nemo-Instruct (mistral) | 3,541.62 | 6,567.89 | 0.01 |
| 11/13/2025 | NVIDIA H100 80GB HBM3 (8x) - gpt-oss-120b | NVIDIA H100 80GB HBM3, 8x 632GB | gpt-oss-120b (openai) | 18,672.47 | 50,200.55 | 0.02 |
| 11/13/2025 | NVIDIA H100 80GB HBM3 (8x) - llama-2-70b-hf | NVIDIA H100 80GB HBM3, 8x 632GB | llama-2-70b-hf (meta-llama) | 668.64 | 855.76 | 0.79 |
| 11/12/2025 | NVIDIA H100 80GB HBM3 (8x) - llama-3.3-70b-instruct | NVIDIA H100 80GB HBM3, 8x 632GB | llama-3.3-70b-instruct (meta-llama) | 9,219.60 | 16,108.82 | 0.06 |
| 11/7/2025 | NVIDIA H200 NVL (2x) - mistral-nemo-instruct-2407 | NVIDIA H200 NVL, 2x 280GB | mistral-nemo-instruct-2407 (mistralai) | 12,204.48 | 47,690.47 | 0.01 |
| 11/7/2025 | NVIDIA H200 NVL (2x) - qwen3-30b-a3b | NVIDIA H200 NVL, 2x 280GB | qwen3-30b-a3b (qwen) | 6,124.38 | 51,413.77 | 0.00 |
| 11/6/2025 | NVIDIA H200 NVL (2x) - allam-7b-instruct-preview | NVIDIA H200 NVL, 2x 280GB | allam-7b-instruct-preview (humain-ai) | 11,481.64 | 45,184.12 | 0.01 |
| 11/6/2025 | NVIDIA H200 NVL (2x) - llama-2-70b-hf (50% Max Batch Token) | NVIDIA H200 NVL, 2x 280GB | llama-2-70b-hf (meta-llama) | 4,620.81 | 8,844.22 | 0.03 |
| 11/6/2025 | NVIDIA H200 NVL (2x) - llama-2-70b-hf | NVIDIA H200 NVL, 2x 280GB | llama-2-70b-hf (meta-llama) | 5,012.77 | 10,466.05 | 0.03 |
| 11/5/2025 | NVIDIA H200 NVL (2x) - gpt-oss-120b | NVIDIA H200 NVL, 2x 280GB | gpt-oss-120b (openai) | 3,166.06 | 11,929.37 | 0.01 |
| 11/5/2025 | NVIDIA H200 NVL (2x) - qwen3-coder-30b-a3b-instruct | NVIDIA H200 NVL, 2x 280GB | qwen3-coder-30b-a3b-instruct (qwen) | 5,757.76 | 43,900.39 | 0.01 |
| 11/5/2025 | NVIDIA H200 NVL (2x) - llama-3.3-70b-instruct | NVIDIA H200 NVL, 2x 280GB | llama-3.3-70b-instruct (meta-llama) | 5,005.29 | 11,042.39 | 0.03 |
| 11/2/2025 | NVIDIA A100 80GB PCIe (2x) - gpt-oss-120b | NVIDIA A100 80GB PCIe, 2x 160GB | gpt-oss-120b (openai) | 1,673.99 | 5,556.37 | 0.02 |
| 11/2/2025 | NVIDIA A100 80GB PCIe (2x) - gemma-3-27b-it | NVIDIA A100 80GB PCIe, 2x 160GB | gemma-3-27b-it (google) | 1,834.30 | 4,909.53 | 0.03 |
| 10/26/2025 | NVIDIA H20 (8x) - deepseek-v3.1 | NVIDIA H20, 8x 760GB | deepseek-v3.1 (deepseek-ai) | 865.08 | 4,142.63 | 0.16 |
| 10/25/2025 | NVIDIA H20 (8x) - llama-3.3-70b-instruct (High Throughput) | NVIDIA H20, 8x 760GB | llama-3.3-70b-instruct (meta-llama) | 5,091.04 | 7,327.23 | 0.10 |
| 10/24/2025 | NVIDIA H20 (8x) - llama-3.3-70b-instruct | NVIDIA H20, 8x 760GB | llama-3.3-70b-instruct (meta-llama) | 3,370.98 | 6,350.24 | 0.11 |
| 10/24/2025 | NVIDIA H20 (8x) - qwen2.5-vl-72b-instruct | NVIDIA H20, 8x 760GB | qwen2.5-vl-72b-instruct (qwen) | 2,266.04 | 6,375.82 | 0.11 |
| 10/24/2025 | NVIDIA H20 (8x) - qwen3-coder-30b-a3b-instruct | NVIDIA H20, 8x 760GB | qwen3-coder-30b-a3b-instruct (qwen) | 8,987.38 | 30,699.04 | 0.02 |
| 10/24/2025 | NVIDIA H20 (8x) - mistral-nemo-instruct-2407 | NVIDIA H20, 8x 760GB | mistral-nemo-instruct-2407 (mistralai) | 12,605.43 | 24,969.58 | 0.02 |
| 10/24/2025 | NVIDIA H20 (8x) - gemma-3-27b-it | NVIDIA H20, 8x 760GB | gemma-3-27b-it (google) | 6,567.80 | 12,358.95 | 0.05 |
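
Because the suites above run on different numbers of GPUs, raw Output TPS figures are not directly comparable. The sketch below shows one possible way to normalize them, using the three gpt-oss-120b suites dated 11/13/2025, 11/5/2025, and 11/2/2025 from the listing. It rests on two assumptions that the page does not state: that the "Nx" prefix in the GPU column is the GPU count, and that the reported TPS is the aggregate across all GPUs in the suite. The names `Suite`, `per_gpu_output_tps`, and `rank_by_per_gpu_output` are hypothetical helpers introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suite:
    """One benchmark row; fields mirror the listing's columns."""
    name: str
    gpu_count: int      # ASSUMPTION: the "Nx" prefix is the number of GPUs
    output_tps: float   # ASSUMPTION: aggregate output tokens/s for the suite
    input_tps: float    # ASSUMPTION: aggregate input tokens/s for the suite


def per_gpu_output_tps(suite: Suite) -> float:
    """Divide aggregate output throughput evenly across the GPUs."""
    return suite.output_tps / suite.gpu_count


def rank_by_per_gpu_output(suites: list[Suite]) -> list[Suite]:
    """Return suites ordered from highest to lowest per-GPU output TPS."""
    return sorted(suites, key=per_gpu_output_tps, reverse=True)


# Values copied from the gpt-oss-120b rows of the listing.
SUITES = [
    Suite("NVIDIA H100 80GB HBM3 (8x) - gpt-oss-120b", 8, 18672.47, 50200.55),
    Suite("NVIDIA H200 NVL (2x) - gpt-oss-120b", 2, 3166.06, 11929.37),
    Suite("NVIDIA A100 80GB PCIe (2x) - gpt-oss-120b", 2, 1673.99, 5556.37),
]

if __name__ == "__main__":
    for suite in rank_by_per_gpu_output(SUITES):
        print(f"{suite.name}: {per_gpu_output_tps(suite):,.2f} output TPS per GPU")
```

Per-GPU division is a rough normalization only: it ignores differences in batch settings, interconnect, and parallelism strategy between suites, none of which the listing reports.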