Inference Benchmarks

Performance metrics across hardware, software, and model configurations

Early Preview

We are working to certify more internal benchmarks for publication. If you're interested in providing hardware or have questions, email benchmarks@haimaker.ai.

Found 25 benchmark suites

Date	Suite Name	GPU	Model	Output TPS	Input TPS	Energy Cost (kWh/MT)
3/9/2026	NVIDIA A100-SXM4-80GB (1x) - qwen3-8b	NVIDIA A100-SXM4-80GB 1x 80GB	qwen3-8b qwen	5,547.95	42,519.57	0.00
2/19/2026	Tenstorrent Wormhole (32x) - Llama-3.3-70B-Instruct	Wormhole 32x 384GB	Llama-3.3-70B-Instruct meta-llama	1,091.26	4,654.03	0.13
2/19/2026	Tenstorrent Wormhole (32x) - Qwen3-32B	Wormhole 32x 384GB	Qwen3-32B qwen	1,438.10	5,640.69	0.11
12/19/2025	NVIDIA A100-SXM4-80GB (2x) - gpt-oss-120b Reasoning	NVIDIA A100-SXM4-80GB 2x 160GB	gpt-oss-120b openai	3,860.39	15,892.90	0.01
11/20/2025	NVIDIA A100-PCIE-40GB (1x) - Mistral-Nemo-Instruct	NVIDIA A100-PCIE-40GB 1x 40GB	Mistral-Nemo-Instruct mistral	3,541.62	6,567.89	0.01
11/13/2025	NVIDIA H100 80GB HBM3 (8x) - gpt-oss-120b	NVIDIA H100 80GB HBM3 8x 632GB	gpt-oss-120b openai	18,672.47	50,200.55	0.02
11/13/2025	NVIDIA H100 80GB HBM3 (8x) - llama-2-70b-hf	NVIDIA H100 80GB HBM3 8x 632GB	llama-2-70b-hf meta-llama	668.64	855.76	0.79
11/12/2025	NVIDIA H100 80GB HBM3 (8x) - llama-3.3-70b-instruct	NVIDIA H100 80GB HBM3 8x 632GB	llama-3.3-70b-instruct meta-llama	9,219.60	16,108.82	0.06
11/7/2025	NVIDIA H200 NVL (2x) - mistral-nemo-instruct-2407	NVIDIA H200 NVL 2x 280GB	mistral-nemo-instruct-2407 mistralai	12,204.48	47,690.47	0.01
11/7/2025	NVIDIA H200 NVL (2x) - qwen3-30b-a3b	NVIDIA H200 NVL 2x 280GB	qwen3-30b-a3b qwen	6,124.38	51,413.77	0.00
11/6/2025	NVIDIA H200 NVL (2x) - allam-7b-instruct-preview	NVIDIA H200 NVL 2x 280GB	allam-7b-instruct-preview humain-ai	11,481.64	45,184.12	0.01
11/6/2025	NVIDIA H200 NVL (2x) - llama-2-70b-hf (50% Max Batch Token)	NVIDIA H200 NVL 2x 280GB	llama-2-70b-hf meta-llama	4,620.81	8,844.22	0.03
11/6/2025	NVIDIA H200 NVL (2x) - llama-2-70b-hf	NVIDIA H200 NVL 2x 280GB	llama-2-70b-hf meta-llama	5,012.77	10,466.05	0.03
11/5/2025	NVIDIA H200 NVL (2x) - gpt-oss-120b	NVIDIA H200 NVL 2x 280GB	gpt-oss-120b openai	3,166.06	11,929.37	0.01
11/5/2025	NVIDIA H200 NVL (2x) - qwen3-coder-30b-a3b-instruct	NVIDIA H200 NVL 2x 280GB	qwen3-coder-30b-a3b-instruct qwen	5,757.76	43,900.39	0.01
11/5/2025	NVIDIA H200 NVL (2x) - llama-3.3-70b-instruct	NVIDIA H200 NVL 2x 280GB	llama-3.3-70b-instruct meta-llama	5,005.29	11,042.39	0.03
11/2/2025	NVIDIA A100 80GB PCIe (2x) - gpt-oss-120b	NVIDIA A100 80GB PCIe 2x 160GB	gpt-oss-120b openai	1,673.99	5,556.37	0.02
11/2/2025	NVIDIA A100 80GB PCIe (2x) - gemma-3-27b-it	NVIDIA A100 80GB PCIe 2x 160GB	gemma-3-27b-it google	1,834.30	4,909.53	0.03
10/26/2025	NVIDIA H20 (8x) - deepseek-v3.1	NVIDIA H20 8x 760GB	deepseek-v3.1 deepseek-ai	865.08	4,142.63	0.16
10/25/2025	NVIDIA H20 (8x) - llama-3.3-70b-instruct (High Throughput)	NVIDIA H20 8x 760GB	llama-3.3-70b-instruct meta-llama	5,091.04	7,327.23	0.10
10/24/2025	NVIDIA H20 (8x) - llama-3.3-70b-instruct	NVIDIA H20 8x 760GB	llama-3.3-70b-instruct meta-llama	3,370.98	6,350.24	0.11
10/24/2025	NVIDIA H20 (8x) - qwen2.5-vl-72b-instruct	NVIDIA H20 8x 760GB	qwen2.5-vl-72b-instruct qwen	2,266.04	6,375.82	0.11
10/24/2025	NVIDIA H20 (8x) - qwen3-coder-30b-a3b-instruct	NVIDIA H20 8x 760GB	qwen3-coder-30b-a3b-instruct qwen	8,987.38	30,699.04	0.02
10/24/2025	NVIDIA H20 (8x) - mistral-nemo-instruct-2407	NVIDIA H20 8x 760GB	mistral-nemo-instruct-2407 mistralai	12,605.43	24,969.58	0.02
10/24/2025	NVIDIA H20 (8x) - gemma-3-27b-it	NVIDIA H20 8x 760GB	gemma-3-27b-it google	6,567.80	12,358.95	0.05
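
Because the suites above run on anywhere from 1 to 32 devices, raw Output TPS is hard to compare across rows. The sketch below shows one possible way to normalize it: parse a tab-separated row, read the device count from the "(Nx)" part of the suite name, and divide Output TPS by that count. The helper names (parse_row, output_tps_per_device) and the dictionary keys are hypothetical, chosen for illustration only, and the per-device figure is a derived number rather than a published metric. Rows that use different models are still not directly comparable.

```python
import re

# Three rows copied from the table above; fields are tab-separated.
SAMPLE_ROWS = [
    "3/9/2026\tNVIDIA A100-SXM4-80GB (1x) - qwen3-8b\tNVIDIA A100-SXM4-80GB 1x 80GB\tqwen3-8b qwen\t5,547.95\t42,519.57\t0.00",
    "2/19/2026\tTenstorrent Wormhole (32x) - Qwen3-32B\tWormhole 32x 384GB\tQwen3-32B qwen\t1,438.10\t5,640.69\t0.11",
    "11/13/2025\tNVIDIA H100 80GB HBM3 (8x) - gpt-oss-120b\tNVIDIA H100 80GB HBM3 8x 632GB\tgpt-oss-120b openai\t18,672.47\t50,200.55\t0.02",
]


def parse_row(line):
    """Split one table row into typed fields (hypothetical helper)."""
    date, suite, gpu, model, out_tps, in_tps, energy = line.split("\t")
    # Assumption: the device count always appears as "(Nx)" in the suite name.
    match = re.search(r"\((\d+)x\)", suite)
    if match is None:
        raise ValueError(f"no device count found in suite name: {suite!r}")
    return {
        "date": date,
        "suite": suite,
        "gpu": gpu,
        "model": model,
        "devices": int(match.group(1)),
        # Numbers in the table use commas as thousands separators.
        "output_tps": float(out_tps.replace(",", "")),
        "input_tps": float(in_tps.replace(",", "")),
        "energy_kwh_per_mt": float(energy),
    }


def output_tps_per_device(row):
    """Derived figure: Output TPS divided by the number of devices."""
    return row["output_tps"] / row["devices"]


if __name__ == "__main__":
    rows = sorted(map(parse_row, SAMPLE_ROWS), key=output_tps_per_device, reverse=True)
    for row in rows:
        print(f"{row['suite']}: {output_tps_per_device(row):,.2f} output TPS per device")
```

The same parsing works for any row in the table as long as the column order stays as shown in the header.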