Inference Benchmarks

Performance metrics across hardware, software, and model configurations

Early Preview

We are working to certify more internal benchmarks for publication. If you are interested in providing hardware or have questions, email benchmarks@haimaker.ai.

Active Filters: Tag: llama-2-70b-hf

Found 3 benchmark suites

Date	Suite Name	GPU	Model	Output TPS	Input TPS	Energy Cost (kWh/MT)
11/13/2025	NVIDIA H100 80GB HBM3 (8x) - llama-2-70b-hf	NVIDIA H100 80GB HBM3 8x 632GB	llama-2-70b-hf meta-llama	668.64	855.76	0.79
11/6/2025	NVIDIA H200 NVL (2x) - llama-2-70b-hf (50% Max Batch Token)	NVIDIA H200 NVL 2x 280GB	llama-2-70b-hf meta-llama	4,620.81	8,844.22	0.03
11/6/2025	NVIDIA H200 NVL (2x) - llama-2-70b-hf	NVIDIA H200 NVL 2x 280GB	llama-2-70b-hf meta-llama	5,012.77	10,466.05	0.03
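Because the suites use different GPU counts (8x versus 2x), the raw throughput figures are easier to compare once normalized per GPU. The sketch below does that arithmetic on the three rows above. It assumes the Output TPS and Input TPS columns report aggregate throughput for the whole suite, which the page does not state; the names `SUITES`, `per_gpu`, and `per_gpu_table` are illustrative and not part of any published tooling.

```python
# Rows copied from the table above: (suite label, GPU count, output TPS, input TPS).
# Assumption: the TPS figures are totals for the whole suite, not per-GPU values.
SUITES = [
    ("H100 80GB HBM3 (8x)", 8, 668.64, 855.76),
    ("H200 NVL (2x), 50% Max Batch Token", 2, 4620.81, 8844.22),
    ("H200 NVL (2x)", 2, 5012.77, 10466.05),
]


def per_gpu(tps: float, gpu_count: int) -> float:
    """Divide an aggregate tokens-per-second figure evenly across the GPUs."""
    if gpu_count <= 0:
        raise ValueError("gpu_count must be positive")
    return tps / gpu_count


def per_gpu_table(suites):
    """Return (label, output TPS per GPU, input TPS per GPU) for each suite."""
    return [
        (label, per_gpu(out_tps, count), per_gpu(in_tps, count))
        for label, count, out_tps, in_tps in suites
    ]


if __name__ == "__main__":
    for label, out_tps, in_tps in per_gpu_table(SUITES):
        print(f"{label}: {out_tps:.2f} output TPS/GPU, {in_tps:.2f} input TPS/GPU")
```

Per-GPU normalization is only a rough comparison: the suites differ in date, hardware, and batch settings, so the resulting numbers do not isolate the effect of the GPU model alone.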