NVIDIA H200 NVL (2x) - llama-2-70b-hf

November 6, 2025 at 03:39 AM

Dataset: reference (v1.0)

Best Performance

Best Output TPS: 5,012.77 (peak generation speed)
Best Input TPS: 10,466.05 (peak prefill speed)
Best Energy Efficiency: 0.03 kWh/MT (energy cost per 1M tokens)
Best TTFT (P95): 51.52 ms (lowest latency)
Best E2E (P95): 2,070.89 ms (lowest latency)
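
The energy metric is reported in kWh per million tokens (kWh/MT). As a reading aid, the following minimal sketch converts that unit to joules per token. The function and constant names are illustrative and are not part of the benchmark tooling.

```python
JOULES_PER_KWH = 3.6e6      # 1 kWh = 3.6 MJ
TOKENS_PER_MT = 1_000_000   # "MT" is read here as one million tokens


def kwh_per_mt_to_joules_per_token(kwh_per_mt: float) -> float:
    """Convert an energy cost in kWh per million tokens to joules per token."""
    return kwh_per_mt * JOULES_PER_KWH / TOKENS_PER_MT


# Best reported energy efficiency from the summary above.
best = kwh_per_mt_to_joules_per_token(0.03)
print(f"{best:.3f} J/token")
```

Because the report rounds kWh/MT to two decimals, the converted value carries the same limited precision.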

Test Matrix Results

Performance across different input/output token combinations and concurrency levels

Input Tokens | Output Tokens | Concurrency | Output TPS | Input TPS | Energy Cost (kWh/MT) | TTFT Mean (ms) | TTFT P95 (ms) | E2E P95 (ms) | Success Rate
Best Run for Output TPS
128 | 512 | 512x | 5,012.77 | 1,367.05 | 0.05 | 1,207.76 | 2,328.83 | 41,556.05 | 100.0%
128 | 128 | 1x | 41.29 | 39.35 | 2.03 | 254.43 | 254.43 | 3,099.30 | 100.0%
128 | 128 | 2x | 84.41 | 85.09 | 1.22 | 65.79 | 77.08 | 2,926.04 | 100.0%
128 | 128 | 4x | 169.93 | 168.60 | 0.56 | 105.32 | 118.49 | 3,006.52 | 100.0%
128 | 128 | 8x | 329.47 | 330.44 | 0.30 | 165.28 | 187.68 | 3,095.06 | 100.0%
128 | 128 | 16x | 621.17 | 625.72 | 0.17 | 238.59 | 282.77 | 3,269.81 | 100.0%
128 | 128 | 32x | 1,091.21 | 1,095.07 | 0.12 | 328.89 | 498.57 | 3,545.87 | 96.9%
128 | 128 | 64x | 1,508.75 | 1,517.87 | 0.07 | 1,804.11 | 1,964.82 | 5,295.49 | 100.0%
128 | 128 | 128x | 1,826.68 | 1,906.53 | 0.06 | 2,185.55 | 4,272.60 | 8,152.60 | 100.0%
128 | 128 | 256x | 3,318.26 | 3,473.66 | 0.04 | 678.16 | 1,809.85 | 7,273.48 | 99.2%
128 | 128 | 512x | 3,268.08 | 3,625.44 | 0.04 | 2,074.72 | 4,791.55 | 13,227.93 | 99.8%
128 | 512 | 1x | 30.89 | 28.12 | 2.51 | 1,341.12 | 1,341.12 | 4,314.42 | 100.0%
128 | 512 | 2x | 48.35 | 21.62 | 3.29 | 67.03 | 80.83 | 11,032.70 | 100.0%
128 | 512 | 4x | 175.19 | 43.46 | 1.01 | 81.71 | 89.46 | 11,683.49 | 100.0%
128 | 512 | 8x | 344.90 | 86.48 | 0.52 | 109.71 | 123.80 | 11,868.61 | 100.0%
128 | 512 | 16x | 613.53 | 168.31 | 0.30 | 133.93 | 148.77 | 12,218.42 | 100.0%
128 | 512 | 32x | 1,226.65 | 328.87 | 0.17 | 158.39 | 181.37 | 12,388.53 | 100.0%
128 | 512 | 64x | 2,000.29 | 589.84 | 0.11 | 150.04 | 202.66 | 13,506.85 | 100.0%
128 | 512 | 128x | 3,478.11 | 953.38 | 0.07 | 150.51 | 219.90 | 15,339.96 | 100.0%
128 | 512 | 256x | 4,304.94 | 1,190.98 | 0.05 | 2,206.10 | 3,406.94 | 26,662.42 | 100.0%
128 | 512 | 1024x | 4,188.17 | 1,170.55 | 0.05 | 14,591.35 | 49,854.84 | 81,431.16 | 99.8%
128 | 1,024 | 1x | 44.29 | 7.23 | 4.37 | 81.16 | 81.16 | 16,843.00 | 100.0%
128 | 1,024 | 2x | 88.01 | 10.74 | 2.35 | 69.86 | 80.94 | 23,262.81 | 100.0%
128 | 1,024 | 4x | 175.65 | 21.78 | 1.17 | 99.87 | 112.27 | 23,310.77 | 100.0%
128 | 1,024 | 8x | 187.75 | 38.25 | 1.03 | 147.92 | 166.17 | 23,433.27 | 87.5%
128 | 2,048 | 1x | 44.55 | 2.65 | 4.98 | 52.84 | 52.84 | 45,968.56 | 100.0%
128 | 2,048 | 2x | 47.92 | 6.84 | 4.26 | 70.60 | 80.25 | 34,855.00 | 100.0%
128 | 2,048 | 4x | 141.46 | 10.87 | 1.55 | 82.25 | 90.07 | 46,727.93 | 100.0%
128 | 2,048 | 8x | 192.73 | 21.83 | 1.11 | 111.96 | 126.94 | 47,039.46 | 100.0%
128 | 2,048 | 16x | 517.36 | 41.53 | 0.45 | 207.18 | 256.94 | 49,628.20 | 100.0%
128 | 2,048 | 32x | 831.11 | 76.42 | 0.28 | 1,872.69 | 1,930.93 | 53,633.40 | 100.0%
128 | 2,048 | 64x | 1,804.46 | 135.56 | 0.15 | 3,166.19 | 3,296.33 | 60,022.86 | 100.0%
128 | 2,048 | 128x | 3,153.17 | 238.80 | 0.10 | 434.66 | 1,015.35 | 67,508.88 | 99.2%
128 | 2,048 | 256x | 4,168.83 | 324.51 | 0.07 | 695.73 | 1,796.82 | 98,546.77 | 99.6%
128 | 2,048 | 512x | 3,275.70 | 253.25 | 0.09 | 1,030.68 | 2,911.88 | 243,219.45 | 97.3%
512 | 128 | 1x | 27.29 | 105.73 | 1.14 | 1,841.03 | 1,841.03 | 4,691.30 | 100.0%
512 | 128 | 2x | 84.54 | 328.60 | 0.44 | 125.29 | 137.81 | 3,024.67 | 100.0%
512 | 128 | 4x | 130.69 | 638.27 | 0.27 | 169.69 | 205.97 | 3,090.62 | 100.0%
512 | 128 | 8x | 309.74 | 1,194.80 | 0.15 | 324.17 | 365.70 | 3,296.80 | 100.0%
512 | 128 | 16x | 513.78 | 1,989.83 | 0.10 | 490.93 | 686.56 | 3,713.16 | 93.8%
512 | 128 | 32x | 871.46 | 3,618.45 | 0.05 | 601.48 | 1,255.48 | 4,322.93 | 100.0%
512 | 128 | 64x | 1,371.96 | 5,337.46 | 0.04 | 992.63 | 2,248.07 | 5,820.83 | 100.0%
512 | 128 | 128x | 1,279.56 | 5,285.29 | 0.04 | 1,552.74 | 3,942.16 | 11,707.64 | 100.0%
512 | 512 | 1x | 44.53 | 43.14 | 2.60 | 51.52 | 51.52 | 11,496.49 | 100.0%
512 | 512 | 2x | 57.64 | 85.72 | 1.54 | 60.62 | 62.80 | 11,201.49 | 100.0%
512 | 512 | 4x | 150.64 | 168.92 | 0.70 | 87.11 | 97.61 | 11,705.67 | 100.0%
512 | 512 | 8x | 281.82 | 331.21 | 0.37 | 111.93 | 123.63 | 11,916.28 | 100.0%
512 | 512 | 16x | 564.11 | 640.83 | 0.20 | 139.41 | 154.47 | 12,336.35 | 100.0%
512 | 512 | 32x | 1,207.08 | 1,250.57 | 0.11 | 195.14 | 216.14 | 12,669.04 | 100.0%
512 | 512 | 64x | 2,027.67 | 2,203.33 | 0.07 | 183.36 | 220.95 | 14,000.39 | 98.4%
512 | 512 | 128x | 3,280.11 | 3,616.82 | 0.04 | 167.74 | 236.54 | 16,387.51 | 100.0%
512 | 512 | 256x | 3,363.61 | 3,595.79 | 0.05 | 2,556.10 | 8,444.38 | 33,661.60 | 99.6%
512 | 512 | 512x | 3,525.39 | 3,738.73 | 0.04 | 4,743.98 | 16,061.81 | 61,761.94 | 99.8%
512 | 512 | 1024x | 3,064.30 | 3,280.21 | 0.05 | 44,466.42 | 103,102.34 | 150,797.04 | 99.7%
512 | 1,024 | 1x | 43.60 | 61.26 | 2.07 | 116.65 | 116.65 | 8,072.31 | 100.0%
512 | 1,024 | 2x | 87.39 | 42.46 | 1.76 | 124.19 | 125.54 | 23,432.59 | 100.0%
512 | 1,024 | 4x | 139.61 | 83.98 | 1.05 | 171.44 | 205.21 | 23,541.60 | 100.0%
512 | 1,024 | 8x | 299.35 | 163.75 | 0.52 | 327.10 | 369.45 | 24,112.35 | 100.0%
512 | 1,024 | 16x | 594.20 | 313.06 | 0.27 | 438.19 | 643.26 | 25,317.18 | 100.0%
512 | 1,024 | 32x | 1,159.26 | 598.82 | 0.15 | 585.89 | 1,233.75 | 26,559.08 | 100.0%
512 | 1,024 | 64x | 1,762.74 | 1,050.77 | 0.11 | 928.06 | 2,268.27 | 30,145.46 | 100.0%
512 | 1,024 | 128x | 2,915.04 | 1,644.47 | 0.07 | 1,440.71 | 4,228.85 | 38,012.32 | 99.2%
512 | 1,024 | 256x | 3,554.22 | 2,001.52 | 0.06 | 2,695.96 | 10,761.65 | 61,577.78 | 99.6%
512 | 1,024 | 512x | 2,870.08 | 1,615.53 | 0.07 | 5,023.90 | 19,014.26 | 147,933.91 | 99.4%
512 | 2,048 | 1x | 44.28 | 10.73 | 4.21 | 118.39 | 118.39 | 46,245.10 | 100.0%
512 | 2,048 | 2x | 87.40 | 21.23 | 2.16 | 106.60 | 140.04 | 46,856.96 | 100.0%
512 | 2,048 | 4x | 132.11 | 42.09 | 1.35 | 180.40 | 216.77 | 46,991.79 | 100.0%
512 | 2,048 | 8x | 275.00 | 82.16 | 0.67 | 300.69 | 371.78 | 48,057.92 | 100.0%
512 | 2,048 | 16x | 511.24 | 156.74 | 0.37 | 419.66 | 612.90 | 50,588.87 | 100.0%
512 | 2,048 | 32x | 1,041.27 | 298.19 | 0.21 | 586.13 | 1,248.96 | 53,440.16 | 100.0%
512 | 2,048 | 64x | 1,750.94 | 503.81 | 0.13 | 1,337.95 | 3,652.75 | 62,089.30 | 98.4%
512 | 2,048 | 128x | 2,768.79 | 870.62 | 0.09 | 1,514.77 | 4,338.41 | 72,688.35 | 100.0%
512 | 2,048 | 256x | 3,396.65 | 1,058.94 | 0.07 | 2,557.09 | 8,492.04 | 114,589.12 | 100.0%
512 | 2,048 | 512x | 2,699.66 | 833.68 | 0.09 | 7,431.07 | 16,795.97 | 254,305.72 | 85.0%
1,024 | 128 | 1x | 40.59 | 465.62 | 0.40 | 185.54 | 185.54 | 2,070.89 | 100.0%
1,024 | 128 | 2x | 43.19 | 644.25 | 0.30 | 183.03 | 184.09 | 2,888.85 | 100.0%
1,024 | 128 | 4x | 121.69 | 1,204.25 | 0.16 | 280.14 | 350.32 | 3,237.91 | 100.0%
1,024 | 128 | 8x | 283.66 | 2,169.81 | 0.10 | 571.54 | 648.56 | 3,599.26 | 100.0%
1,024 | 128 | 16x | 471.95 | 3,641.86 | 0.06 | 713.23 | 1,196.15 | 4,285.64 | 100.0%
1,024 | 128 | 32x | 712.91 | 5,603.96 | 0.04 | 946.23 | 2,329.34 | 5,493.34 | 100.0%
1,024 | 128 | 64x | 922.09 | 7,386.54 | 0.04 | 1,697.68 | 4,372.09 | 8,209.11 | 98.4%
1,024 | 128 | 128x | 1,132.20 | 9,154.83 | 0.03 | 2,622.74 | 8,712.84 | 13,295.58 | 100.0%
1,024 | 128 | 256x | 1,208.82 | 9,723.98 | 0.03 | 4,885.67 | 16,515.44 | 24,465.96 | 100.0%
1,024 | 128 | 512x | 1,214.92 | 9,762.92 | 0.03 | 9,628.29 | 34,026.05 | 45,020.51 | 99.4%
1,024 | 128 | 1024x | 621.78 | 4,996.26 | 0.05 | 58,609.88 | 115,659.72 | 131,302.35 | 66.8%
1,024 | 512 | 1x | 43.92 | 83.63 | 1.75 | 181.28 | 181.28 | 11,657.09 | 100.0%
1,024 | 512 | 2x | 86.71 | 165.47 | 0.88 | 182.69 | 184.19 | 11,806.44 | 100.0%
1,024 | 512 | 4x | 169.79 | 324.08 | 0.46 | 309.53 | 378.66 | 12,055.14 | 100.0%
1,024 | 512 | 8x | 299.10 | 624.59 | 0.26 | 571.53 | 647.28 | 12,529.25 | 100.0%
1,024 | 512 | 16x | 504.96 | 1,145.78 | 0.15 | 678.95 | 1,213.26 | 13,663.08 | 100.0%
1,024 | 512 | 32x | 967.91 | 1,989.28 | 0.09 | 1,067.44 | 2,311.64 | 15,204.91 | 96.9%
1,024 | 512 | 64x | 1,509.79 | 3,267.91 | 0.06 | 1,664.97 | 4,365.66 | 19,059.42 | 100.0%
1,024 | 512 | 128x | 2,194.36 | 4,682.17 | 0.05 | 2,612.00 | 8,695.84 | 26,484.00 | 100.0%
1,024 | 512 | 256x | 2,513.97 | 5,356.17 | 0.04 | 4,895.48 | 16,487.47 | 45,216.76 | 99.6%
1,024 | 512 | 512x | 1,949.91 | 4,140.01 | 0.05 | 21,743.88 | 87,079.55 | 114,638.42 | 99.8%
1,024 | 1,024 | 1x | 38.13 | 36.31 | 2.84 | 3,857.14 | 3,857.14 | 26,853.44 | 100.0%
1,024 | 1,024 | 2x | 87.27 | 83.26 | 1.37 | 190.83 | 192.34 | 23,465.19 | 100.0%
1,024 | 1,024 | 4x | 172.40 | 164.53 | 0.70 | 313.04 | 354.18 | 23,753.93 | 100.0%
1,024 | 1,024 | 8x | 333.16 | 318.56 | 0.37 | 436.93 | 644.26 | 24,570.14 | 100.0%
1,024 | 1,024 | 16x | 552.28 | 599.93 | 0.22 | 706.81 | 1,202.26 | 26,156.05 | 100.0%
1,024 | 1,024 | 32x | 1,046.51 | 1,103.90 | 0.13 | 934.26 | 2,314.32 | 28,354.53 | 100.0%
1,024 | 1,024 | 64x | 1,585.55 | 1,839.32 | 0.09 | 1,581.13 | 4,475.56 | 33,445.46 | 98.4%
1,024 | 1,024 | 128x | 2,498.18 | 2,740.97 | 0.06 | 2,605.55 | 8,696.37 | 45,431.76 | 100.0%
1,024 | 1,024 | 256x | 2,933.97 | 3,246.14 | 0.05 | 5,693.61 | 18,573.72 | 75,637.54 | 100.0%
1,024 | 1,024 | 512x | 2,075.02 | 2,305.38 | 0.07 | 14,184.58 | 35,204.64 | 175,177.05 | 82.4%
1,024 | 2,048 | 1x | 44.14 | 21.01 | 3.59 | 183.27 | 183.27 | 46,393.76 | 100.0%
1,024 | 2,048 | 2x | 50.41 | 42.16 | 2.52 | 184.48 | 185.65 | 44,357.10 | 100.0%
1,024 | 2,048 | 4x | 87.90 | 83.08 | 1.38 | 288.10 | 358.73 | 47,036.14 | 100.0%
1,024 | 2,048 | 8x | 309.52 | 160.41 | 0.52 | 574.18 | 653.40 | 48,810.78 | 100.0%
1,024 | 2,048 | 16x | 622.96 | 301.31 | 0.27 | 672.58 | 1,211.85 | 52,095.17 | 100.0%
1,024 | 2,048 | 32x | 986.70 | 565.49 | 0.18 | 1,166.19 | 2,302.82 | 55,490.08 | 100.0%
1,024 | 2,048 | 64x | 1,500.00 | 984.14 | 0.12 | 1,594.20 | 4,389.69 | 63,721.77 | 100.0%
1,024 | 2,048 | 128x | 2,620.50 | 1,434.56 | 0.08 | 2,607.76 | 8,565.20 | 86,550.48 | 99.2%
1,024 | 2,048 | 256x | 2,422.84 | 1,469.72 | 0.08 | 4,817.84 | 16,411.85 | 163,231.99 | 99.6%
2,048 | 128 | 1x | 39.78 | 611.56 | 0.28 | 335.50 | 335.50 | 3,217.66 | 100.0%
2,048 | 128 | 2x | 40.92 | 1,214.82 | 0.17 | 330.33 | 342.87 | 3,083.07 | 100.0%
2,048 | 128 | 4x | 143.06 | 2,190.28 | 0.10 | 488.39 | 624.71 | 3,573.84 | 100.0%
2,048 | 128 | 8x | 241.00 | 3,683.69 | 0.06 | 847.66 | 1,101.91 | 4,222.63 | 100.0%
2,048 | 128 | 16x | 351.32 | 5,589.89 | 0.05 | 1,029.39 | 2,328.92 | 5,533.32 | 100.0%
2,048 | 128 | 32x | 493.27 | 7,723.98 | 0.04 | 1,640.98 | 4,526.95 | 7,973.35 | 100.0%
2,048 | 128 | 64x | 577.68 | 9,411.26 | 0.03 | 2,765.57 | 8,837.49 | 12,993.79 | 100.0%
2,048 | 128 | 128x | 656.16 | 10,466.05 | 0.03 | 4,929.07 | 17,061.33 | 23,076.91 | 99.2%
2,048 | 128 | 256x | 643.08 | 10,236.66 | 0.03 | 10,713.26 | 39,646.06 | 46,844.64 | 100.0%
2,048 | 512 | 1x | 43.09 | 165.61 | 1.07 | 322.97 | 322.97 | 11,882.74 | 100.0%
2,048 | 512 | 2x | 52.52 | 327.21 | 0.60 | 205.74 | 328.56 | 11,522.45 | 100.0%
2,048 | 512 | 4x | 124.62 | 635.61 | 0.31 | 475.56 | 610.97 | 12,323.84 | 100.0%
2,048 | 512 | 8x | 295.47 | 1,177.37 | 0.17 | 1,049.69 | 1,197.50 | 13,284.38 | 100.0%
2,048 | 512 | 16x | 503.49 | 2,059.32 | 0.10 | 1,116.10 | 2,348.56 | 15,126.64 | 100.0%
2,048 | 512 | 32x | 830.14 | 3,408.80 | 0.07 | 1,705.44 | 4,520.91 | 18,225.60 | 100.0%
2,048 | 512 | 64x | 1,126.57 | 4,965.72 | 0.05 | 3,611.94 | 9,189.32 | 24,995.91 | 100.0%
2,048 | 512 | 128x | 1,434.63 | 6,088.29 | 0.04 | 5,967.15 | 19,243.28 | 40,500.18 | 100.0%
2,048 | 512 | 256x | 1,183.32 | 5,026.69 | 0.05 | 18,927.60 | 79,123.01 | 97,646.22 | 100.0%
2,048 | 1,024 | 1x | 43.61 | 83.82 | 1.79 | 335.00 | 335.00 | 23,477.73 | 100.0%
2,048 | 1,024 | 2x | 86.25 | 165.04 | 0.93 | 214.68 | 337.95 | 23,740.06 | 100.0%
2,048 | 1,024 | 4x | 114.35 | 325.49 | 0.53 | 493.22 | 628.94 | 24,073.12 | 100.0%
2,048 | 1,024 | 8x | 259.71 | 617.58 | 0.27 | 827.82 | 1,084.91 | 25,335.86 | 100.0%
2,048 | 1,024 | 16x | 548.29 | 1,116.54 | 0.15 | 1,075.83 | 2,340.86 | 27,961.44 | 100.0%
2,048 | 1,024 | 32x | 880.90 | 1,951.96 | 0.10 | 1,692.77 | 4,496.86 | 31,901.84 | 100.0%
2,048 | 1,024 | 64x | 1,290.47 | 2,727.97 | 0.07 | 6,285.79 | 11,850.79 | 45,608.21 | 100.0%
2,048 | 1,024 | 128x | 1,901.65 | 4,150.28 | 0.05 | 4,995.77 | 16,567.49 | 59,580.03 | 100.0%
2,048 | 1,024 | 256x | 1,296.59 | 2,832.23 | 0.07 | 8,531.87 | 28,148.90 | 141,502.48 | 84.8%
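
Two quick cross-checks can help when reading rows of this matrix. This is a hedged sketch, not the benchmark's own method: the `Run` class and its method names are hypothetical, and the power estimate assumes that kWh/MT counts input and output tokens together, which the report does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One row of the test matrix (only the columns used below)."""
    input_tokens: int
    output_tokens: int
    concurrency: int
    output_tps: float
    input_tps: float
    energy_kwh_per_mt: float
    e2e_p95_ms: float

    def tps_from_latency(self) -> float:
        """Output tokens/s implied by concurrency and E2E P95 latency."""
        return self.concurrency * self.output_tokens / (self.e2e_p95_ms / 1000.0)

    def implied_power_watts(self) -> float:
        """Average power, ASSUMING kWh/MT counts input + output tokens."""
        joules_per_token = self.energy_kwh_per_mt * 3.6  # kWh/MT -> J/token
        return joules_per_token * (self.output_tps + self.input_tps)


# Values copied from three rows of the table above.
runs = [
    Run(128, 128, 1, 41.29, 39.35, 2.03, 3099.30),
    Run(128, 512, 512, 5012.77, 1367.05, 0.05, 41556.05),
    Run(2048, 128, 128, 656.16, 10466.05, 0.03, 23076.91),
]

for r in runs:
    print(
        f"{r.input_tokens}/{r.output_tokens} @ {r.concurrency}x: "
        f"reported {r.output_tps:.2f} TPS, "
        f"latency-implied {r.tps_from_latency():.2f} TPS, "
        f"implied power {r.implied_power_watts():.0f} W"
    )
```

For the single-stream row the latency-implied figure lands very close to the reported Output TPS. At high concurrency the two diverge, which is expected because the estimate uses P95 rather than mean latency. Under the stated assumption, the implied power for these rows falls between roughly 590 W and 1,200 W, which is consistent with the 2 x 600 W power limit but does not confirm how the metric is defined.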

Hardware Configuration

GPU Manufacturer: NVIDIA
GPU Model: NVIDIA H200 NVL
GPU Count: 2
GPU Memory (Total): 280 GB
GPU Driver: 580.95.05
CUDA Version: Unknown
Compute Capability: 9.0
Power Limit (per GPU): 600 W
CPU Model: Intel(R) Xeon(R) 6960P
RAM: 2,267 GB

Software Configuration

Inference Framework: vLLM
Framework Version: 0.11.0
OS: Ubuntu
OS Version: 22.04.5 LTS (Jammy Jellyfish)
Kernel Version: 5.15.0-88-generic
Python Version: 3.10.12

Model Configuration

Provider: meta-llama
Model Name: llama-2-70b-hf
Quantization: FP16

Inference Configuration

Runtime parameters used across all benchmark runs

Max Model Length: 4096
Tensor Parallel Size: 1
Pipeline Parallel Size: 1
GPU Memory Utilization: 0.95
Temperature: 0.70
Top-P: 1.00
Top-K: -1
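
For orientation, the parameters above could map onto vLLM's offline Python API roughly as follows. This is a sketch under assumptions: the report does not say whether runs used the offline `LLM` class or a served endpoint, the Hugging Face model id is inferred from the provider and model name, and `build_engine_and_sampling` and its `max_tokens` argument are hypothetical.

```python
# Engine settings taken from the Inference Configuration section.
ENGINE_KWARGS = {
    "model": "meta-llama/Llama-2-70b-hf",  # assumed Hugging Face model id
    "dtype": "float16",                    # report lists quantization as FP16
    "max_model_len": 4096,
    "tensor_parallel_size": 1,
    "pipeline_parallel_size": 1,
    "gpu_memory_utilization": 0.95,
}

# Sampling settings taken from the same section.
SAMPLING_KWARGS = {
    "temperature": 0.70,
    "top_p": 1.00,
    "top_k": -1,  # -1 disables top-k filtering
}


def build_engine_and_sampling(max_tokens: int):
    """Create a vLLM engine and sampling params (hypothetical helper).

    vLLM is imported lazily so this module can be inspected on a machine
    without vLLM or a GPU; calling the function requires both.
    """
    from vllm import LLM, SamplingParams

    llm = LLM(**ENGINE_KWARGS)
    sampling = SamplingParams(max_tokens=max_tokens, **SAMPLING_KWARGS)
    return llm, sampling
```

A call such as `build_engine_and_sampling(max_tokens=512)` would correspond to one output-length setting from the test matrix. How the benchmark itself enforced output lengths is not described in the report.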