NVIDIA H20 (8x) - qwen2.5-vl-72b-instruct

October 24, 2025 at 08:18 PM

Dataset: reference (v1.0)

Best Performance

Best Output TPS: 2,266.04 (peak generation speed)
Best Input TPS: 6,375.82 (peak prefill speed)
Best Energy Efficiency: 0.11 kWh/MT (energy cost per 1M tokens)
Best TTFT (P95): 61.20 ms (lowest latency)
Best E2E (P95): 3,817.82 ms (lowest latency)

Test Matrix Results

Performance across different input/output token combinations and concurrency levels

Input Tokens | Output Tokens | Concurrency | Output TPS | Input TPS | Energy Cost (kWh/MT) | TTFT Mean (ms) | TTFT P95 (ms) | E2E P95 (ms) | Success Rate
Best Run for Output TPS
128 | 1,024 | 512x | 2,266.04 | 442.25 | 0.39 | 65,598.25 | 108,701.35 | 130,455.20 | 100.0%
128 | 128 | 1x | 23.55 | 22.45 | 7.79 | 1,394.75 | 1,394.75 | 5,434.37 | 100.0%
128 | 128 | 2x | 46.39 | 45.31 | 5.21 | 1,473.81 | 1,476.47 | 5,516.64 | 100.0%
128 | 128 | 4x | 91.54 | 90.83 | 3.55 | 581.18 | 1,354.91 | 5,323.46 | 100.0%
128 | 128 | 8x | 253.40 | 254.15 | 1.74 | 422.37 | 499.52 | 4,036.88 | 100.0%
128 | 128 | 16x | 470.14 | 473.82 | 0.89 | 700.54 | 773.79 | 4,345.82 | 100.0%
128 | 128 | 32x | 763.61 | 764.73 | 0.57 | 1,333.19 | 1,644.32 | 5,351.03 | 100.0%
128 | 128 | 64x | 1,221.12 | 1,223.22 | 0.31 | 2,415.12 | 2,784.89 | 6,645.46 | 100.0%
128 | 128 | 128x | 1,346.61 | 1,348.18 | 0.32 | 4,019.23 | 5,075.17 | 9,280.31 | 100.0%
128 | 128 | 256x | 1,043.38 | 1,048.72 | 0.41 | 17,868.05 | 22,963.80 | 27,612.93 | 100.0%
128 | 512 | 1x | 35.80 | 11.23 | 9.97 | 105.39 | 105.39 | 10,839.41 | 100.0%
128 | 512 | 2x | 72.06 | 17.59 | 7.22 | 94.95 | 96.00 | 14,207.42 | 100.0%
128 | 512 | 4x | 141.28 | 35.04 | 3.66 | 182.48 | 223.62 | 14,493.97 | 100.0%
128 | 512 | 8x | 273.96 | 70.41 | 2.97 | 324.29 | 373.87 | 14,571.46 | 100.0%
128 | 512 | 16x | 531.91 | 137.85 | 1.53 | 446.76 | 471.46 | 14,955.32 | 100.0%
128 | 512 | 32x | 990.93 | 256.50 | 0.82 | 724.22 | 843.88 | 15,967.51 | 100.0%
128 | 512 | 64x | 1,647.98 | 433.72 | 0.48 | 1,391.90 | 1,838.69 | 18,746.33 | 100.0%
128 | 512 | 128x | 1,864.27 | 481.21 | 0.39 | 4,110.56 | 6,829.63 | 23,913.73 | 100.0%
128 | 512 | 256x | 1,938.79 | 502.25 | 0.38 | 24,996.47 | 36,091.95 | 53,612.15 | 100.0%
128 | 512 | 512x | 2,107.20 | 549.76 | 0.37 | 57,241.97 | 87,590.69 | 105,477.04 | 100.0%
128 | 512 | 1024x | 1,679.03 | 437.47 | 0.47 | 73,670.06 | 101,952.74 | 118,963.92 | 46.8%
128 | 1,024 | 1x | 35.91 | 7.34 | 10.96 | 107.89 | 107.89 | 16,597.63 | 100.0%
128 | 1,024 | 2x | 59.59 | 13.96 | 8.16 | 144.77 | 149.40 | 17,574.86 | 100.0%
128 | 1,024 | 4x | 121.31 | 26.53 | 6.29 | 167.82 | 195.94 | 19,073.40 | 100.0%
128 | 1,024 | 8x | 209.27 | 49.21 | 3.65 | 289.65 | 358.51 | 20,644.90 | 100.0%
128 | 1,024 | 16x | 391.12 | 79.64 | 2.09 | 470.67 | 619.83 | 23,673.65 | 100.0%
128 | 1,024 | 32x | 787.77 | 161.13 | 1.05 | 901.14 | 987.99 | 24,236.95 | 100.0%
128 | 1,024 | 64x | 1,268.99 | 248.48 | 0.65 | 1,515.41 | 2,024.09 | 28,917.27 | 100.0%
128 | 1,024 | 128x | 1,860.38 | 364.80 | 0.43 | 4,656.40 | 5,175.34 | 37,685.76 | 100.0%
128 | 1,024 | 256x | 1,927.46 | 379.56 | 0.43 | 29,113.37 | 48,436.13 | 72,912.60 | 100.0%
128 | 1,024 | 1024x | 1,674.36 | 323.99 | 0.52 | 70,065.30 | 102,609.86 | 125,192.49 | 37.1%
128 | 2,048 | 1x | 35.85 | 10.85 | 10.11 | 61.20 | 61.20 | 11,210.75 | 100.0%
128 | 2,048 | 2x | 64.88 | 12.84 | 8.17 | 143.09 | 144.25 | 19,249.71 | 100.0%
128 | 2,048 | 4x | 101.36 | 21.54 | 5.60 | 211.69 | 291.06 | 22,594.29 | 100.0%
128 | 2,048 | 8x | 229.40 | 46.05 | 2.72 | 381.45 | 498.89 | 21,942.49 | 100.0%
128 | 2,048 | 16x | 414.61 | 86.14 | 1.83 | 548.56 | 673.59 | 21,773.51 | 100.0%
128 | 2,048 | 32x | 757.80 | 149.65 | 1.11 | 892.34 | 1,050.20 | 26,294.02 | 100.0%
128 | 2,048 | 64x | 1,469.26 | 289.11 | 0.60 | 1,517.38 | 2,066.16 | 27,638.12 | 100.0%
128 | 2,048 | 128x | 1,981.47 | 385.67 | 0.39 | 3,843.57 | 4,631.79 | 33,194.32 | 100.0%
128 | 2,048 | 256x | 2,111.57 | 425.48 | 0.39 | 27,475.81 | 44,942.42 | 69,642.57 | 100.0%
128 | 2,048 | 512x | 2,156.45 | 426.20 | 0.38 | 64,945.08 | 104,600.90 | 128,703.76 | 100.0%
128 | 2,048 | 1024x | 1,804.24 | 352.95 | 0.46 | 83,996.13 | 124,691.44 | 147,958.12 | 49.4%
512 | 128 | 1x | 33.52 | 129.88 | 2.66 | 325.88 | 325.88 | 3,817.82 | 100.0%
512 | 128 | 2x | 66.61 | 258.91 | 1.77 | 346.27 | 347.97 | 3,841.78 | 100.0%
512 | 128 | 4x | 108.13 | 417.74 | 0.82 | 980.14 | 1,179.29 | 4,732.92 | 100.0%
512 | 128 | 8x | 238.36 | 919.46 | 0.61 | 648.69 | 757.38 | 4,292.80 | 100.0%
512 | 128 | 16x | 380.53 | 1,474.54 | 0.48 | 1,174.76 | 1,769.65 | 5,372.53 | 100.0%
512 | 128 | 32x | 610.98 | 2,380.82 | 0.28 | 2,291.47 | 2,941.72 | 6,692.54 | 100.0%
512 | 128 | 64x | 892.47 | 3,472.06 | 0.19 | 3,561.33 | 5,191.74 | 9,143.96 | 100.0%
512 | 128 | 128x | 935.59 | 3,641.44 | 0.16 | 6,737.28 | 11,233.53 | 15,423.70 | 100.0%
512 | 128 | 256x | 746.86 | 2,909.30 | 0.20 | 24,075.97 | 33,508.70 | 39,690.23 | 100.0%
512 | 512 | 1x | 36.13 | 35.00 | 6.50 | 63.21 | 63.21 | 14,168.70 | 100.0%
512 | 512 | 2x | 71.55 | 69.53 | 3.27 | 102.91 | 116.18 | 14,308.74 | 100.0%
512 | 512 | 4x | 142.55 | 137.68 | 2.32 | 160.86 | 175.37 | 14,360.63 | 100.0%
512 | 512 | 8x | 267.42 | 257.88 | 1.55 | 535.39 | 775.01 | 15,312.74 | 100.0%
512 | 512 | 16x | 504.61 | 511.37 | 0.98 | 582.72 | 932.17 | 15,507.95 | 100.0%
512 | 512 | 32x | 953.97 | 975.73 | 0.52 | 837.56 | 973.98 | 16,336.90 | 100.0%
512 | 512 | 64x | 1,584.12 | 1,580.70 | 0.31 | 2,175.83 | 3,302.41 | 20,102.32 | 100.0%
512 | 512 | 128x | 1,615.58 | 1,625.29 | 0.25 | 5,350.02 | 6,803.36 | 25,013.10 | 100.0%
512 | 512 | 256x | 1,619.22 | 1,640.97 | 0.27 | 28,392.87 | 44,822.91 | 63,618.81 | 100.0%
512 | 512 | 512x | 1,895.10 | 1,899.75 | 0.26 | 57,167.87 | 99,635.04 | 118,776.31 | 100.0%
512 | 512 | 1024x | 1,192.05 | 1,195.84 | 0.41 | 66,228.82 | 95,015.90 | 112,203.68 | 32.3%
512 | 1,024 | 1x | 35.63 | 24.08 | 7.99 | 314.44 | 314.44 | 20,571.93 | 100.0%
512 | 1,024 | 2x | 69.42 | 45.50 | 5.69 | 344.61 | 344.71 | 21,784.15 | 100.0%
512 | 1,024 | 4x | 125.07 | 85.81 | 3.85 | 294.77 | 459.02 | 22,773.40 | 100.0%
512 | 1,024 | 8x | 237.98 | 169.80 | 2.47 | 533.60 | 748.88 | 23,121.28 | 100.0%
512 | 1,024 | 16x | 390.91 | 301.66 | 1.42 | 914.45 | 1,626.27 | 25,025.58 | 100.0%
512 | 1,024 | 32x | 746.45 | 549.47 | 0.77 | 1,878.78 | 2,713.78 | 28,166.48 | 100.0%
512 | 1,024 | 64x | 1,357.60 | 991.29 | 0.43 | 2,679.89 | 4,378.41 | 30,091.22 | 100.0%
512 | 1,024 | 128x | 1,707.08 | 1,206.41 | 0.32 | 7,160.77 | 14,645.61 | 41,868.15 | 100.0%
512 | 1,024 | 256x | 2,000.36 | 1,406.70 | 0.30 | 32,284.66 | 56,218.71 | 83,320.77 | 100.0%
512 | 1,024 | 512x | 1,989.54 | 1,396.31 | 0.29 | 79,514.32 | 137,213.71 | 163,338.10 | 100.0%
512 | 2,048 | 1x | 35.72 | 25.35 | 7.72 | 309.97 | 309.97 | 19,537.77 | 100.0%
512 | 2,048 | 2x | 52.87 | 41.01 | 4.95 | 483.00 | 607.51 | 23,650.23 | 100.0%
512 | 2,048 | 4x | 134.08 | 94.48 | 3.67 | 353.53 | 431.67 | 20,784.64 | 100.0%
512 | 2,048 | 8x | 247.68 | 173.65 | 2.22 | 649.68 | 1,020.52 | 22,519.09 | 100.0%
512 | 2,048 | 16x | 440.40 | 303.31 | 1.33 | 1,113.99 | 1,689.71 | 25,985.11 | 100.0%
512 | 2,048 | 32x | 768.83 | 565.47 | 0.76 | 1,938.18 | 2,388.73 | 27,293.10 | 100.0%
512 | 2,048 | 64x | 1,271.40 | 896.06 | 0.46 | 3,639.78 | 5,861.96 | 33,486.63 | 100.0%
512 | 2,048 | 128x | 1,579.02 | 1,115.32 | 0.32 | 7,399.61 | 9,653.67 | 42,614.06 | 100.0%
512 | 2,048 | 256x | 1,831.28 | 1,284.12 | 0.30 | 32,746.06 | 56,328.57 | 84,825.42 | 100.0%
512 | 2,048 | 512x | 2,074.36 | 1,457.32 | 0.28 | 73,991.02 | 130,375.09 | 158,316.68 | 100.0%
512 | 2,048 | 1024x | 1,861.04 | 1,297.80 | 0.31 | 101,216.77 | 169,847.02 | 196,768.11 | 56.7%
1,024 | 128 | 1x | 31.35 | 238.80 | 1.66 | 574.93 | 574.93 | 4,082.92 | 100.0%
1,024 | 128 | 2x | 54.75 | 417.88 | 0.92 | 877.10 | 1,110.37 | 4,673.78 | 100.0%
1,024 | 128 | 4x | 108.52 | 828.53 | 0.73 | 785.84 | 1,106.69 | 4,712.10 | 100.0%
1,024 | 128 | 8x | 172.42 | 1,318.91 | 0.43 | 1,507.91 | 2,298.19 | 5,934.35 | 100.0%
1,024 | 128 | 16x | 310.87 | 2,384.79 | 0.30 | 2,093.76 | 2,936.75 | 6,581.54 | 100.0%
1,024 | 128 | 32x | 500.18 | 3,837.71 | 0.18 | 3,092.13 | 4,264.79 | 8,174.94 | 100.0%
1,024 | 128 | 64x | 589.48 | 4,526.44 | 0.15 | 5,681.11 | 9,270.12 | 13,837.79 | 100.0%
1,024 | 128 | 128x | 566.29 | 4,348.96 | 0.15 | 11,401.87 | 23,771.15 | 28,135.42 | 100.0%
1,024 | 512 | 1x | 36.07 | 68.69 | 4.49 | 77.56 | 77.56 | 14,193.67 | 100.0%
1,024 | 512 | 2x | 71.43 | 136.31 | 2.23 | 108.72 | 122.43 | 14,332.78 | 100.0%
1,024 | 512 | 4x | 141.36 | 269.81 | 2.04 | 151.15 | 170.14 | 14,481.36 | 100.0%
1,024 | 512 | 8x | 273.18 | 522.41 | 1.05 | 297.17 | 309.52 | 14,987.09 | 100.0%
1,024 | 512 | 16x | 491.36 | 942.36 | 0.68 | 918.28 | 1,976.72 | 16,660.79 | 100.0%
1,024 | 512 | 32x | 858.77 | 1,684.37 | 0.38 | 1,671.24 | 2,752.32 | 18,638.55 | 100.0%
1,024 | 512 | 64x | 1,340.95 | 2,671.20 | 0.23 | 3,523.94 | 5,824.22 | 23,511.34 | 100.0%
1,024 | 512 | 128x | 1,394.09 | 2,738.82 | 0.19 | 9,225.20 | 18,308.94 | 36,602.92 | 100.0%
1,024 | 512 | 256x | 1,447.98 | 2,828.35 | 0.21 | 27,970.07 | 55,301.07 | 76,774.29 | 100.0%
1,024 | 512 | 512x | 1,479.00 | 2,885.52 | 0.21 | 73,771.97 | 128,909.33 | 154,267.53 | 98.0%
1,024 | 512 | 1024x | 1,360.92 | 2,652.06 | 0.24 | 104,007.19 | 167,739.89 | 189,066.84 | 54.7%
1,024 | 1,024 | 1x | 35.05 | 47.20 | 5.73 | 576.81 | 576.81 | 20,628.30 | 100.0%
1,024 | 1,024 | 2x | 61.40 | 78.52 | 3.31 | 894.40 | 1,137.40 | 24,612.83 | 100.0%
1,024 | 1,024 | 4x | 129.65 | 175.48 | 2.64 | 535.83 | 666.07 | 22,023.58 | 100.0%
1,024 | 1,024 | 8x | 239.10 | 292.28 | 1.58 | 1,149.88 | 1,794.93 | 26,478.78 | 100.0%
1,024 | 1,024 | 16x | 418.19 | 534.86 | 1.04 | 1,427.55 | 1,896.12 | 27,499.57 | 100.0%
1,024 | 1,024 | 32x | 691.90 | 930.67 | 0.57 | 3,036.73 | 4,736.66 | 32,075.70 | 100.0%
1,024 | 1,024 | 64x | 1,073.41 | 1,460.81 | 0.36 | 5,502.12 | 11,249.75 | 39,425.72 | 100.0%
1,024 | 1,024 | 128x | 1,453.99 | 2,022.18 | 0.25 | 11,137.87 | 15,826.23 | 47,250.81 | 100.0%
1,024 | 1,024 | 256x | 1,498.34 | 2,015.48 | 0.25 | 40,107.49 | 71,511.92 | 104,656.24 | 100.0%
1,024 | 1,024 | 512x | 1,678.60 | 2,270.32 | 0.25 | 81,819.81 | 148,541.30 | 176,276.21 | 85.4%
1,024 | 2,048 | 1x | 35.18 | 47.45 | 5.70 | 570.28 | 570.28 | 20,521.73 | 100.0%
1,024 | 2,048 | 2x | 70.66 | 93.03 | 4.07 | 362.42 | 599.12 | 20,963.09 | 100.0%
1,024 | 2,048 | 4x | 116.25 | 145.89 | 2.84 | 658.17 | 674.51 | 26,296.66 | 100.0%
1,024 | 2,048 | 8x | 251.72 | 317.82 | 1.45 | 1,336.01 | 1,807.94 | 24,551.63 | 100.0%
1,024 | 2,048 | 16x | 391.48 | 530.04 | 0.98 | 1,881.40 | 2,886.14 | 27,113.53 | 100.0%
1,024 | 2,048 | 32x | 693.83 | 921.83 | 0.57 | 3,882.84 | 5,375.76 | 31,257.37 | 100.0%
1,024 | 2,048 | 64x | 1,146.96 | 1,624.34 | 0.33 | 6,078.41 | 10,060.38 | 36,151.84 | 100.0%
1,024 | 2,048 | 128x | 1,632.07 | 2,262.05 | 0.24 | 10,878.64 | 16,824.37 | 48,484.32 | 100.0%
1,024 | 2,048 | 256x | 1,737.80 | 2,357.26 | 0.24 | 34,348.37 | 64,737.50 | 96,253.44 | 100.0%
1,024 | 2,048 | 512x | 1,653.77 | 2,251.90 | 0.25 | 77,212.19 | 142,048.42 | 170,402.56 | 85.0%
2,048 | 128 | 1x | 27.71 | 426.07 | 0.94 | 1,104.68 | 1,104.68 | 4,618.85 | 100.0%
2,048 | 128 | 2x | 54.84 | 839.55 | 0.64 | 1,135.44 | 1,136.49 | 4,665.76 | 100.0%
2,048 | 128 | 4x | 107.63 | 1,647.89 | 0.49 | 1,223.01 | 1,226.38 | 4,754.89 | 100.0%
2,048 | 128 | 8x | 174.80 | 2,671.90 | 0.29 | 1,517.12 | 2,312.33 | 5,852.28 | 100.0%
2,048 | 128 | 16x | 249.94 | 3,817.55 | 0.19 | 3,124.75 | 4,485.83 | 8,189.24 | 100.0%
2,048 | 128 | 32x | 321.63 | 4,913.39 | 0.15 | 6,026.63 | 8,912.40 | 12,720.37 | 100.0%
2,048 | 128 | 64x | 417.34 | 6,375.82 | 0.11 | 8,240.50 | 14,816.49 | 19,550.82 | 100.0%
2,048 | 128 | 128x | 373.19 | 5,701.66 | 0.11 | 17,948.06 | 32,870.66 | 42,116.00 | 100.0%
2,048 | 512 | 1x | 33.59 | 129.10 | 2.83 | 1,103.33 | 1,103.33 | 15,243.11 | 100.0%
2,048 | 512 | 2x | 66.78 | 255.59 | 1.99 | 1,131.79 | 1,133.55 | 15,331.53 | 100.0%
2,048 | 512 | 4x | 119.39 | 510.32 | 1.44 | 394.12 | 1,015.80 | 15,203.81 | 100.0%
2,048 | 512 | 8x | 198.25 | 861.47 | 0.72 | 1,731.52 | 3,252.85 | 18,165.59 | 100.0%
2,048 | 512 | 16x | 402.54 | 1,706.45 | 0.45 | 1,949.14 | 3,437.90 | 18,321.25 | 100.0%
2,048 | 512 | 32x | 676.37 | 2,786.55 | 0.26 | 4,141.37 | 6,843.47 | 22,430.62 | 100.0%
2,048 | 512 | 64x | 978.61 | 3,965.12 | 0.18 | 7,536.68 | 14,206.83 | 31,503.81 | 100.0%
2,048 | 512 | 128x | 920.08 | 3,646.06 | 0.17 | 18,491.76 | 30,298.52 | 54,407.15 | 100.0%
2,048 | 1,024 | 1x | 35.86 | 119.42 | 3.07 | 68.85 | 68.85 | 16,451.02 | 100.0%
2,048 | 1,024 | 2x | 62.18 | 163.76 | 2.76 | 1,131.51 | 1,132.79 | 23,685.61 | 100.0%
2,048 | 1,024 | 4x | 115.96 | 338.43 | 1.70 | 660.17 | 1,199.37 | 23,043.00 | 100.0%
2,048 | 1,024 | 8x | 183.53 | 631.61 | 0.98 | 1,614.73 | 2,261.12 | 23,792.17 | 100.0%
2,048 | 1,024 | 16x | 297.24 | 1,002.60 | 0.64 | 2,912.87 | 5,491.42 | 28,691.18 | 100.0%
2,048 | 1,024 | 32x | 637.86 | 1,849.66 | 0.39 | 3,659.86 | 7,894.38 | 32,697.61 | 100.0%
2,048 | 1,024 | 64x | 978.32 | 2,757.48 | 0.25 | 7,656.84 | 12,651.81 | 43,091.86 | 100.0%
2,048 | 1,024 | 128x | 1,312.35 | 3,519.59 | 0.19 | 18,688.86 | 31,101.54 | 67,450.60 | 100.0%
2,048 | 1,024 | 256x | 1,227.86 | 3,322.58 | 0.20 | 53,746.81 | 102,600.82 | 140,346.67 | 100.0%
2,048 | 2,048 | 1x | 34.18 | 89.81 | 3.78 | 1,105.93 | 1,105.93 | 21,884.74 | 100.0%
2,048 | 2,048 | 2x | 66.47 | 191.25 | 2.49 | 622.05 | 1,090.99 | 20,366.07 | 100.0%
2,048 | 2,048 | 4x | 111.80 | 348.76 | 1.60 | 1,185.59 | 1,209.52 | 22,096.65 | 100.0%
2,048 | 2,048 | 8x | 182.42 | 583.53 | 1.16 | 1,673.04 | 3,371.18 | 25,338.17 | 100.0%
2,048 | 2,048 | 16x | 308.76 | 1,138.15 | 0.64 | 3,587.71 | 5,945.37 | 27,132.91 | 100.0%
2,048 | 2,048 | 32x | 494.81 | 1,489.38 | 0.44 | 6,166.68 | 11,049.64 | 40,497.53 | 100.0%
2,048 | 2,048 | 64x | 953.95 | 2,666.70 | 0.25 | 8,208.46 | 14,642.50 | 44,327.75 | 100.0%
2,048 | 2,048 | 128x | 1,159.22 | 3,029.88 | 0.20 | 18,416.91 | 35,383.20 | 68,343.02 | 100.0%
2,048 | 2,048 | 256x | 1,177.67 | 3,164.96 | 0.20 | 55,429.36 | 103,769.73 | 141,404.09 | 100.0%
2,048 | 2,048 | 512x | 1,237.55 | 3,338.04 | 0.21 | 73,487.70 | 141,675.30 | 174,445.80 | 61.9%
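
The report does not say how the energy cost in kWh/MT is derived. As a rough sanity check, the sketch below assumes that "MT" counts one million total tokens (input plus output) and back-computes the average power draw that each row would then imply. The function name and the choice of rows are illustrative only; the throughput and energy figures are copied from the table, and the 8 GPUs at 500 W each come from the Hardware Configuration section.

```python
# Sanity check on the kWh/MT column.
# ASSUMPTION: "MT" means one million *total* tokens (input + output).
# The report does not state this, so treat the result as indicative only.

GPU_COUNT = 8          # from Hardware Configuration
POWER_LIMIT_W = 500    # per-GPU power limit from Hardware Configuration


def implied_average_power_kw(output_tps, input_tps, kwh_per_mtok):
    """Average power (kW) implied by a table row under the assumption above."""
    total_tps = output_tps + input_tps
    # Wall-clock hours needed to process one million tokens at this rate.
    hours_per_mtok = 1_000_000 / total_tps / 3600
    return kwh_per_mtok / hours_per_mtok


# (output TPS, input TPS, energy cost in kWh/MT), copied from the table.
rows = {
    "128 in / 1,024 out @ 512x": (2266.04, 442.25, 0.39),
    "2,048 in / 128 out @ 64x": (417.34, 6375.82, 0.11),
    "128 in / 128 out @ 1x": (23.55, 22.45, 7.79),
}

gpu_ceiling_kw = GPU_COUNT * POWER_LIMIT_W / 1000

for label, (out_tps, in_tps, energy) in rows.items():
    kw = implied_average_power_kw(out_tps, in_tps, energy)
    print(f"{label}: ~{kw:.2f} kW implied (GPU limit {gpu_ceiling_kw:.1f} kW)")
```

Under this assumption the three rows imply roughly 3.8 kW, 2.7 kW, and 1.3 kW, all below the 4 kW aggregate GPU power limit. That is consistent with the total-token reading but does not confirm it, and it says nothing about whether CPU or other system power is included in the measurement.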

Hardware Configuration

GPU Manufacturer: NVIDIA
GPU Model: NVIDIA H20
GPU Count: 8
GPU Memory (Total): 760 GB
GPU Driver: 570.195.03
CUDA Version: Unknown
Compute Capability: 9.0
Power Limit (per GPU): 500 W
CPU Model: Intel(R) Xeon(R) Platinum 8469C
RAM: 1,007 GB

Software Configuration

Inference Framework: vLLM
Framework Version: v0.9.0
OS: Ubuntu
OS Version: 24.04.3 LTS (Noble Numbat)
Kernel Version: 6.8.0-79-generic
Python Version: 3.12.3

Model Configuration

Provider: qwen
Model Name: qwen2.5-vl-72b-instruct
Quantization: BF16

Inference Configuration

Runtime parameters used across all benchmark runs

Max Model Length: 32768
Tensor Parallel Size: 1
Pipeline Parallel Size: 1
GPU Memory Utilization: 90.00
Temperature: 0.70
Top-P: 1.00
Top-K: -1
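
The report lists these runtime parameters but not the command used to start the server. The following is a minimal sketch of how the listed values might map onto a `vllm serve` invocation and a per-request sampling payload. Several things here are assumptions rather than facts from the report: the Hugging Face repository ID `Qwen/Qwen2.5-VL-72B-Instruct`, the reading of "90.00" as 90 percent (vLLM expects a fraction), and the helper name `build_vllm_serve_command`. The parallelism values are reproduced exactly as the report prints them.

```python
import shlex

# Values copied from the Inference Configuration section of the report.
REPORT_CONFIG = {
    "max_model_len": 32768,
    "tensor_parallel_size": 1,
    "pipeline_parallel_size": 1,
    "gpu_memory_utilization": 90.00,  # printed as "90.00" in the report
}

# ASSUMPTION: the report only gives "qwen" / "qwen2.5-vl-72b-instruct";
# this Hugging Face repo ID is a guess at the corresponding checkpoint.
MODEL_ID = "Qwen/Qwen2.5-VL-72B-Instruct"


def build_vllm_serve_command(model_id, cfg):
    """Hypothetical helper: turn the report's settings into CLI arguments."""
    util = cfg["gpu_memory_utilization"]
    # ASSUMPTION: values above 1 are percentages; vLLM takes a 0-1 fraction.
    if util > 1:
        util = util / 100.0
    return [
        "vllm", "serve", model_id,
        "--max-model-len", str(cfg["max_model_len"]),
        "--tensor-parallel-size", str(cfg["tensor_parallel_size"]),
        "--pipeline-parallel-size", str(cfg["pipeline_parallel_size"]),
        "--gpu-memory-utilization", f"{util:.2f}",
    ]


# Sampling settings are sent per request rather than at server start.
# top_k is a vLLM-specific extra field on the OpenAI-compatible API.
SAMPLING_PAYLOAD = {
    "model": MODEL_ID,
    "temperature": 0.70,
    "top_p": 1.00,
    "top_k": -1,  # -1 disables top-k filtering
}

command = build_vllm_serve_command(MODEL_ID, REPORT_CONFIG)
print(shlex.join(command))
```

The sketch only assembles and prints the command; it does not start a server, so it can be inspected without a GPU or a vLLM install.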