NVIDIA H20 (8x) - llama-3.3-70b-instruct (High Throughput)

October 25, 2025 at 04:41 AM

Dataset: reference (v1.0)

Best Performance

Best Output TPS: 5,091.04 (peak generation speed)
Best Input TPS: 7,327.23 (peak prefill speed)
Best Energy Efficiency: 0.10 kWh/MT (energy cost per 1M tokens)
Best TTFT (P95): 39.57 ms (lowest latency)
Best E2E (P95): 3,596.85 ms (lowest latency)
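
The energy metric is reported in kWh per million tokens (kWh/MT). The report does not state how it is derived, so the following is only a minimal sketch under two assumptions: that the figure counts input plus output tokens, and that average power draw is roughly constant during a run. Under those assumptions the metric can be inverted to estimate the implied average power of a run. The function names are hypothetical.

```python
def kwh_per_million_tokens(avg_power_watts: float, tokens_per_second: float) -> float:
    """Energy in kWh needed to process one million tokens at a steady rate."""
    seconds_per_million = 1_000_000 / tokens_per_second
    return (avg_power_watts / 1000.0) * (seconds_per_million / 3600.0)


def implied_power_watts(kwh_per_mt: float, tokens_per_second: float) -> float:
    """Invert the relation above: average power implied by a reported kWh/MT."""
    return kwh_per_mt * 1000.0 * 3600.0 * tokens_per_second / 1_000_000


# Best Output TPS run from the results table:
# 5,091.04 output TPS, 1,352.77 input TPS, 0.14 kWh/MT.
# Assumption: the energy figure is normalized by input + output tokens.
total_tps = 5091.04 + 1352.77
power = implied_power_watts(0.14, total_tps)

# Upper bound from the hardware section: 8 GPUs at a 500 W power limit each.
gpu_power_budget = 8 * 500

print(f"implied average power: {power:.0f} W (GPU budget {gpu_power_budget} W)")
```

If the assumptions hold, the implied power for this run sits below the combined GPU power limit, which is at least consistent with the reported hardware. A different token normalization would change the result.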

Test Matrix Results

Performance across different input/output token combinations and concurrency levels

Input Tokens | Output Tokens | Concurrency | Output TPS | Input TPS | Energy Cost (kWh/MT) | TTFT Mean (ms) | TTFT P95 (ms) | E2E P95 (ms) | Success Rate
Best Run for Output TPS
128 | 512 | 512x | 5,091.04 | 1,352.77 | 0.14 | 3,242.22 | 4,908.41 | 48,317.33 | 100.0%

128 | 128 | 1x | 31.96 | 30.46 | 6.29 | 655.38 | 655.38 | 4,005.14 | 100.0%
128 | 128 | 2x | 63.43 | 61.94 | 4.33 | 680.25 | 681.16 | 4,035.16 | 100.0%
128 | 128 | 4x | 141.51 | 140.41 | 2.10 | 148.59 | 220.11 | 3,615.85 | 100.0%
128 | 128 | 8x | 255.74 | 256.49 | 1.66 | 206.02 | 515.26 | 3,879.05 | 100.0%
128 | 128 | 16x | 487.04 | 490.61 | 0.90 | 401.70 | 639.13 | 4,199.69 | 100.0%
128 | 128 | 32x | 950.48 | 958.19 | 0.47 | 492.39 | 729.45 | 4,268.38 | 100.0%
128 | 128 | 64x | 1,609.75 | 1,602.28 | 0.28 | 721.07 | 1,286.97 | 5,053.52 | 100.0%
128 | 128 | 128x | 2,640.10 | 2,648.18 | 0.16 | 1,285.49 | 1,981.23 | 6,113.96 | 100.0%
128 | 128 | 256x | 3,337.71 | 3,369.64 | 0.12 | 2,363.06 | 3,352.89 | 9,646.29 | 100.0%
128 | 128 | 512x | 3,626.31 | 3,660.22 | 0.11 | 4,344.18 | 6,514.30 | 17,760.21 | 100.0%
128 | 128 | 1024x | 3,673.46 | 3,690.12 | 0.10 | 6,814.12 | 13,008.42 | 35,082.17 | 100.0%
128 | 512 | 1x | 37.88 | 9.03 | 10.09 | 78.82 | 78.82 | 13,517.33 | 100.0%
128 | 512 | 2x | 75.68 | 18.48 | 7.02 | 39.40 | 39.57 | 13,529.08 | 100.0%
128 | 512 | 4x | 150.14 | 37.24 | 3.56 | 56.79 | 69.76 | 13,636.70 | 100.0%
128 | 512 | 8x | 287.53 | 74.47 | 2.90 | 67.17 | 97.27 | 13,787.70 | 100.0%
128 | 512 | 16x | 576.38 | 147.40 | 1.46 | 75.75 | 93.60 | 13,989.55 | 100.0%
128 | 512 | 32x | 1,071.49 | 278.48 | 0.79 | 138.01 | 207.79 | 14,712.26 | 100.0%
128 | 512 | 64x | 2,059.97 | 540.29 | 0.40 | 224.41 | 363.00 | 15,061.78 | 100.0%
128 | 512 | 128x | 3,623.37 | 937.23 | 0.24 | 401.93 | 778.23 | 17,379.96 | 100.0%
128 | 512 | 256x | 4,449.86 | 1,181.12 | 0.17 | 886.95 | 1,744.82 | 27,674.45 | 100.0%
128 | 512 | 1024x | 4,641.72 | 1,223.69 | 0.13 | 6,873.22 | 12,796.13 | 104,149.47 | 100.0%
128 | 1,024 | 1x | 37.48 | 7.65 | 10.44 | 84.79 | 84.79 | 15,930.78 | 100.0%
128 | 1,024 | 2x | 74.76 | 13.49 | 7.52 | 66.60 | 91.60 | 18,486.58 | 100.0%
128 | 1,024 | 4x | 115.85 | 20.90 | 4.51 | 129.21 | 169.86 | 23,386.28 | 100.0%
128 | 1,024 | 8x | 225.46 | 42.27 | 3.10 | 146.10 | 198.60 | 23,627.26 | 100.0%
128 | 1,024 | 16x | 487.76 | 86.31 | 1.79 | 265.34 | 383.84 | 23,342.79 | 100.0%
128 | 1,024 | 32x | 824.56 | 144.59 | 1.03 | 369.20 | 502.28 | 26,817.27 | 100.0%
128 | 1,024 | 64x | 1,540.91 | 265.09 | 0.57 | 746.74 | 959.64 | 30,616.22 | 100.0%
128 | 1,024 | 128x | 2,662.90 | 452.46 | 0.33 | 1,363.42 | 2,140.51 | 34,556.07 | 100.0%
128 | 1,024 | 256x | 4,123.53 | 716.81 | 0.21 | 2,006.78 | 3,442.23 | 43,859.01 | 100.0%
128 | 1,024 | 512x | 4,619.05 | 805.08 | 0.17 | 4,224.95 | 6,922.77 | 80,164.53 | 100.0%
128 | 1,024 | 1024x | 4,314.14 | 749.14 | 0.16 | 7,637.08 | 12,483.24 | 164,435.11 | 100.0%
128 | 2,048 | 1x | 37.54 | 5.66 | 10.92 | 86.82 | 86.82 | 21,526.74 | 100.0%
128 | 2,048 | 2x | 70.07 | 13.77 | 7.63 | 92.03 | 94.98 | 18,000.53 | 100.0%
128 | 2,048 | 4x | 110.25 | 18.66 | 5.54 | 75.16 | 120.41 | 26,080.57 | 100.0%
128 | 2,048 | 8x | 250.60 | 45.14 | 2.74 | 211.54 | 311.16 | 22,134.71 | 100.0%
128 | 2,048 | 16x | 409.72 | 69.99 | 1.93 | 244.51 | 331.00 | 25,948.19 | 100.0%
128 | 2,048 | 32x | 804.80 | 138.27 | 1.10 | 313.15 | 527.56 | 28,036.41 | 100.0%
128 | 2,048 | 64x | 1,491.48 | 258.28 | 0.60 | 575.26 | 885.63 | 29,023.51 | 100.0%
128 | 2,048 | 128x | 1,994.51 | 332.84 | 0.39 | 1,096.86 | 1,606.45 | 34,067.74 | 100.0%
128 | 2,048 | 256x | 2,809.56 | 474.12 | 0.28 | 2,365.60 | 4,004.02 | 50,959.87 | 100.0%
128 | 2,048 | 512x | 4,009.61 | 690.87 | 0.19 | 4,123.50 | 5,967.71 | 78,749.04 | 100.0%
128 | 2,048 | 1024x | 4,152.95 | 711.41 | 0.17 | 7,573.76 | 12,603.72 | 163,617.76 | 100.0%
512 | 128 | 1x | 35.35 | 136.98 | 2.53 | 293.97 | 293.97 | 3,620.82 | 100.0%
512 | 128 | 2x | 70.91 | 275.62 | 1.69 | 156.03 | 260.19 | 3,596.85 | 100.0%
512 | 128 | 4x | 129.23 | 499.24 | 0.93 | 428.80 | 569.78 | 3,960.32 | 100.0%
512 | 128 | 8x | 229.24 | 934.47 | 0.77 | 489.09 | 818.18 | 4,224.46 | 100.0%
512 | 128 | 16x | 418.85 | 1,661.99 | 0.41 | 745.81 | 1,322.63 | 4,767.35 | 100.0%
512 | 128 | 32x | 661.47 | 2,617.84 | 0.24 | 1,428.74 | 2,389.85 | 6,077.71 | 100.0%
512 | 128 | 64x | 893.54 | 3,476.22 | 0.18 | 2,831.54 | 4,808.55 | 9,119.60 | 100.0%
512 | 128 | 128x | 1,364.32 | 5,351.04 | 0.12 | 4,901.63 | 7,420.42 | 11,821.29 | 100.0%
512 | 128 | 256x | 1,564.32 | 6,099.17 | 0.10 | 7,259.02 | 12,814.74 | 20,693.58 | 100.0%
512 | 128 | 512x | 1,572.51 | 6,156.15 | 0.10 | 13,884.66 | 28,137.18 | 40,946.44 | 100.0%
512 | 128 | 1024x | 1,440.66 | 5,636.92 | 0.11 | 23,410.36 | 49,656.27 | 86,114.71 | 100.0%
512 | 512 | 1x | 37.13 | 35.97 | 6.54 | 286.51 | 286.51 | 13,788.74 | 100.0%
512 | 512 | 2x | 64.30 | 72.54 | 4.45 | 162.51 | 274.16 | 13,518.11 | 100.0%
512 | 512 | 4x | 139.59 | 138.40 | 2.38 | 373.23 | 576.49 | 14,290.04 | 100.0%
512 | 512 | 8x | 258.12 | 277.66 | 1.89 | 306.07 | 576.77 | 14,222.31 | 100.0%
512 | 512 | 16x | 494.52 | 514.56 | 0.98 | 935.36 | 1,535.14 | 15,413.83 | 100.0%
512 | 512 | 32x | 878.36 | 920.15 | 0.55 | 1,538.04 | 2,527.96 | 17,328.80 | 100.0%
512 | 512 | 64x | 1,498.89 | 1,571.58 | 0.31 | 2,930.66 | 4,496.83 | 20,228.50 | 100.0%
512 | 512 | 128x | 2,388.12 | 2,489.77 | 0.19 | 4,415.39 | 7,516.45 | 25,485.54 | 100.0%
512 | 512 | 256x | 2,890.76 | 2,968.52 | 0.15 | 7,428.54 | 13,301.30 | 42,722.05 | 100.0%
512 | 512 | 512x | 3,180.80 | 3,253.67 | 0.13 | 13,171.73 | 25,322.57 | 77,867.71 | 100.0%
512 | 512 | 1024x | 2,506.45 | 2,553.11 | 0.16 | 28,165.04 | 54,070.63 | 192,947.53 | 100.0%
512 | 1,024 | 1x | 36.92 | 44.12 | 5.82 | 285.92 | 285.92 | 11,215.67 | 100.0%
512 | 1,024 | 2x | 58.13 | 46.72 | 5.59 | 287.50 | 290.10 | 20,800.53 | 100.0%
512 | 1,024 | 4x | 123.73 | 90.75 | 3.67 | 357.04 | 525.89 | 21,144.34 | 100.0%
512 | 1,024 | 8x | 228.07 | 160.76 | 2.55 | 329.49 | 548.23 | 23,916.08 | 100.0%
512 | 1,024 | 16x | 377.62 | 316.41 | 1.39 | 1,279.98 | 2,075.04 | 24,375.76 | 100.0%
512 | 1,024 | 32x | 689.84 | 512.47 | 0.82 | 1,643.32 | 2,578.93 | 29,582.76 | 100.0%
512 | 1,024 | 64x | 1,345.07 | 930.95 | 0.45 | 2,743.34 | 3,605.31 | 32,939.21 | 100.0%
512 | 1,024 | 128x | 2,275.69 | 1,540.90 | 0.26 | 4,442.15 | 7,089.42 | 38,586.73 | 100.0%
512 | 1,024 | 256x | 3,086.75 | 2,099.51 | 0.18 | 7,451.46 | 13,244.25 | 58,879.22 | 100.0%
512 | 1,024 | 512x | 3,335.07 | 2,278.74 | 0.15 | 12,490.05 | 25,172.83 | 108,502.25 | 100.0%
512 | 1,024 | 1024x | 2,879.88 | 1,958.77 | 0.18 | 26,836.48 | 53,000.30 | 228,172.63 | 91.8%
512 | 2,048 | 1x | 36.96 | 37.88 | 6.28 | 289.44 | 289.44 | 13,067.99 | 100.0%
512 | 2,048 | 2x | 62.38 | 52.07 | 4.08 | 436.15 | 559.32 | 18,810.11 | 100.0%
512 | 2,048 | 4x | 125.81 | 90.33 | 2.91 | 503.21 | 834.04 | 21,515.70 | 100.0%
512 | 2,048 | 8x | 225.08 | 173.82 | 2.46 | 649.58 | 1,073.45 | 22,360.50 | 100.0%
512 | 2,048 | 16x | 408.73 | 323.59 | 1.34 | 768.43 | 1,095.91 | 22,676.33 | 100.0%
512 | 2,048 | 32x | 628.02 | 464.54 | 0.85 | 1,450.53 | 2,081.97 | 28,378.63 | 100.0%
512 | 2,048 | 64x | 990.05 | 655.07 | 0.57 | 2,565.14 | 4,140.69 | 37,018.87 | 100.0%
512 | 2,048 | 128x | 1,857.68 | 1,245.47 | 0.32 | 4,980.35 | 8,220.98 | 42,526.96 | 100.0%
512 | 2,048 | 256x | 2,898.77 | 1,945.30 | 0.20 | 7,682.35 | 13,720.56 | 57,380.99 | 100.0%
512 | 2,048 | 512x | 3,061.74 | 2,049.44 | 0.17 | 13,702.84 | 28,078.69 | 107,297.82 | 100.0%
512 | 2,048 | 1024x | 2,815.44 | 1,906.34 | 0.18 | 28,047.96 | 53,739.10 | 230,007.99 | 93.0%
1,024 | 128 | 1x | 32.74 | 249.36 | 1.60 | 542.71 | 542.71 | 3,910.35 | 100.0%
1,024 | 128 | 2x | 65.76 | 501.93 | 1.02 | 282.15 | 498.14 | 3,866.52 | 100.0%
1,024 | 128 | 4x | 115.21 | 879.61 | 0.73 | 670.54 | 989.13 | 4,437.61 | 100.0%
1,024 | 128 | 8x | 185.84 | 1,421.60 | 0.46 | 1,238.74 | 2,063.19 | 5,505.41 | 100.0%
1,024 | 128 | 16x | 268.27 | 2,058.03 | 0.31 | 2,170.81 | 4,094.48 | 7,628.32 | 100.0%
1,024 | 128 | 32x | 461.89 | 3,543.87 | 0.19 | 3,049.03 | 4,791.31 | 8,838.84 | 100.0%
1,024 | 128 | 64x | 710.05 | 5,475.63 | 0.13 | 4,477.93 | 7,548.60 | 11,447.39 | 100.0%
1,024 | 128 | 128x | 800.33 | 6,223.37 | 0.11 | 7,766.37 | 14,180.42 | 20,107.50 | 100.0%
1,024 | 128 | 256x | 893.08 | 6,881.62 | 0.10 | 14,007.65 | 27,174.16 | 36,345.72 | 100.0%
1,024 | 128 | 512x | 824.76 | 6,349.84 | 0.10 | 23,632.88 | 50,470.32 | 75,547.97 | 100.0%
1,024 | 512 | 1x | 37.68 | 71.76 | 4.29 | 40.56 | 40.56 | 13,587.23 | 100.0%
1,024 | 512 | 2x | 72.69 | 138.70 | 3.14 | 536.68 | 546.15 | 14,084.38 | 100.0%
1,024 | 512 | 4x | 139.12 | 265.54 | 2.04 | 555.23 | 1,004.19 | 14,714.71 | 100.0%
1,024 | 512 | 8x | 278.15 | 531.92 | 1.27 | 499.02 | 904.84 | 14,712.07 | 100.0%
1,024 | 512 | 16x | 480.74 | 925.16 | 0.57 | 1,636.23 | 2,566.09 | 16,976.51 | 100.0%
1,024 | 512 | 32x | 789.79 | 1,561.75 | 0.39 | 3,099.84 | 4,733.44 | 20,089.62 | 100.0%
1,024 | 512 | 64x | 1,305.77 | 2,616.20 | 0.24 | 4,515.93 | 8,012.44 | 24,005.21 | 100.0%
1,024 | 512 | 128x | 1,761.77 | 3,487.55 | 0.16 | 8,108.65 | 17,396.28 | 35,975.69 | 100.0%
1,024 | 512 | 256x | 2,328.96 | 4,699.47 | 0.13 | 13,282.48 | 26,046.76 | 53,327.14 | 100.0%
1,024 | 512 | 512x | 1,788.44 | 3,561.24 | 0.15 | 24,492.27 | 50,562.93 | 130,532.06 | 100.0%
1,024 | 1,024 | 1x | 36.85 | 47.52 | 5.66 | 548.44 | 548.44 | 20,491.23 | 100.0%
1,024 | 1,024 | 2x | 66.50 | 90.23 | 4.05 | 534.08 | 540.37 | 21,421.09 | 100.0%
1,024 | 1,024 | 4x | 117.17 | 139.47 | 2.95 | 677.13 | 1,002.85 | 27,010.82 | 100.0%
1,024 | 1,024 | 8x | 222.93 | 273.92 | 1.82 | 626.63 | 1,083.05 | 27,602.96 | 100.0%
1,024 | 1,024 | 16x | 408.04 | 516.45 | 1.07 | 1,689.28 | 2,599.37 | 28,819.99 | 100.0%
1,024 | 1,024 | 32x | 698.25 | 951.73 | 0.58 | 3,160.99 | 5,081.88 | 30,780.86 | 100.0%
1,024 | 1,024 | 64x | 1,040.64 | 1,476.13 | 0.36 | 4,773.31 | 9,940.71 | 36,402.73 | 100.0%
1,024 | 1,024 | 128x | 1,763.66 | 2,460.29 | 0.22 | 8,016.87 | 15,749.09 | 47,380.27 | 100.0%
1,024 | 1,024 | 256x | 2,389.05 | 3,311.01 | 0.16 | 13,359.51 | 27,079.09 | 72,277.92 | 100.0%
1,024 | 1,024 | 512x | 1,948.45 | 2,672.76 | 0.17 | 27,056.67 | 54,715.33 | 172,337.42 | 99.4%
1,024 | 2,048 | 1x | 37.57 | 44.78 | 5.78 | 43.87 | 43.87 | 21,745.95 | 100.0%
1,024 | 2,048 | 2x | 69.27 | 80.42 | 3.08 | 809.77 | 1,039.21 | 24,182.59 | 100.0%
1,024 | 2,048 | 4x | 120.39 | 129.33 | 3.16 | 553.25 | 1,006.86 | 29,565.27 | 100.0%
1,024 | 2,048 | 8x | 192.97 | 245.86 | 1.70 | 1,134.50 | 1,584.65 | 29,708.61 | 100.0%
1,024 | 2,048 | 16x | 427.80 | 553.18 | 0.98 | 1,082.86 | 2,093.34 | 27,684.94 | 100.0%
1,024 | 2,048 | 32x | 735.47 | 926.29 | 0.59 | 3,239.53 | 4,758.68 | 31,547.43 | 100.0%
1,024 | 2,048 | 64x | 881.92 | 1,193.99 | 0.40 | 4,786.88 | 9,441.73 | 36,386.14 | 100.0%
1,024 | 2,048 | 128x | 1,638.60 | 2,236.08 | 0.24 | 7,405.04 | 13,569.22 | 49,507.26 | 100.0%
1,024 | 2,048 | 256x | 1,988.19 | 2,684.38 | 0.20 | 13,300.84 | 25,924.81 | 76,151.15 | 100.0%
1,024 | 2,048 | 512x | 1,965.59 | 2,647.54 | 0.18 | 27,077.17 | 53,839.95 | 172,093.52 | 100.0%
2,048 | 128 | 1x | 28.93 | 444.85 | 0.88 | 1,053.80 | 1,053.80 | 4,423.98 | 100.0%
2,048 | 128 | 2x | 57.67 | 882.86 | 0.63 | 556.45 | 1,020.32 | 4,387.65 | 100.0%
2,048 | 128 | 4x | 93.16 | 1,426.31 | 0.47 | 1,324.95 | 1,959.17 | 5,490.84 | 100.0%
2,048 | 128 | 8x | 157.76 | 2,411.34 | 0.31 | 1,562.58 | 3,060.50 | 6,487.84 | 100.0%
2,048 | 128 | 16x | 208.72 | 3,188.04 | 0.20 | 3,023.97 | 5,523.85 | 9,786.82 | 100.0%
2,048 | 128 | 32x | 288.33 | 4,425.18 | 0.15 | 5,191.05 | 9,548.23 | 14,093.78 | 100.0%
2,048 | 128 | 64x | 413.50 | 6,317.89 | 0.11 | 7,354.53 | 14,837.74 | 19,723.12 | 100.0%
2,048 | 128 | 128x | 475.69 | 7,327.23 | 0.10 | 14,010.51 | 26,377.06 | 34,039.10 | 100.0%
2,048 | 128 | 256x | 413.16 | 6,335.13 | 0.10 | 25,180.91 | 52,822.09 | 75,279.78 | 100.0%
2,048 | 512 | 1x | 35.02 | 134.62 | 2.72 | 1,050.57 | 1,050.57 | 14,618.82 | 100.0%
2,048 | 512 | 2x | 70.05 | 268.08 | 1.93 | 551.17 | 1,010.70 | 14,567.20 | 100.0%
2,048 | 512 | 4x | 120.67 | 461.88 | 1.07 | 2,084.89 | 3,114.64 | 16,970.04 | 100.0%
2,048 | 512 | 8x | 228.01 | 955.50 | 0.82 | 1,581.73 | 2,111.10 | 16,379.23 | 100.0%
2,048 | 512 | 16x | 412.41 | 1,725.37 | 0.44 | 1,816.63 | 4,038.06 | 18,117.24 | 100.0%
2,048 | 512 | 32x | 618.93 | 2,573.60 | 0.28 | 4,109.45 | 9,073.31 | 24,289.08 | 100.0%
2,048 | 512 | 64x | 857.89 | 3,465.54 | 0.19 | 7,506.41 | 17,590.37 | 36,059.16 | 100.0%
2,048 | 512 | 128x | 1,180.81 | 4,718.03 | 0.14 | 14,188.77 | 29,733.34 | 52,919.76 | 100.0%
2,048 | 512 | 256x | 1,038.05 | 4,136.62 | 0.14 | 25,549.22 | 56,287.16 | 106,358.84 | 100.0%
2,048 | 1,024 | 1x | 37.67 | 110.15 | 3.21 | 43.06 | 43.06 | 17,840.48 | 100.0%
2,048 | 1,024 | 2x | 70.29 | 250.21 | 2.03 | 1,062.09 | 1,065.29 | 15,633.99 | 100.0%
2,048 | 1,024 | 4x | 110.97 | 367.80 | 1.59 | 301.69 | 919.19 | 20,611.84 | 100.0%
2,048 | 1,024 | 8x | 202.23 | 634.84 | 0.89 | 2,061.01 | 3,093.26 | 23,909.93 | 100.0%
2,048 | 1,024 | 16x | 340.14 | 1,041.69 | 0.69 | 2,549.58 | 4,041.84 | 28,692.34 | 100.0%
2,048 | 1,024 | 32x | 601.09 | 1,830.50 | 0.40 | 4,820.85 | 9,498.68 | 32,703.51 | 100.0%
2,048 | 1,024 | 64x | 842.61 | 2,441.16 | 0.27 | 8,585.67 | 17,927.10 | 46,991.12 | 100.0%
2,048 | 1,024 | 128x | 1,395.01 | 3,878.52 | 0.17 | 11,421.69 | 26,542.36 | 61,362.97 | 100.0%
2,048 | 1,024 | 256x | 1,567.25 | 4,345.31 | 0.15 | 25,528.26 | 51,688.54 | 111,218.90 | 100.0%
2,048 | 1,024 | 512x | 1,387.57 | 3,845.88 | 0.17 | 39,537.89 | 113,625.21 | 146,779.67 | 59.4%
2,048 | 2,048 | 1x | 35.83 | 86.10 | 3.85 | 1,052.66 | 1,052.66 | 22,829.81 | 100.0%
2,048 | 2,048 | 2x | 63.74 | 177.40 | 1.95 | 1,581.05 | 2,045.12 | 21,946.25 | 100.0%
2,048 | 2,048 | 4x | 95.43 | 283.90 | 1.83 | 1,067.78 | 1,942.16 | 25,996.98 | 100.0%
2,048 | 2,048 | 8x | 211.02 | 637.50 | 0.95 | 1,947.80 | 3,074.51 | 24,456.17 | 100.0%
2,048 | 2,048 | 16x | 183.63 | 512.69 | 1.00 | 3,007.91 | 5,062.20 | 37,167.61 | 100.0%
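
One way to read the concurrency sweep is to compare each run's Output TPS with ideal linear scaling from the single-stream run. The sketch below does this for the 128-input / 128-output rows, with the values copied from the table above. The helper name `scaling_efficiency` is hypothetical, and "efficiency" here is only a rough indicator because single-stream numbers are themselves noisy.

```python
# (concurrency, Output TPS) for the 128 input / 128 output rows of the table.
runs_128_128 = [
    (1, 31.96), (2, 63.43), (4, 141.51), (8, 255.74),
    (16, 487.04), (32, 950.48), (64, 1609.75), (128, 2640.10),
    (256, 3337.71), (512, 3626.31), (1024, 3673.46),
]


def scaling_efficiency(runs):
    """Ratio of measured TPS to linear scaling from the lowest-concurrency run."""
    base_concurrency, base_tps = runs[0]
    per_stream_tps = base_tps / base_concurrency
    return [(c, tps / (c * per_stream_tps)) for c, tps in runs]


efficiency = dict(scaling_efficiency(runs_128_128))

for concurrency, ratio in efficiency.items():
    print(f"{concurrency:>5}x  {ratio:6.1%}")
```

Ratios slightly above 100% at low concurrency most likely reflect measurement noise in the 1x baseline. The steep drop at the highest concurrency levels matches the flattening of Output TPS in the table, where additional concurrency mainly increases TTFT and E2E latency.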

Hardware Configuration

GPU Manufacturer: NVIDIA
GPU Model: NVIDIA H20
GPU Count: 8
GPU Memory (Total): 760 GB
GPU Driver: 570.195.03
CUDA Version: Unknown
Compute Capability: 9.0
Power Limit (per GPU): 500 W
CPU Model: Intel(R) Xeon(R) Platinum 8469C
RAM: 1,007 GB

Software Configuration

Inference Framework: vLLM
Framework Version: 0.9.0
OS: Ubuntu
OS Version: 24.04.3 LTS (Noble Numbat)
Kernel Version: 6.8.0-79-generic
Python Version: 3.12.3

Model Configuration

Provider: meta-llama
Model Name: llama-3.3-70b-instruct
Quantization: BF16
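
As a rough sanity check, the BF16 weight footprint can be estimated and compared with the GPU memory listed under Hardware Configuration. This is a back-of-the-envelope sketch: the parameter count of about 70 billion is an assumption read off the model name, and the estimate covers weights only, not KV cache or activations.

```python
def weight_footprint_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate weight memory in decimal GB (BF16 stores 2 bytes per parameter)."""
    return num_params * bytes_per_param / 1e9


ASSUMED_PARAMS = 70e9       # assumption: taken from the "70b" in the model name
TOTAL_GPU_MEMORY_GB = 760   # from Hardware Configuration
GPU_COUNT = 8               # from Hardware Configuration

weights_gb = weight_footprint_gb(ASSUMED_PARAMS)
per_gpu_gb = TOTAL_GPU_MEMORY_GB / GPU_COUNT

print(f"estimated weights: {weights_gb:.0f} GB")
print(f"memory per GPU:    {per_gpu_gb:.0f} GB")
print(f"share of total GPU memory: {weights_gb / TOTAL_GPU_MEMORY_GB:.1%}")
```

Under this estimate the weights alone exceed one GPU's share of memory, while taking up well under a quarter of the total across all eight GPUs.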

Inference Configuration

Runtime parameters used across all benchmark runs

Max Model Length: 8192
Tensor Parallel Size: 1
Pipeline Parallel Size: 1
GPU Memory Utilization: 0.95
Temperature: 0.70
Top-P: 1.00
Top-K: -1
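
For readers who want to approximate this setup, the parameters above map onto vLLM's `vllm serve` command-line flags and onto per-request sampling fields. The sketch below only assembles the command and a sampling payload; it does not start a server. The Hugging Face repository id is an assumption based on the provider and model name, and the report does not show the exact launch command that was used.

```python
import json
import shlex

# Assumption: Hugging Face repo id inferred from provider "meta-llama"
# and model name "llama-3.3-70b-instruct"; the report does not give it.
MODEL_ID = "meta-llama/Llama-3.3-70B-Instruct"

# Engine-level settings, values as listed under Inference Configuration.
engine_args = {
    "--max-model-len": 8192,
    "--tensor-parallel-size": 1,
    "--pipeline-parallel-size": 1,
    "--gpu-memory-utilization": 0.95,
}

# Sampling settings are sent per request rather than at launch.
sampling = {"temperature": 0.70, "top_p": 1.00, "top_k": -1}


def build_serve_command(model: str, args: dict) -> list:
    """Assemble a `vllm serve` argument list from a flag-to-value mapping."""
    command = ["vllm", "serve", model]
    for flag, value in args.items():
        command += [flag, str(value)]
    return command


command = build_serve_command(MODEL_ID, engine_args)
print(shlex.join(command))
print(json.dumps(sampling))
```

A Top-K of -1 disables top-k filtering in vLLM, and a Top-P of 1.00 disables nucleus truncation, so sampling in these runs is governed by temperature alone.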