NVIDIA H100 80GB HBM3 (8x) - llama-3.3-70b-instruct

November 12, 2025 at 10:51 PM

Dataset: reference (v1.0)

Best Performance

Best Output TPS: 9,219.60 (peak generation speed)
Best Input TPS: 16,108.82 (peak prefill speed)
Best Energy Efficiency: 0.06 kWh/MT (energy cost per 1M tokens)
Best TTFT (P95): 55.04 ms (lowest latency)
Best E2E (P95): 3,854.59 ms (lowest latency)
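
The report does not say how the kWh/MT figure is computed, or whether it counts input tokens, output tokens, or both. As a minimal sketch of the unit conversion only, assuming an average power draw in watts and a token throughput in tokens per second are available (the function name and both arguments are illustrative, not taken from the benchmark harness):

```python
def kwh_per_million_tokens(avg_power_watts: float, tokens_per_second: float) -> float:
    """Convert average power and throughput into kWh per 1M tokens.

    Assumption: power is averaged over the same window in which the
    tokens were processed. Which tokens are counted is left to the caller.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    joules_per_token = avg_power_watts / tokens_per_second  # W / (tok/s) = J/tok
    joules_per_million = joules_per_token * 1_000_000
    return joules_per_million / 3_600_000  # 1 kWh = 3.6e6 J


# Hypothetical inputs chosen for easy arithmetic, not measured values.
example = kwh_per_million_tokens(avg_power_watts=3600.0, tokens_per_second=1000.0)
print(f"{example:.2f} kWh/MT")
```

Because throughput is in the denominator, the same power draw spread over more tokens per second gives a lower kWh/MT, which is consistent with the best efficiency figures appearing at high concurrency.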

Test Matrix Results

Performance across different input/output token combinations and concurrency levels
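
The values that appear in the results suggest a grid of four input lengths, four output lengths, and concurrency levels in powers of two from 1 to 1024. A sketch of enumerating that grid, assuming the harness sweeps a plain Cartesian product (the constant and function names are illustrative):

```python
from itertools import product

# Values observed in the results; the sweep logic itself is an assumption.
INPUT_TOKENS = (128, 512, 1024, 2048)
OUTPUT_TOKENS = (128, 512, 1024, 2048)
CONCURRENCY = tuple(2 ** k for k in range(11))  # 1, 2, 4, ..., 1024


def full_matrix():
    """Return every (input_tokens, output_tokens, concurrency) combination."""
    return list(product(INPUT_TOKENS, OUTPUT_TOKENS, CONCURRENCY))


matrix = full_matrix()
print(len(matrix))  # 4 * 4 * 11 combinations
```

The full product has 176 combinations, while the results list 165 runs: the highest concurrency levels are missing for some of the longer input lengths. The report does not state why those runs are absent.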

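Best run for Output TPS: 128 input tokens, 128 output tokens, 512x concurrency (first row).

| Input Tokens | Output Tokens | Concurrency | Output TPS | Input TPS | Energy Cost (kWh/MT) | TTFT Mean (ms) | TTFT P95 (ms) | E2E P95 (ms) | Success Rate |
|---|---|---|---|---|---|---|---|---|---|
| 128 | 128 | 512x | 9,219.60 | 9,292.79 | 0.07 | 877.51 | 1,495.92 | 6,630.67 | 100.0% |
| 128 | 128 | 1x | 27.14 | 25.87 | 7.34 | 616.82 | 616.82 | 4,716.43 | 100.0% |
| 128 | 128 | 2x | 53.38 | 52.13 | 3.73 | 662.87 | 683.50 | 4,777.17 | 100.0% |
| 128 | 128 | 4x | 104.83 | 104.01 | 3.04 | 359.70 | 676.35 | 4,830.10 | 100.0% |
| 128 | 128 | 8x | 219.08 | 219.73 | 1.54 | 207.89 | 457.97 | 4,484.58 | 100.0% |
| 128 | 128 | 16x | 503.57 | 507.25 | 0.90 | 133.26 | 168.88 | 4,057.14 | 100.0% |
| 128 | 128 | 32x | 967.17 | 975.97 | 0.46 | 227.44 | 277.78 | 4,183.64 | 100.0% |
| 128 | 128 | 64x | 1,814.70 | 1,816.04 | 0.26 | 305.07 | 464.36 | 4,448.13 | 100.0% |
| 128 | 128 | 128x | 3,370.36 | 3,394.61 | 0.16 | 389.14 | 573.84 | 4,721.16 | 100.0% |
| 128 | 128 | 256x | 5,760.75 | 5,787.55 | 0.10 | 548.14 | 985.32 | 5,462.03 | 100.0% |
| 128 | 128 | 1024x | 8,299.69 | 8,339.44 | 0.08 | 1,541.56 | 2,938.63 | 13,544.64 | 100.0% |
| 128 | 512 | 1x | 33.67 | 8.02 | 10.89 | 56.86 | 56.86 | 15,204.75 | 100.0% |
| 128 | 512 | 2x | 66.46 | 16.23 | 5.41 | 79.80 | 94.63 | 15,403.82 | 100.0% |
| 128 | 512 | 4x | 131.86 | 32.71 | 4.89 | 68.11 | 95.04 | 15,507.73 | 100.0% |
| 128 | 512 | 8x | 259.28 | 65.14 | 3.03 | 88.18 | 118.35 | 15,761.19 | 100.0% |
| 128 | 512 | 16x | 521.50 | 131.43 | 1.52 | 123.20 | 153.86 | 15,679.69 | 100.0% |
| 128 | 512 | 32x | 958.12 | 254.51 | 0.84 | 191.22 | 221.23 | 16,103.55 | 100.0% |
| 128 | 512 | 64x | 1,897.37 | 494.90 | 0.44 | 282.86 | 382.69 | 16,432.84 | 100.0% |
| 128 | 512 | 128x | 3,604.06 | 950.33 | 0.25 | 385.16 | 562.00 | 17,129.41 | 100.0% |
| 128 | 512 | 256x | 6,557.52 | 1,733.12 | 0.15 | 525.88 | 858.75 | 18,707.25 | 100.0% |
| 128 | 512 | 512x | 7,117.51 | 1,891.83 | 0.14 | 911.75 | 1,594.85 | 32,498.21 | 100.0% |
| 128 | 512 | 1024x | 8,033.10 | 2,120.05 | 0.14 | 1,673.59 | 2,731.47 | 58,919.98 | 100.0% |
| 128 | 1,024 | 1x | 33.65 | 5.44 | 11.92 | 57.09 | 57.09 | 22,409.64 | 100.0% |
| 128 | 1,024 | 2x | 60.12 | 11.89 | 8.28 | 54.15 | 56.32 | 20,771.70 | 100.0% |
| 128 | 1,024 | 4x | 119.01 | 20.51 | 4.50 | 78.25 | 96.11 | 24,723.66 | 100.0% |
| 128 | 1,024 | 8x | 234.00 | 45.39 | 3.31 | 95.63 | 136.01 | 22,462.01 | 100.0% |
| 128 | 1,024 | 16x | 398.14 | 69.63 | 2.07 | 125.71 | 160.59 | 27,358.96 | 100.0% |
| 128 | 1,024 | 32x | 766.56 | 128.95 | 1.11 | 195.23 | 263.77 | 31,386.62 | 100.0% |
| 128 | 1,024 | 64x | 1,484.12 | 251.63 | 0.61 | 269.80 | 366.70 | 31,761.61 | 100.0% |
| 128 | 1,024 | 128x | 2,839.98 | 477.59 | 0.33 | 401.68 | 605.51 | 32,988.90 | 100.0% |
| 128 | 1,024 | 256x | 5,168.25 | 900.26 | 0.20 | 550.47 | 965.92 | 35,482.04 | 100.0% |
| 128 | 1,024 | 512x | 5,873.54 | 1,023.72 | 0.18 | 912.37 | 1,582.81 | 56,209.75 | 100.0% |
| 128 | 1,024 | 1024x | 6,142.03 | 1,071.42 | 0.18 | 1,819.77 | 2,946.21 | 105,200.93 | 100.0% |
| 128 | 2,048 | 1x | 33.01 | 5.76 | 11.71 | 57.28 | 57.28 | 21,146.92 | 100.0% |
| 128 | 2,048 | 2x | 57.09 | 11.89 | 8.42 | 53.09 | 55.04 | 20,688.16 | 100.0% |
| 128 | 2,048 | 4x | 109.27 | 18.62 | 5.87 | 71.59 | 93.80 | 26,876.94 | 100.0% |
| 128 | 2,048 | 8x | 217.08 | 39.78 | 3.48 | 84.92 | 114.34 | 25,684.39 | 100.0% |
| 128 | 2,048 | 16x | 428.52 | 78.50 | 1.98 | 130.40 | 189.85 | 26,141.75 | 100.0% |
| 128 | 2,048 | 32x | 649.58 | 115.24 | 1.20 | 219.03 | 280.78 | 27,926.28 | 100.0% |
| 128 | 2,048 | 64x | 1,184.07 | 205.71 | 0.69 | 270.08 | 395.36 | 31,065.79 | 100.0% |
| 128 | 2,048 | 128x | 2,144.93 | 360.72 | 0.40 | 388.44 | 607.65 | 34,699.60 | 100.0% |
| 128 | 2,048 | 256x | 4,209.53 | 716.70 | 0.23 | 598.54 | 1,082.84 | 38,548.04 | 100.0% |
| 128 | 2,048 | 512x | 5,061.13 | 876.03 | 0.19 | 924.69 | 1,575.21 | 59,001.50 | 100.0% |
| 128 | 2,048 | 1024x | 5,740.01 | 985.74 | 0.18 | 1,852.73 | 3,259.69 | 116,175.17 | 100.0% |
| 512 | 128 | 1x | 33.21 | 128.70 | 2.60 | 87.54 | 87.54 | 3,854.59 | 100.0% |
| 512 | 128 | 2x | 66.20 | 257.31 | 1.76 | 87.30 | 87.89 | 3,865.73 | 100.0% |
| 512 | 128 | 4x | 125.40 | 484.45 | 0.94 | 132.98 | 175.50 | 4,082.32 | 100.0% |
| 512 | 128 | 8x | 246.57 | 951.12 | 0.61 | 157.68 | 239.36 | 4,146.05 | 100.0% |
| 512 | 128 | 16x | 468.92 | 1,847.73 | 0.39 | 258.22 | 370.18 | 4,282.66 | 100.0% |
| 512 | 128 | 32x | 893.35 | 3,481.13 | 0.22 | 450.13 | 639.18 | 4,575.47 | 100.0% |
| 512 | 128 | 64x | 1,486.63 | 5,834.86 | 0.15 | 691.30 | 1,249.83 | 5,406.51 | 100.0% |
| 512 | 128 | 128x | 2,699.87 | 10,619.95 | 0.08 | 1,139.22 | 1,671.73 | 5,913.08 | 100.0% |
| 512 | 128 | 256x | 4,124.76 | 16,108.82 | 0.06 | 1,806.47 | 2,999.87 | 7,767.85 | 100.0% |
| 512 | 128 | 512x | 3,695.08 | 14,438.36 | 0.07 | 5,543.14 | 12,386.17 | 17,147.01 | 100.0% |
| 512 | 512 | 1x | 33.25 | 32.21 | 6.98 | 87.80 | 87.80 | 15,398.42 | 100.0% |
| 512 | 512 | 2x | 64.84 | 65.10 | 4.77 | 85.87 | 86.22 | 15,234.47 | 100.0% |
| 512 | 512 | 4x | 131.78 | 127.28 | 3.12 | 97.97 | 148.31 | 15,533.45 | 100.0% |
| 512 | 512 | 8x | 228.85 | 249.72 | 2.06 | 146.13 | 227.08 | 15,815.47 | 100.0% |
| 512 | 512 | 16x | 436.50 | 485.71 | 1.08 | 327.03 | 523.52 | 16,329.11 | 100.0% |
| 512 | 512 | 32x | 920.37 | 952.04 | 0.55 | 394.88 | 620.11 | 16,751.81 | 100.0% |
| 512 | 512 | 64x | 1,830.20 | 1,829.40 | 0.30 | 706.26 | 978.00 | 17,388.41 | 100.0% |
| 512 | 512 | 128x | 3,154.72 | 3,227.99 | 0.17 | 1,138.33 | 2,014.96 | 19,639.35 | 100.0% |
| 512 | 512 | 256x | 3,547.97 | 3,642.47 | 0.16 | 2,121.71 | 3,369.52 | 32,171.43 | 100.0% |
| 512 | 512 | 512x | 3,957.52 | 4,036.88 | 0.15 | 11,265.80 | 27,968.21 | 56,547.27 | 100.0% |
| 512 | 512 | 1024x | 4,368.09 | 4,466.75 | 0.14 | 36,409.04 | 75,454.53 | 97,371.69 | 100.0% |
| 512 | 1,024 | 1x | 33.50 | 36.36 | 6.47 | 84.96 | 84.96 | 13,613.44 | 100.0% |
| 512 | 1,024 | 2x | 61.49 | 48.87 | 5.57 | 84.35 | 87.46 | 20,159.65 | 100.0% |
| 512 | 1,024 | 4x | 115.49 | 85.14 | 4.57 | 85.87 | 91.69 | 22,986.75 | 100.0% |
| 512 | 1,024 | 8x | 180.16 | 141.76 | 2.57 | 174.81 | 289.49 | 26,393.68 | 100.0% |
| 512 | 1,024 | 16x | 378.93 | 292.84 | 1.37 | 232.93 | 321.11 | 26,221.72 | 100.0% |
| 512 | 1,024 | 32x | 677.80 | 502.65 | 0.82 | 491.17 | 706.77 | 29,774.20 | 100.0% |
| 512 | 1,024 | 64x | 1,362.22 | 939.59 | 0.46 | 676.97 | 942.19 | 32,942.25 | 100.0% |
| 512 | 1,024 | 128x | 2,621.90 | 1,796.42 | 0.25 | 1,036.12 | 1,726.87 | 35,094.53 | 100.0% |
| 512 | 1,024 | 256x | 3,258.46 | 2,202.29 | 0.21 | 2,146.77 | 3,166.82 | 48,113.29 | 100.0% |
| 512 | 1,024 | 512x | 3,681.22 | 2,515.75 | 0.19 | 17,543.14 | 61,200.32 | 90,110.80 | 100.0% |
| 512 | 1,024 | 1024x | 4,090.53 | 2,831.21 | 0.18 | 50,666.06 | 113,233.62 | 143,406.27 | 89.9% |
| 512 | 2,048 | 1x | 33.41 | 28.23 | 7.34 | 88.01 | 88.01 | 17,540.96 | 100.0% |
| 512 | 2,048 | 2x | 49.83 | 36.35 | 6.33 | 68.49 | 82.94 | 26,635.25 | 100.0% |
| 512 | 2,048 | 4x | 110.57 | 87.07 | 3.13 | 141.73 | 159.36 | 22,301.59 | 100.0% |
| 512 | 2,048 | 8x | 217.61 | 139.29 | 2.67 | 169.66 | 294.24 | 28,110.73 | 100.0% |
| 512 | 2,048 | 16x | 394.90 | 329.54 | 1.38 | 229.27 | 369.06 | 23,801.13 | 100.0% |
| 512 | 2,048 | 32x | 558.72 | 417.36 | 0.87 | 502.08 | 863.70 | 29,760.30 | 100.0% |
| 512 | 2,048 | 64x | 1,187.43 | 806.57 | 0.50 | 590.49 | 821.87 | 34,909.41 | 100.0% |
| 512 | 2,048 | 128x | 1,466.45 | 982.91 | 0.35 | 1,066.64 | 1,625.71 | 37,676.56 | 100.0% |
| 512 | 2,048 | 256x | 2,680.31 | 1,807.81 | 0.23 | 2,268.83 | 3,533.18 | 50,096.30 | 100.0% |
| 512 | 2,048 | 512x | 3,662.02 | 2,443.88 | 0.20 | 18,751.30 | 64,216.14 | 90,261.76 | 100.0% |
| 512 | 2,048 | 1024x | 3,849.73 | 2,597.82 | 0.19 | 49,273.89 | 112,644.43 | 142,744.40 | 85.9% |
| 1,024 | 128 | 1x | 31.91 | 243.08 | 1.53 | 149.10 | 149.10 | 4,011.11 | 100.0% |
| 1,024 | 128 | 2x | 61.76 | 471.41 | 0.79 | 215.30 | 272.36 | 4,142.45 | 100.0% |
| 1,024 | 128 | 4x | 123.14 | 940.12 | 0.69 | 135.44 | 262.27 | 4,150.01 | 100.0% |
| 1,024 | 128 | 8x | 240.09 | 1,836.58 | 0.43 | 247.13 | 422.80 | 4,259.39 | 100.0% |
| 1,024 | 128 | 16x | 432.62 | 3,318.76 | 0.25 | 435.71 | 756.94 | 4,730.09 | 100.0% |
| 1,024 | 128 | 32x | 782.98 | 6,025.12 | 0.15 | 794.82 | 1,155.47 | 5,204.63 | 100.0% |
| 1,024 | 128 | 64x | 1,401.78 | 10,763.86 | 0.09 | 1,320.81 | 1,695.90 | 5,826.27 | 100.0% |
| 1,024 | 128 | 128x | 2,032.34 | 15,712.29 | 0.07 | 1,985.68 | 3,402.58 | 7,942.36 | 100.0% |
| 1,024 | 128 | 256x | 1,930.60 | 14,850.81 | 0.07 | 5,387.39 | 12,384.11 | 16,714.51 | 100.0% |
| 1,024 | 512 | 1x | 33.44 | 63.69 | 4.72 | 142.18 | 142.18 | 15,308.66 | 100.0% |
| 1,024 | 512 | 2x | 64.99 | 124.02 | 2.37 | 214.86 | 269.84 | 15,754.21 | 100.0% |
| 1,024 | 512 | 4x | 128.91 | 246.05 | 2.17 | 156.18 | 260.04 | 15,880.51 | 100.0% |
| 1,024 | 512 | 8x | 251.01 | 480.02 | 1.12 | 262.74 | 446.44 | 16,312.54 | 100.0% |
| 1,024 | 512 | 16x | 488.09 | 940.33 | 0.71 | 488.38 | 762.51 | 16,698.61 | 100.0% |
| 1,024 | 512 | 32x | 916.24 | 1,802.94 | 0.39 | 791.99 | 1,135.09 | 17,417.22 | 100.0% |
| 1,024 | 512 | 64x | 1,670.65 | 3,369.80 | 0.22 | 1,506.05 | 1,952.10 | 18,632.41 | 100.0% |
| 1,024 | 512 | 128x | 2,043.73 | 4,060.93 | 0.16 | 2,151.41 | 3,706.60 | 23,699.74 | 100.0% |
| 1,024 | 512 | 256x | 2,119.26 | 4,219.93 | 0.16 | 10,483.08 | 27,542.21 | 46,053.44 | 100.0% |
| 1,024 | 512 | 512x | 2,735.23 | 5,452.98 | 0.14 | 30,823.95 | 71,882.56 | 91,419.17 | 100.0% |
| 1,024 | 512 | 1024x | 2,778.54 | 5,558.16 | 0.15 | 58,739.89 | 119,341.46 | 139,692.73 | 85.9% |
| 1,024 | 1,024 | 1x | 33.29 | 40.22 | 6.19 | 146.36 | 146.36 | 24,210.11 | 100.0% |
| 1,024 | 1,024 | 2x | 60.88 | 72.14 | 3.39 | 123.98 | 179.14 | 26,851.59 | 100.0% |
| 1,024 | 1,024 | 4x | 120.04 | 145.41 | 2.39 | 171.56 | 264.86 | 26,727.35 | 100.0% |
| 1,024 | 1,024 | 8x | 203.74 | 245.68 | 2.05 | 277.35 | 402.53 | 31,239.74 | 100.0% |
| 1,024 | 1,024 | 16x | 378.02 | 492.29 | 1.09 | 458.91 | 659.60 | 31,467.14 | 100.0% |
| 1,024 | 1,024 | 32x | 692.66 | 940.76 | 0.58 | 722.39 | 1,153.06 | 29,436.83 | 100.0% |
| 1,024 | 1,024 | 64x | 1,266.40 | 1,810.66 | 0.33 | 1,221.16 | 1,937.84 | 33,018.07 | 100.0% |
| 1,024 | 1,024 | 128x | 2,391.21 | 3,300.04 | 0.20 | 2,119.49 | 3,183.60 | 36,360.26 | 100.0% |
| 1,024 | 1,024 | 256x | 2,107.71 | 2,925.52 | 0.21 | 13,060.96 | 38,618.61 | 69,371.86 | 100.0% |
| 1,024 | 2,048 | 1x | 32.77 | 39.01 | 6.36 | 148.27 | 148.27 | 24,962.72 | 100.0% |
| 1,024 | 2,048 | 2x | 63.32 | 78.76 | 4.41 | 143.78 | 144.57 | 24,669.58 | 100.0% |
| 1,024 | 2,048 | 4x | 117.39 | 133.51 | 3.20 | 142.34 | 187.14 | 28,972.86 | 100.0% |
| 1,024 | 2,048 | 8x | 218.18 | 286.74 | 1.58 | 266.15 | 393.42 | 26,890.91 | 100.0% |
| 1,024 | 2,048 | 16x | 346.41 | 421.53 | 1.17 | 406.71 | 566.17 | 32,859.39 | 100.0% |
| 1,024 | 2,048 | 32x | 529.59 | 683.95 | 0.71 | 843.90 | 1,119.40 | 31,280.26 | 100.0% |
| 1,024 | 2,048 | 64x | 991.44 | 1,339.75 | 0.40 | 1,273.87 | 1,803.55 | 32,436.35 | 100.0% |
| 1,024 | 2,048 | 128x | 1,925.68 | 2,696.44 | 0.22 | 2,078.98 | 3,493.84 | 36,707.03 | 100.0% |
| 1,024 | 2,048 | 256x | 1,996.04 | 2,701.90 | 0.23 | 13,147.31 | 38,455.19 | 69,071.08 | 100.0% |
| 1,024 | 2,048 | 512x | 2,643.33 | 3,612.81 | 0.19 | 43,171.09 | 96,704.53 | 124,074.76 | 100.0% |
| 1,024 | 2,048 | 1024x | 2,631.57 | 3,575.98 | 0.19 | 51,157.90 | 114,053.56 | 139,858.27 | 56.2% |
| 2,048 | 128 | 1x | 31.64 | 486.41 | 0.89 | 268.91 | 268.91 | 4,046.12 | 100.0% |
| 2,048 | 128 | 2x | 62.85 | 962.19 | 0.57 | 164.69 | 263.69 | 4,058.78 | 100.0% |
| 2,048 | 128 | 4x | 116.18 | 1,778.76 | 0.40 | 384.52 | 511.44 | 4,406.34 | 100.0% |
| 2,048 | 128 | 8x | 208.43 | 3,185.83 | 0.23 | 661.23 | 993.34 | 4,910.94 | 100.0% |
| 2,048 | 128 | 16x | 417.53 | 6,377.37 | 0.14 | 682.57 | 986.43 | 4,897.04 | 100.0% |
| 2,048 | 128 | 32x | 656.73 | 10,032.39 | 0.09 | 1,495.55 | 2,171.23 | 6,228.75 | 100.0% |
| 2,048 | 128 | 64x | 972.33 | 14,927.36 | 0.07 | 2,342.56 | 4,078.58 | 8,334.44 | 100.0% |
| 2,048 | 128 | 128x | 1,031.97 | 15,815.01 | 0.07 | 5,221.40 | 11,241.14 | 15,699.58 | 100.0% |
| 2,048 | 128 | 256x | 966.14 | 14,786.52 | 0.08 | 12,942.20 | 24,070.79 | 29,738.16 | 100.0% |
| 2,048 | 512 | 1x | 32.89 | 126.42 | 2.95 | 255.03 | 255.03 | 15,567.00 | 100.0% |
| 2,048 | 512 | 2x | 64.20 | 245.72 | 1.48 | 384.77 | 491.64 | 15,945.90 | 100.0% |
| 2,048 | 512 | 4x | 120.27 | 471.63 | 1.07 | 462.52 | 756.91 | 16,613.38 | 100.0% |
| 2,048 | 512 | 8x | 226.75 | 936.18 | 0.69 | 537.15 | 962.94 | 16,712.60 | 100.0% |
| 2,048 | 512 | 16x | 423.39 | 1,822.48 | 0.45 | 735.34 | 1,257.35 | 17,154.96 | 100.0% |
| 2,048 | 512 | 32x | 795.58 | 3,348.42 | 0.25 | 1,639.07 | 2,178.59 | 18,668.58 | 100.0% |
| 2,048 | 512 | 64x | 1,466.30 | 5,957.30 | 0.15 | 2,217.64 | 3,587.15 | 20,949.30 | 100.0% |
| 2,048 | 512 | 128x | 1,472.14 | 5,904.89 | 0.16 | 9,143.17 | 24,688.80 | 42,272.93 | 100.0% |
| 2,048 | 512 | 256x | 1,451.84 | 5,768.09 | 0.16 | 27,386.80 | 67,944.56 | 82,943.81 | 100.0% |
| 2,048 | 1,024 | 1x | 32.80 | 77.50 | 4.13 | 258.26 | 258.26 | 25,362.65 | 100.0% |
| 2,048 | 1,024 | 2x | 65.98 | 230.04 | 2.13 | 158.32 | 252.09 | 17,003.82 | 100.0% |
| 2,048 | 1,024 | 4x | 87.84 | 252.04 | 2.27 | 258.37 | 262.28 | 29,652.61 | 100.0% |
| 2,048 | 1,024 | 8x | 194.75 | 592.36 | 1.00 | 410.15 | 552.35 | 25,560.66 | 100.0% |
| 2,048 | 1,024 | 16x | 349.72 | 1,234.84 | 0.62 | 964.77 | 1,462.04 | 24,796.90 | 100.0% |
| 2,048 | 1,024 | 32x | 634.20 | 1,837.70 | 0.41 | 1,581.93 | 2,134.66 | 32,918.56 | 100.0% |
| 2,048 | 1,024 | 64x | 910.44 | 2,663.69 | 0.27 | 2,274.11 | 3,581.50 | 37,518.96 | 100.0% |
| 2,048 | 1,024 | 128x | 1,252.15 | 3,484.94 | 0.22 | 12,027.46 | 36,494.57 | 62,568.39 | 100.0% |
| 2,048 | 1,024 | 256x | 1,467.12 | 4,175.27 | 0.20 | 34,903.34 | 79,090.46 | 104,346.81 | 100.0% |
| 2,048 | 1,024 | 512x | 1,643.52 | 4,584.44 | 0.19 | 52,857.39 | 111,931.54 | 136,031.33 | 67.8% |
| 2,048 | 2,048 | 1x | 32.89 | 95.91 | 3.54 | 259.10 | 259.10 | 20,489.26 | 100.0% |
| 2,048 | 2,048 | 2x | 64.22 | 215.29 | 2.28 | 158.53 | 251.41 | 18,123.76 | 100.0% |
| 2,048 | 2,048 | 4x | 65.62 | 126.32 | 3.35 | 259.40 | 263.78 | 56,044.10 | 100.0% |
| 2,048 | 2,048 | 8x | 174.63 | 585.91 | 0.77 | 894.97 | 1,258.19 | 25,336.93 | 100.0% |
| 2,048 | 2,048 | 16x | 296.65 | 1,007.21 | 0.68 | 661.13 | 1,221.09 | 28,544.45 | 100.0% |
| 2,048 | 2,048 | 32x | 629.96 | 1,946.19 | 0.39 | 1,503.38 | 2,166.41 | 31,317.08 | 100.0% |
| 2,048 | 2,048 | 64x | 952.25 | 2,745.08 | 0.27 | 2,225.22 | 4,245.41 | 34,600.00 | 100.0% |
| 2,048 | 2,048 | 128x | 1,227.16 | 3,281.21 | 0.24 | 11,320.31 | 32,361.08 | 62,137.38 | 100.0% |
| 2,048 | 2,048 | 256x | 1,253.61 | 3,475.10 | 0.23 | 34,637.15 | 78,950.27 | 104,836.75 | 100.0% |
| 2,048 | 2,048 | 512x | 1,533.63 | 4,182.23 | 0.20 | 52,995.04 | 114,087.13 | 139,697.81 | 67.8% |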
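
For the single-stream (1x) runs, the reported Output TPS can be sanity-checked against the end-to-end latency: with one request in flight, throughput is roughly output tokens divided by E2E time. The sketch below applies that relation to three 1x rows from the results. It uses E2E P95 as a stand-in for the typical request latency, which is an assumption that only holds when latency varies little between requests; the helper name is illustrative.

```python
def implied_output_tps(concurrency: int, output_tokens: int, e2e_ms: float) -> float:
    """Estimate output tokens/s from requests in flight and per-request latency.

    Assumption: every in-flight request takes about e2e_ms and produces
    exactly output_tokens tokens (a Little's-law style estimate).
    """
    return concurrency * output_tokens / (e2e_ms / 1000.0)


# (input tokens, output tokens, E2E P95 in ms, reported Output TPS) at 1x
SINGLE_STREAM_ROWS = [
    (128, 128, 4716.43, 27.14),
    (512, 128, 3854.59, 33.21),
    (2048, 128, 4046.12, 31.64),
]

for inp, out, e2e_ms, reported in SINGLE_STREAM_ROWS:
    estimate = implied_output_tps(1, out, e2e_ms)
    print(f"in={inp:>4} out={out}: estimated {estimate:.2f}, reported {reported:.2f}")
```

At 1x the estimate agrees with the reported figure to within rounding. At high concurrency the two diverge, since P95 latency is no longer representative of the mean and requests may queue, so the relation is best treated as a rough consistency check rather than a way to derive throughput.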

Hardware Configuration

GPU Manufacturer: NVIDIA
GPU Model: NVIDIA H100 80GB HBM3
GPU Count: 8
GPU Memory (Total): 632 GB
GPU Driver: 570.195.03
CUDA Version: Unknown
Compute Capability: 9.0
Power Limit (per GPU): 700 W
CPU Model: Intel(R) Xeon(R) Platinum 8480+
RAM: 1,772 GB

Software Configuration

Inference Framework: vLLM
Framework Version: 0.11.0
OS: Ubuntu
OS Version: 22.04.5 LTS (Jammy Jellyfish)
Kernel Version: 5.15.0-88-generic
Python Version: 3.10.12

Model Configuration

Provider: meta-llama
Model Name: llama-3.3-70b-instruct
Quantization: FP16

Inference Configuration

Runtime parameters used across all benchmark runs

Max Model Length: 8192
Tensor Parallel Size: 1
Pipeline Parallel Size: 1
GPU Memory Utilization: 0.95
Temperature: 0.70
Top-P: 1.00
Top-K: -1
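
The report does not show how the harness launched vLLM. As a rough sketch, the settings listed above could be passed to vLLM's offline Python API as follows. The Hugging Face model id and the mapping of "FP16" to `dtype="float16"` are assumptions, and the values are copied as reported, including a tensor parallel size of 1 alongside a GPU count of 8.

```python
# Engine settings copied from the report; see assumptions marked below.
ENGINE_ARGS = {
    "model": "meta-llama/Llama-3.3-70B-Instruct",  # assumed Hugging Face id
    "dtype": "float16",  # assumed mapping of "Quantization: FP16"
    "max_model_len": 8192,
    "tensor_parallel_size": 1,
    "pipeline_parallel_size": 1,
    "gpu_memory_utilization": 0.95,
}

# Sampling settings copied from the report; top_k=-1 disables top-k filtering.
SAMPLING_ARGS = {"temperature": 0.70, "top_p": 1.00, "top_k": -1}


def build_engine():
    """Create the engine and sampling parameters (requires vLLM and GPUs)."""
    from vllm import LLM, SamplingParams  # imported lazily so the dicts load anywhere

    return LLM(**ENGINE_ARGS), SamplingParams(**SAMPLING_ARGS)


if __name__ == "__main__":
    for name, value in {**ENGINE_ARGS, **SAMPLING_ARGS}.items():
        print(f"{name} = {value}")
```

A concurrency sweep like the one in this report would more typically run against vLLM's HTTP server than the offline API; the dictionaries above only record the reported parameter values in a reusable form.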