NVIDIA H20 (8x) - deepseek-v3.1

October 26, 2025 at 03:43 PM

Dataset: reference (v1.0)

Best Performance


Best Output TPS: 865.08 (peak generation speed)
Best Input TPS: 4,142.63 (peak prefill speed)
Best Energy Efficiency: 0.16 kWh/MT (energy cost per 1M tokens)
Best TTFT (P95): 80.10 ms (lowest latency)
Best E2E (P95): 3,336.74 ms (lowest latency)
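
The energy figure is expressed in kWh per million tokens (kWh/MT). As a rough illustration of the unit only, the sketch below converts an average power draw and a token throughput into kWh/MT. The report does not state how power was measured or whether input tokens, output tokens, or both are counted, so the function name and the example inputs are assumptions, not the benchmark's method.

```python
def kwh_per_million_tokens(avg_power_watts: float, tokens_per_second: float) -> float:
    """Energy needed to process one million tokens, in kWh.

    Assumes a constant average power draw over the whole run (hypothetical
    simplification; the report does not describe its measurement method).
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    seconds_per_million = 1_000_000 / tokens_per_second
    joules = avg_power_watts * seconds_per_million
    return joules / 3_600_000  # 1 kWh = 3.6e6 J


if __name__ == "__main__":
    # Hypothetical inputs: 8 GPUs at the reported 500 W power limit,
    # with an illustrative throughput of 1,000 tokens/s.
    print(round(kwh_per_million_tokens(8 * 500, 1000.0), 2))
```

Because the formula is linear in power and inversely proportional to throughput, higher concurrency lowers kWh/MT only as long as throughput keeps rising faster than power draw.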

Test Matrix Results

Performance across different input/output token combinations and concurrency levels

Input Tokens | Output Tokens | Concurrency | Output TPS | Input TPS | Energy Cost (kWh/MT) | TTFT Mean (ms) | TTFT P95 (ms) | E2E P95 (ms) | Success Rate
Best Run for Output TPS
128 | 512 | 64x | 865.08 | 221.97 | 0.73 | 1,707.12 | 2,089.74 | 36,704.62 | 100.0%
128 | 128 | 1x | 38.36 | 36.56 | 5.16 | 103.10 | 103.10 | 3,336.74 | 100.0%
128 | 128 | 2x | 69.72 | 68.08 | 3.30 | 203.32 | 263.71 | 3,668.95 | 100.0%
128 | 128 | 4x | 123.11 | 122.14 | 1.88 | 373.78 | 408.48 | 4,156.31 | 100.0%
128 | 128 | 8x | 193.43 | 193.99 | 1.38 | 557.53 | 578.23 | 5,284.28 | 100.0%
128 | 128 | 16x | 310.30 | 312.58 | 0.93 | 884.20 | 961.51 | 6,588.80 | 100.0%
128 | 128 | 32x | 490.77 | 491.49 | 0.66 | 1,422.50 | 1,554.96 | 8,330.51 | 100.0%
128 | 128 | 64x | 748.35 | 746.70 | 0.44 | 2,519.37 | 2,727.05 | 10,886.10 | 100.0%
128 | 128 | 128x | 757.90 | 759.34 | 0.46 | 8,833.42 | 13,366.23 | 21,502.16 | 100.0%
128 | 128 | 256x | 639.24 | 641.53 | 0.59 | 27,902.38 | 42,534.03 | 50,864.66 | 100.0%
128 | 512 | 1x | 39.04 | 9.30 | 9.25 | 98.44 | 98.44 | 13,112.79 | 100.0%
128 | 512 | 2x | 73.73 | 18.00 | 5.16 | 195.83 | 196.21 | 13,886.64 | 100.0%
128 | 512 | 4x | 132.43 | 32.85 | 3.15 | 279.95 | 313.87 | 15,460.48 | 100.0%
128 | 512 | 8x | 187.03 | 52.31 | 2.45 | 457.59 | 476.80 | 19,626.07 | 100.0%
128 | 512 | 16x | 333.11 | 86.20 | 1.56 | 654.23 | 668.92 | 23,917.66 | 100.0%
128 | 512 | 32x | 536.39 | 139.52 | 1.11 | 1,171.43 | 1,235.62 | 29,385.83 | 100.0%
128 | 512 | 128x | 854.99 | 218.77 | 0.75 | 21,614.52 | 40,022.51 | 74,798.57 | 100.0%
128 | 1,024 | 1x | 39.29 | 4.68 | 10.74 | 92.60 | 92.60 | 26,063.99 | 100.0%
128 | 1,024 | 2x | 74.33 | 9.07 | 5.89 | 152.42 | 181.77 | 27,551.31 | 100.0%
128 | 1,024 | 4x | 133.57 | 16.57 | 3.58 | 288.32 | 325.79 | 30,662.17 | 100.0%
128 | 1,024 | 8x | 187.52 | 26.44 | 2.83 | 362.54 | 383.09 | 38,834.66 | 100.0%
128 | 1,024 | 16x | 326.95 | 43.63 | 1.86 | 560.61 | 574.21 | 47,274.32 | 100.0%
128 | 1,024 | 32x | 522.29 | 70.69 | 1.29 | 1,012.22 | 1,150.91 | 58,002.39 | 100.0%
128 | 1,024 | 64x | 671.78 | 89.01 | 1.05 | 1,814.92 | 2,191.79 | 89,738.41 | 100.0%
128 | 1,024 | 128x | 688.32 | 90.92 | 1.02 | 40,564.10 | 97,826.75 | 172,631.43 | 100.0%
128 | 1,024 | 256x | 642.11 | 83.15 | 1.10 | 54,685.02 | 116,474.05 | 194,610.40 | 51.6%
128 | 2,048 | 1x | 39.22 | 3.17 | 10.98 | 99.03 | 99.03 | 38,477.66 | 100.0%
128 | 2,048 | 2x | 72.97 | 5.31 | 6.26 | 208.09 | 208.16 | 46,993.37 | 100.0%
128 | 2,048 | 4x | 110.25 | 8.62 | 4.39 | 358.36 | 395.46 | 56,777.86 | 100.0%
128 | 2,048 | 8x | 155.38 | 14.55 | 3.39 | 426.62 | 447.65 | 66,612.60 | 100.0%
128 | 2,048 | 16x | 280.45 | 24.01 | 2.20 | 593.00 | 607.54 | 85,406.67 | 100.0%
128 | 2,048 | 32x | 442.49 | 38.86 | 1.55 | 1,025.12 | 1,117.65 | 103,426.89 | 100.0%
128 | 2,048 | 64x | 556.77 | 47.86 | 1.28 | 1,808.22 | 2,123.37 | 165,245.32 | 100.0%
128 | 2,048 | 128x | 577.00 | 48.13 | 1.27 | 34,440.63 | 123,787.37 | 224,696.50 | 68.0%
512 | 128 | 1x | 37.57 | 145.58 | 2.37 | 157.68 | 157.68 | 3,406.39 | 100.0%
512 | 128 | 2x | 70.08 | 272.38 | 1.43 | 192.60 | 247.50 | 3,650.22 | 100.0%
512 | 128 | 4x | 125.55 | 485.04 | 0.81 | 296.43 | 335.64 | 4,074.41 | 100.0%
512 | 128 | 8x | 196.81 | 759.18 | 0.60 | 476.49 | 507.53 | 5,196.96 | 100.0%
512 | 128 | 16x | 316.64 | 1,226.96 | 0.41 | 859.80 | 887.64 | 6,453.74 | 100.0%
512 | 128 | 32x | 485.24 | 1,892.23 | 0.30 | 1,465.44 | 1,655.34 | 8,419.86 | 100.0%
512 | 128 | 64x | 728.79 | 2,851.65 | 0.21 | 2,153.40 | 2,890.18 | 11,148.88 | 100.0%
512 | 128 | 128x | 645.86 | 2,517.64 | 0.24 | 10,353.19 | 16,820.54 | 25,211.26 | 100.0%
512 | 512 | 1x | 37.68 | 37.83 | 5.98 | 94.70 | 94.70 | 13,108.58 | 100.0%
512 | 512 | 2x | 74.28 | 72.17 | 3.31 | 140.29 | 140.61 | 13,784.09 | 100.0%
512 | 512 | 4x | 131.20 | 126.71 | 2.07 | 359.87 | 397.30 | 15,606.95 | 100.0%
512 | 512 | 8x | 198.42 | 200.55 | 1.51 | 406.55 | 450.63 | 19,692.28 | 100.0%
512 | 512 | 16x | 300.37 | 329.43 | 1.07 | 716.55 | 843.57 | 24,082.11 | 100.0%
512 | 512 | 32x | 508.37 | 533.62 | 0.72 | 1,259.43 | 1,364.51 | 29,896.68 | 100.0%
512 | 512 | 64x | 617.95 | 636.79 | 0.61 | 2,110.96 | 2,991.52 | 48,435.49 | 100.0%
512 | 512 | 128x | 603.66 | 617.37 | 0.65 | 26,760.74 | 79,798.00 | 102,714.80 | 100.0%
512 | 1,024 | 1x | 38.02 | 18.92 | 8.06 | 155.60 | 155.60 | 26,219.42 | 100.0%
512 | 1,024 | 2x | 74.16 | 36.03 | 4.48 | 210.19 | 210.51 | 27,613.42 | 100.0%
512 | 1,024 | 4x | 131.89 | 64.05 | 2.76 | 323.09 | 390.88 | 30,879.23 | 100.0%
512 | 1,024 | 8x | 191.04 | 101.36 | 2.09 | 485.11 | 535.08 | 38,964.41 | 100.0%
512 | 1,024 | 16x | 282.71 | 169.25 | 1.50 | 751.49 | 779.69 | 46,870.14 | 100.0%
512 | 1,024 | 32x | 494.14 | 273.31 | 1.00 | 1,417.83 | 1,611.24 | 58,377.01 | 100.0%
512 | 1,024 | 64x | 550.52 | 289.66 | 0.94 | 2,197.91 | 3,111.86 | 108,686.59 | 100.0%
512 | 1,024 | 128x | 547.77 | 289.30 | 0.94 | 36,800.31 | 112,689.36 | 170,974.29 | 80.5%
512 | 2,048 | 1x | 38.19 | 12.54 | 9.15 | 157.02 | 157.02 | 39,539.06 | 100.0%
512 | 2,048 | 2x | 73.64 | 22.13 | 5.14 | 196.93 | 256.27 | 44,931.91 | 100.0%
512 | 2,048 | 4x | 111.54 | 38.90 | 3.54 | 354.39 | 388.60 | 50,031.75 | 100.0%
512 | 2,048 | 8x | 163.62 | 62.02 | 2.61 | 522.24 | 552.73 | 62,369.16 | 100.0%
512 | 2,048 | 16x | 254.78 | 108.68 | 1.82 | 785.87 | 812.36 | 72,971.12 | 100.0%
512 | 2,048 | 32x | 418.75 | 156.11 | 1.29 | 1,504.55 | 1,582.66 | 99,496.65 | 100.0%
512 | 2,048 | 64x | 484.25 | 171.69 | 1.18 | 2,337.64 | 3,150.51 | 184,611.50 | 100.0%
512 | 2,048 | 128x | 473.59 | 167.67 | 1.20 | 21,877.01 | 123,073.62 | 220,751.90 | 59.4%
1,024 | 128 | 1x | 37.71 | 287.27 | 1.37 | 174.15 | 174.15 | 3,393.29 | 100.0%
1,024 | 128 | 2x | 70.08 | 534.90 | 0.81 | 237.30 | 262.35 | 3,650.64 | 100.0%
1,024 | 128 | 4x | 122.90 | 938.31 | 0.47 | 377.72 | 417.64 | 4,164.36 | 100.0%
1,024 | 128 | 8x | 191.83 | 1,467.40 | 0.36 | 592.12 | 671.75 | 5,332.45 | 100.0%
1,024 | 128 | 16x | 302.51 | 2,320.68 | 0.25 | 1,062.89 | 1,152.88 | 6,757.52 | 100.0%
1,024 | 128 | 32x | 444.24 | 3,409.31 | 0.19 | 1,740.76 | 2,285.02 | 9,207.50 | 100.0%
1,024 | 128 | 64x | 432.64 | 3,339.21 | 0.20 | 5,215.87 | 12,227.75 | 18,764.46 | 100.0%
1,024 | 512 | 1x | 38.97 | 74.22 | 4.22 | 80.10 | 80.10 | 13,135.56 | 100.0%
1,024 | 512 | 2x | 74.15 | 141.50 | 2.25 | 148.23 | 173.41 | 13,807.05 | 100.0%
1,024 | 512 | 4x | 132.04 | 252.01 | 1.39 | 283.70 | 323.16 | 15,506.74 | 100.0%
1,024 | 512 | 8x | 207.71 | 397.21 | 1.00 | 529.64 | 570.44 | 19,711.75 | 100.0%
1,024 | 512 | 16x | 335.52 | 643.47 | 0.69 | 1,071.20 | 1,119.85 | 24,400.92 | 100.0%
1,024 | 512 | 32x | 458.75 | 900.18 | 0.54 | 1,890.47 | 2,436.59 | 32,410.39 | 100.0%
1,024 | 512 | 64x | 485.96 | 970.35 | 0.54 | 13,898.13 | 36,515.61 | 64,759.75 | 100.0%
1,024 | 512 | 128x | 469.92 | 939.88 | 0.57 | 48,038.71 | 107,708.22 | 133,765.44 | 100.0%
1,024 | 1,024 | 1x | 39.01 | 37.14 | 6.05 | 172.21 | 172.21 | 26,248.09 | 100.0%
1,024 | 1,024 | 2x | 74.08 | 70.78 | 3.42 | 225.51 | 250.76 | 27,603.81 | 100.0%
1,024 | 1,024 | 4x | 132.89 | 126.82 | 2.10 | 400.78 | 456.97 | 30,819.08 | 100.0%
1,024 | 1,024 | 8x | 209.48 | 200.37 | 1.50 | 579.75 | 629.04 | 39,080.69 | 100.0%
1,024 | 1,024 | 16x | 334.66 | 326.13 | 1.04 | 1,067.01 | 1,114.54 | 48,159.73 | 100.0%
1,024 | 1,024 | 32x | 380.19 | 379.53 | 0.95 | 1,844.69 | 2,181.37 | 80,417.99 | 100.0%
1,024 | 1,024 | 64x | 406.69 | 405.60 | 0.93 | 27,282.72 | 102,955.61 | 151,265.35 | 100.0%
1,024 | 1,024 | 128x | 393.28 | 412.79 | 0.94 | 33,335.46 | 99,498.96 | 158,695.93 | 53.9%
1,024 | 2,048 | 1x | 39.06 | 28.32 | 6.82 | 81.14 | 81.14 | 34,405.04 | 100.0%
1,024 | 2,048 | 2x | 67.64 | 45.23 | 4.28 | 217.56 | 217.88 | 42,773.88 | 100.0%
1,024 | 2,048 | 4x | 115.92 | 64.62 | 2.99 | 357.55 | 398.66 | 59,904.06 | 100.0%
1,024 | 2,048 | 8x | 182.59 | 108.73 | 2.04 | 471.12 | 503.56 | 70,235.92 | 100.0%
1,024 | 2,048 | 16x | 283.72 | 178.55 | 1.45 | 1,001.52 | 1,045.11 | 82,937.01 | 100.0%
1,024 | 2,048 | 32x | 357.74 | 240.37 | 1.22 | 1,959.25 | 2,670.35 | 124,812.05 | 100.0%
1,024 | 2,048 | 64x | 362.88 | 241.56 | 1.22 | 27,154.91 | 107,059.53 | 208,543.27 | 85.9%
2,048 | 128 | 1x | 36.72 | 564.54 | 0.78 | 252.47 | 252.47 | 3,485.69 | 100.0%
2,048 | 128 | 2x | 68.36 | 1,046.46 | 0.41 | 247.50 | 345.51 | 3,741.78 | 100.0%
2,048 | 128 | 4x | 118.00 | 1,806.64 | 0.27 | 546.00 | 586.63 | 4,337.06 | 100.0%
2,048 | 128 | 8x | 173.46 | 2,747.89 | 0.21 | 933.50 | 1,005.68 | 5,690.26 | 100.0%
2,048 | 128 | 16x | 271.22 | 4,142.63 | 0.16 | 1,312.76 | 1,851.66 | 7,538.37 | 100.0%
2,048 | 128 | 32x | 247.88 | 3,822.12 | 0.18 | 4,564.77 | 10,770.82 | 16,353.72 | 100.0%
2,048 | 512 | 1x | 38.43 | 147.71 | 2.42 | 251.93 | 251.93 | 13,321.86 | 100.0%
2,048 | 512 | 2x | 73.41 | 280.93 | 1.37 | 168.49 | 195.03 | 13,947.12 | 100.0%
2,048 | 512 | 4x | 103.82 | 495.95 | 0.90 | 523.67 | 586.06 | 15,804.41 | 100.0%
2,048 | 512 | 8x | 150.07 | 780.10 | 0.66 | 931.54 | 1,021.64 | 20,057.95 | 100.0%
2,048 | 512 | 16x | 256.25 | 1,265.52 | 0.45 | 1,201.51 | 1,445.49 | 24,707.71 | 100.0%
2,048 | 512 | 32x | 267.13 | 1,203.38 | 0.49 | 10,219.91 | 31,497.84 | 51,962.69 | 100.0%
2,048 | 512 | 64x | 311.42 | 1,321.52 | 0.47 | 31,952.51 | 63,661.88 | 90,935.40 | 100.0%
2,048 | 512 | 128x | 294.56 | 1,196.67 | 0.50 | 51,733.05 | 97,406.74 | 125,500.30 | 64.1%
2,048 | 1,024 | 1x | 38.69 | 74.36 | 4.08 | 251.45 | 251.45 | 26,465.01 | 100.0%
2,048 | 1,024 | 2x | 73.73 | 141.08 | 2.28 | 166.47 | 197.07 | 27,775.65 | 100.0%
2,048 | 1,024 | 4x | 101.37 | 254.19 | 1.53 | 481.09 | 579.62 | 30,833.41 | 100.0%
2,048 | 1,024 | 8x | 142.86 | 416.18 | 1.09 | 642.94 | 698.36 | 37,603.00 | 100.0%
2,048 | 1,024 | 16x | 216.06 | 663.91 | 0.78 | 1,486.08 | 1,840.77 | 47,103.98 | 100.0%
2,048 | 1,024 | 32x | 263.11 | 645.04 | 0.77 | 10,771.07 | 56,850.42 | 96,953.15 | 100.0%
2,048 | 1,024 | 64x | 302.53 | 689.14 | 0.75 | 56,547.45 | 116,431.60 | 163,896.81 | 92.2%
2,048 | 1,024 | 128x | 282.72 | 563.81 | 0.85 | 58,456.68 | 116,387.69 | 174,626.82 | 43.8%
2,048 | 2,048 | 1x | 38.71 | 52.39 | 5.07 | 252.76 | 252.76 | 37,537.55 | 100.0%
2,048 | 2,048 | 2x | 67.42 | 97.54 | 2.95 | 307.78 | 308.15 | 39,790.83 | 100.0%
2,048 | 2,048 | 4x | 89.18 | 184.79 | 1.90 | 528.28 | 591.73 | 41,958.12 | 100.0%
2,048 | 2,048 | 8x | 123.36 | 280.89 | 1.43 | 885.97 | 976.07 | 54,011.83 | 100.0%
2,048 | 2,048 | 16x | 190.55 | 438.35 | 1.01 | 1,346.98 | 1,775.66 | 66,612.97 | 100.0%
2,048 | 2,048 | 32x | 249.52 | 401.77 | 1.06 | 24,318.77 | 82,058.57 | 151,159.22 | 100.0%
2,048 | 2,048 | 64x | 276.66 | 406.98 | 1.05 | 33,184.32 | 92,523.95 | 174,079.53 | 57.8%
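
The "Best Performance" figures can be cross-checked against the matrix by taking the maximum or minimum of each column. The sketch below does this for a handful of rows copied from the results (not the full matrix) and also derives per-stream generation speed as Output TPS divided by concurrency. The `Run` class and helper names are illustrative assumptions, not part of the benchmark tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    input_tokens: int
    output_tokens: int
    concurrency: int
    output_tps: float
    input_tps: float
    energy_kwh_per_mt: float
    ttft_p95_ms: float
    e2e_p95_ms: float
    success_rate: float  # percent


# Sample rows copied from the results table (mean TTFT column omitted).
RUNS = [
    Run(128, 512, 64, 865.08, 221.97, 0.73, 2089.74, 36704.62, 100.0),
    Run(128, 128, 1, 38.36, 36.56, 5.16, 103.10, 3336.74, 100.0),
    Run(128, 512, 128, 854.99, 218.77, 0.75, 40022.51, 74798.57, 100.0),
    Run(1024, 512, 1, 38.97, 74.22, 4.22, 80.10, 13135.56, 100.0),
    Run(2048, 128, 16, 271.22, 4142.63, 0.16, 1851.66, 7538.37, 100.0),
    Run(128, 1024, 256, 642.11, 83.15, 1.10, 116474.05, 194610.40, 51.6),
]


def best(runs, key, highest=True):
    """Return the run with the highest (or lowest) value of `key`."""
    pick = max if highest else min
    return pick(runs, key=key)


def per_stream_tps(run: Run) -> float:
    """Average generation speed seen by one concurrent request."""
    return run.output_tps / run.concurrency


if __name__ == "__main__":
    top = best(RUNS, key=lambda r: r.output_tps)
    print("best output TPS:", top.output_tps, "at", top.concurrency, "streams")
    print("per-stream TPS there:", round(per_stream_tps(top), 2))
    fastest = best(RUNS, key=lambda r: r.ttft_p95_ms, highest=False)
    print("lowest TTFT P95 (ms):", fastest.ttft_p95_ms)
```

Dividing by concurrency makes the trade-off visible: the 64x run that delivers the best aggregate Output TPS gives each request far less than the roughly 38 tokens/s seen by a single stream.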

Hardware Configuration

GPU Manufacturer: NVIDIA
GPU Model: NVIDIA H20
GPU Count: 8
GPU Memory (Total): 760 GB
GPU Driver: 570.195.03
CUDA Version: Unknown
Compute Capability: 9.0
Power Limit (per GPU): 500 W
CPU Model: Intel(R) Xeon(R) Platinum 8469C
RAM: 1,007 GB

Software Configuration

Inference Framework: vLLM
Framework Version: 0.10.2
OS: Ubuntu
OS Version: 24.04.3 LTS (Noble Numbat)
Kernel Version: 6.8.0-79-generic
Python Version: 3.12.3

Model Configuration

Provider: deepseek-ai
Model Name: deepseek-v3.1
Quantization: FP8

Inference Configuration

Runtime parameters used across all benchmark runs

Max Model Length: 40960
Tensor Parallel Size: 1
Pipeline Parallel Size: 1
GPU Memory Utilization: 90.00
Temperature: 0.70
Top-P: 1.00
Top-K: -1
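
For readers who want to approximate this setup, the sketch below maps the reported parameters onto a `vllm serve` command line and a completion request body. It is not the command used for these runs. The Hugging Face model ID is an assumption inferred from the provider and model name, the reported GPU memory utilization of 90.00 is assumed to correspond to vLLM's fraction 0.90, and the parallelism values are copied exactly as reported.

```python
import json
import shlex

# Assumed Hugging Face repo ID; the report only lists provider and model name.
MODEL_ID = "deepseek-ai/DeepSeek-V3.1"


def serve_command(model_id: str = MODEL_ID) -> list[str]:
    """Engine flags mirroring the reported inference configuration."""
    return [
        "vllm", "serve", model_id,
        "--max-model-len", "40960",
        "--tensor-parallel-size", "1",      # as reported
        "--pipeline-parallel-size", "1",    # as reported
        "--gpu-memory-utilization", "0.90", # assumed: 90.00 means 90%
    ]


def completion_payload(prompt: str, max_tokens: int) -> dict:
    """Request body carrying the reported sampling settings."""
    return {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.70,
        "top_p": 1.00,
        "top_k": -1,  # -1 disables top-k filtering in vLLM
    }


if __name__ == "__main__":
    print(shlex.join(serve_command()))
    print(json.dumps(completion_payload("Hello", 128), indent=2))
```

Sampling settings travel with each request rather than with the server, which is why they appear in the payload and not on the command line.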