NVIDIA H20 (8x) - mistral-nemo-instruct-2407

October 24, 2025 at 08:17 PM

Dataset: reference (v1.0)

Best Performance

Best Output TPS: 12,605.43 (peak generation speed)
Best Input TPS: 24,969.58 (peak prefill speed)
Best Energy Efficiency: 0.02 kWh/MT (energy cost per 1M tokens)
Best TTFT (P95): 44.44 ms (lowest latency)
Best E2E (P95): 1,330.68 ms (lowest latency)
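
The energy figure is easier to compare with other sources once it is expressed per token. The sketch below is plain unit arithmetic, assuming "MT" means one million tokens as the metric caption states; the report does not say whether input tokens, output tokens, or both are counted, so the result inherits that ambiguity. The function name is illustrative.

```python
JOULES_PER_KWH = 3_600_000  # 1 kWh = 3.6 MJ
TOKENS_PER_MT = 1_000_000   # "MT" read as one million tokens, per the metric caption


def kwh_per_mt_to_joules_per_token(kwh_per_mt: float) -> float:
    """Convert an energy cost in kWh per 1M tokens to joules per token."""
    return kwh_per_mt * JOULES_PER_KWH / TOKENS_PER_MT


if __name__ == "__main__":
    # Best reported energy efficiency: 0.02 kWh/MT
    print(kwh_per_mt_to_joules_per_token(0.02))
```

For the best reported value of 0.02 kWh/MT this works out to roughly 0.072 J per token.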

Test Matrix Results

Performance across different input/output token combinations and concurrency levels

Input Tokens | Output Tokens | Concurrency | Output TPS | Input TPS | Energy Cost (kWh/MT) | TTFT Mean (ms) | TTFT P95 (ms) | E2E P95 (ms) | Success Rate
Best Run for Output TPS
128 | 1,024 | 1024x | 12,605.43 | 2,255.57 | 0.06 | 5,663.82 | 15,323.96 | 30,680.05 | 100.0%
128 | 128 | 1x | 76.05 | 72.49 | 1.97 | 476.52 | 476.52 | 1,682.07 | 100.0%
128 | 128 | 2x | 145.04 | 141.64 | 1.16 | 515.63 | 522.43 | 1,761.89 | 100.0%
128 | 128 | 4x | 280.39 | 278.20 | 0.72 | 364.89 | 604.45 | 1,821.30 | 100.0%
128 | 128 | 8x | 495.88 | 497.34 | 0.44 | 571.20 | 838.81 | 2,059.54 | 100.0%
128 | 128 | 16x | 883.90 | 890.38 | 0.30 | 764.87 | 1,081.14 | 2,303.50 | 100.0%
128 | 128 | 32x | 1,661.66 | 1,664.10 | 0.16 | 1,103.32 | 1,194.92 | 2,447.41 | 100.0%
128 | 128 | 64x | 2,248.69 | 2,243.19 | 0.11 | 2,153.55 | 2,280.06 | 3,606.49 | 100.0%
128 | 128 | 128x | 2,597.74 | 2,611.45 | 0.10 | 3,991.91 | 4,814.86 | 6,144.33 | 100.0%
128 | 128 | 256x | 1,486.92 | 1,519.70 | 0.13 | 18,120.79 | 19,759.20 | 21,359.29 | 100.0%
128 | 512 | 1x | 101.34 | 29.79 | 3.20 | 44.44 | 44.44 | 4,083.88 | 100.0%
128 | 512 | 2x | 202.73 | 49.50 | 1.61 | 89.54 | 95.61 | 5,045.34 | 100.0%
128 | 512 | 4x | 401.10 | 99.49 | 0.95 | 151.20 | 158.45 | 5,096.22 | 100.0%
128 | 512 | 8x | 738.67 | 195.62 | 0.66 | 224.71 | 248.69 | 5,241.44 | 100.0%
128 | 512 | 16x | 1,407.45 | 375.09 | 0.38 | 386.96 | 414.10 | 5,490.12 | 100.0%
128 | 512 | 32x | 2,673.36 | 698.21 | 0.24 | 717.42 | 783.69 | 5,852.77 | 100.0%
128 | 512 | 64x | 4,260.48 | 1,127.96 | 0.14 | 1,494.65 | 1,850.38 | 7,113.07 | 100.0%
128 | 512 | 128x | 5,519.18 | 1,437.64 | 0.10 | 4,698.28 | 5,623.92 | 11,105.32 | 100.0%
128 | 512 | 256x | 4,652.77 | 1,215.59 | 0.10 | 17,453.06 | 19,319.69 | 26,323.45 | 100.0%
128 | 1,024 | 1x | 101.73 | 26.41 | 2.65 | 44.77 | 44.77 | 4,609.01 | 100.0%
128 | 1,024 | 2x | 191.94 | 47.51 | 1.68 | 88.12 | 88.50 | 5,219.68 | 100.0%
128 | 1,024 | 4x | 342.85 | 64.72 | 1.18 | 143.24 | 150.64 | 7,704.35 | 100.0%
128 | 1,024 | 8x | 601.55 | 124.53 | 0.90 | 210.90 | 221.89 | 8,056.73 | 100.0%
128 | 1,024 | 16x | 1,040.35 | 197.25 | 0.60 | 387.06 | 408.54 | 10,337.69 | 100.0%
128 | 1,024 | 32x | 2,116.10 | 373.52 | 0.31 | 715.05 | 788.78 | 10,858.11 | 100.0%
128 | 1,024 | 64x | 3,897.05 | 681.94 | 0.19 | 1,388.11 | 1,719.43 | 11,747.92 | 100.0%
128 | 1,024 | 128x | 5,779.18 | 1,045.65 | 0.12 | 3,335.11 | 4,447.87 | 14,125.07 | 100.0%
128 | 1,024 | 256x | 5,917.12 | 1,046.81 | 0.09 | 16,238.71 | 17,924.21 | 29,967.42 | 100.0%
128 | 1,024 | 512x | 6,480.02 | 1,163.97 | 0.08 | 31,407.68 | 36,744.78 | 52,144.58 | 100.0%
128 | 2,048 | 1x | 101.85 | 25.41 | 2.79 | 47.16 | 47.16 | 4,789.39 | 100.0%
128 | 2,048 | 2x | 173.84 | 39.15 | 1.90 | 83.39 | 84.14 | 6,276.32 | 100.0%
128 | 2,048 | 4x | 324.53 | 57.10 | 1.08 | 149.44 | 158.36 | 8,580.82 | 100.0%
128 | 2,048 | 8x | 536.44 | 124.76 | 0.89 | 214.03 | 239.74 | 7,915.82 | 100.0%
128 | 2,048 | 16x | 1,021.96 | 199.56 | 0.56 | 373.96 | 408.85 | 9,226.13 | 100.0%
128 | 2,048 | 32x | 1,582.85 | 276.98 | 0.38 | 728.97 | 776.97 | 12,185.49 | 100.0%
128 | 2,048 | 64x | 2,849.63 | 489.67 | 0.23 | 1,505.52 | 1,822.57 | 14,120.19 | 100.0%
128 | 2,048 | 128x | 4,904.05 | 844.50 | 0.13 | 3,734.74 | 4,802.63 | 15,885.77 | 100.0%
128 | 2,048 | 256x | 5,621.32 | 980.44 | 0.11 | 14,073.72 | 15,594.29 | 28,150.37 | 100.0%
128 | 2,048 | 512x | 6,575.94 | 1,126.80 | 0.09 | 24,447.39 | 29,030.37 | 46,270.42 | 100.0%
128 | 2,048 | 1024x | 7,250.91 | 1,248.40 | 0.07 | 52,370.13 | 60,272.14 | 84,894.29 | 96.3%
512 | 128 | 1x | 96.17 | 372.65 | 0.69 | 127.48 | 127.48 | 1,330.68 | 100.0%
512 | 128 | 2x | 188.24 | 731.62 | 0.40 | 150.54 | 153.54 | 1,358.62 | 100.0%
512 | 128 | 4x | 321.00 | 1,240.13 | 0.23 | 297.94 | 353.90 | 1,591.35 | 100.0%
512 | 128 | 8x | 637.61 | 2,459.53 | 0.18 | 308.83 | 379.84 | 1,602.60 | 100.0%
512 | 128 | 16x | 970.62 | 3,761.14 | 0.11 | 573.72 | 814.49 | 2,099.30 | 100.0%
512 | 128 | 32x | 1,757.94 | 6,850.21 | 0.07 | 868.18 | 1,020.72 | 2,313.10 | 100.0%
512 | 128 | 64x | 2,253.88 | 8,833.15 | 0.05 | 1,609.96 | 2,107.40 | 3,564.95 | 100.0%
512 | 128 | 128x | 2,506.45 | 9,775.18 | 0.05 | 3,538.07 | 4,757.71 | 6,342.01 | 100.0%
512 | 128 | 256x | 1,606.54 | 6,286.50 | 0.06 | 15,622.16 | 17,698.49 | 19,716.35 | 100.0%
512 | 512 | 1x | 102.77 | 99.56 | 1.69 | 122.87 | 122.87 | 4,981.21 | 100.0%
512 | 512 | 2x | 207.50 | 201.62 | 1.02 | 70.83 | 70.94 | 4,932.66 | 100.0%
512 | 512 | 4x | 405.79 | 391.92 | 0.61 | 130.77 | 138.30 | 5,042.57 | 100.0%
512 | 512 | 8x | 752.60 | 746.55 | 0.41 | 262.25 | 341.98 | 5,282.13 | 100.0%
512 | 512 | 16x | 1,305.64 | 1,393.99 | 0.28 | 441.38 | 567.69 | 5,686.95 | 100.0%
512 | 512 | 32x | 2,397.51 | 2,482.27 | 0.17 | 826.38 | 1,127.31 | 6,414.35 | 100.0%
512 | 512 | 64x | 4,103.62 | 4,138.42 | 0.10 | 1,499.68 | 2,003.27 | 7,661.27 | 100.0%
512 | 512 | 128x | 5,871.79 | 5,953.05 | 0.07 | 3,518.15 | 4,686.84 | 10,409.66 | 100.0%
512 | 512 | 256x | 4,921.80 | 4,982.70 | 0.06 | 16,034.65 | 17,865.09 | 25,209.23 | 100.0%
512 | 1,024 | 1x | 103.72 | 91.87 | 1.89 | 46.44 | 46.44 | 5,388.42 | 100.0%
512 | 1,024 | 2x | 159.92 | 100.96 | 1.53 | 86.38 | 87.05 | 9,625.56 | 100.0%
512 | 1,024 | 4x | 300.47 | 198.38 | 0.94 | 146.12 | 187.98 | 9,584.33 | 100.0%
512 | 1,024 | 8x | 647.39 | 484.96 | 0.58 | 229.59 | 290.89 | 8,099.89 | 100.0%
512 | 1,024 | 16x | 1,022.96 | 781.87 | 0.42 | 403.89 | 526.89 | 8,655.77 | 100.0%
512 | 1,024 | 32x | 1,950.93 | 1,466.73 | 0.23 | 798.82 | 948.51 | 10,699.79 | 100.0%
512 | 1,024 | 64x | 3,487.21 | 2,440.28 | 0.13 | 1,426.06 | 1,844.17 | 12,058.17 | 100.0%
512 | 1,024 | 128x | 5,662.23 | 4,013.62 | 0.08 | 3,363.97 | 4,350.17 | 14,773.99 | 100.0%
512 | 1,024 | 256x | 6,393.94 | 4,375.90 | 0.07 | 10,944.61 | 14,090.35 | 26,969.68 | 100.0%
512 | 1,024 | 512x | 6,548.69 | 4,659.79 | 0.06 | 29,400.26 | 33,898.64 | 51,069.54 | 100.0%
512 | 1,024 | 1024x | 6,754.84 | 4,820.89 | 0.05 | 53,641.88 | 62,628.37 | 89,414.98 | 92.2%
512 | 2,048 | 1x | 103.72 | 98.55 | 1.74 | 48.10 | 48.10 | 5,022.45 | 100.0%
512 | 2,048 | 2x | 185.20 | 172.86 | 1.13 | 88.95 | 89.29 | 5,683.99 | 100.0%
512 | 2,048 | 4x | 357.13 | 270.55 | 0.86 | 124.84 | 125.49 | 7,226.79 | 100.0%
512 | 2,048 | 8x | 612.21 | 417.37 | 0.53 | 200.16 | 210.19 | 9,216.68 | 100.0%
512 | 2,048 | 16x | 1,080.78 | 854.80 | 0.38 | 369.46 | 444.58 | 8,552.28 | 100.0%
512 | 2,048 | 32x | 1,647.22 | 1,249.59 | 0.24 | 726.76 | 868.08 | 9,678.93 | 100.0%
512 | 2,048 | 64x | 2,823.85 | 2,015.05 | 0.15 | 1,345.91 | 1,700.15 | 11,684.53 | 100.0%
512 | 2,048 | 128x | 3,507.20 | 2,386.09 | 0.11 | 3,700.98 | 4,666.88 | 15,643.16 | 100.0%
512 | 2,048 | 256x | 4,994.69 | 3,428.53 | 0.08 | 14,759.54 | 16,350.30 | 29,649.59 | 100.0%
512 | 2,048 | 512x | 5,942.79 | 4,066.28 | 0.07 | 30,249.73 | 34,296.57 | 52,984.26 | 100.0%
512 | 2,048 | 1024x | 6,958.75 | 4,792.16 | 0.06 | 42,249.62 | 56,817.70 | 85,157.60 | 100.0%
1,024 | 128 | 1x | 89.82 | 684.21 | 0.42 | 209.22 | 209.22 | 1,423.82 | 100.0%
1,024 | 128 | 2x | 175.58 | 1,340.19 | 0.24 | 242.41 | 242.63 | 1,455.86 | 100.0%
1,024 | 128 | 4x | 340.20 | 2,597.34 | 0.15 | 286.06 | 288.24 | 1,501.57 | 100.0%
1,024 | 128 | 8x | 531.95 | 4,069.09 | 0.11 | 470.53 | 679.34 | 1,921.25 | 100.0%
1,024 | 128 | 16x | 974.77 | 7,477.87 | 0.07 | 611.56 | 845.00 | 2,089.52 | 100.0%
1,024 | 128 | 32x | 1,379.25 | 10,585.05 | 0.06 | 1,166.16 | 1,598.51 | 2,943.27 | 100.0%
1,024 | 128 | 64x | 1,879.42 | 14,447.40 | 0.04 | 1,994.64 | 2,761.60 | 4,308.12 | 100.0%
1,024 | 128 | 128x | 2,047.80 | 15,745.71 | 0.03 | 3,989.41 | 5,628.36 | 7,907.41 | 100.0%
1,024 | 128 | 256x | 1,415.60 | 10,920.28 | 0.04 | 16,699.56 | 19,653.77 | 22,437.13 | 100.0%
1,024 | 512 | 1x | 100.47 | 191.33 | 1.19 | 205.35 | 205.35 | 5,094.69 | 100.0%
1,024 | 512 | 2x | 199.14 | 380.01 | 0.72 | 159.25 | 230.99 | 5,133.42 | 100.0%
1,024 | 512 | 4x | 383.59 | 732.16 | 0.38 | 221.49 | 312.52 | 5,335.89 | 100.0%
1,024 | 512 | 8x | 762.61 | 1,458.39 | 0.31 | 255.60 | 385.95 | 5,363.47 | 100.0%
1,024 | 512 | 16x | 1,345.42 | 2,612.84 | 0.19 | 548.38 | 835.16 | 6,002.20 | 100.0%
1,024 | 512 | 32x | 2,244.71 | 4,432.58 | 0.11 | 1,032.98 | 1,581.08 | 7,061.29 | 100.0%
1,024 | 512 | 64x | 3,840.88 | 7,531.61 | 0.07 | 1,570.87 | 2,161.66 | 8,306.20 | 100.0%
1,024 | 512 | 128x | 5,005.09 | 10,002.70 | 0.05 | 3,929.81 | 5,569.49 | 12,287.02 | 100.0%
1,024 | 512 | 256x | 4,531.02 | 9,051.72 | 0.05 | 16,019.51 | 18,396.27 | 27,140.95 | 100.0%
1,024 | 1,024 | 1x | 103.82 | 145.65 | 1.44 | 49.41 | 49.41 | 6,683.30 | 100.0%
1,024 | 1,024 | 2x | 167.41 | 197.17 | 1.12 | 68.84 | 69.18 | 9,719.38 | 100.0%
1,024 | 1,024 | 4x | 303.18 | 400.51 | 0.65 | 247.94 | 353.43 | 9,368.98 | 100.0%
1,024 | 1,024 | 8x | 593.00 | 818.07 | 0.41 | 275.34 | 475.01 | 9,330.51 | 100.0%
1,024 | 1,024 | 16x | 1,122.12 | 1,554.77 | 0.26 | 375.23 | 515.99 | 10,074.12 | 100.0%
1,024 | 1,024 | 32x | 1,905.42 | 2,661.05 | 0.17 | 805.42 | 1,027.57 | 10,402.99 | 100.0%
1,024 | 1,024 | 64x | 3,409.72 | 4,889.55 | 0.09 | 1,621.24 | 2,297.79 | 12,297.09 | 100.0%
1,024 | 1,024 | 128x | 5,135.23 | 7,286.96 | 0.07 | 3,687.85 | 5,393.79 | 16,214.06 | 100.0%
1,024 | 1,024 | 256x | 5,308.24 | 7,240.60 | 0.06 | 16,348.43 | 18,508.48 | 32,461.80 | 100.0%
1,024 | 1,024 | 512x | 5,861.32 | 8,069.45 | 0.05 | 29,646.65 | 34,511.73 | 57,006.53 | 100.0%
1,024 | 1,024 | 1024x | 6,333.15 | 8,872.76 | 0.05 | 47,863.91 | 62,245.62 | 98,674.36 | 99.0%
1,024 | 2,048 | 1x | 101.23 | 149.31 | 1.43 | 206.90 | 206.90 | 6,518.81 | 100.0%
1,024 | 2,048 | 2x | 169.64 | 218.08 | 1.06 | 159.00 | 230.22 | 8,792.41 | 100.0%
1,024 | 2,048 | 4x | 285.95 | 360.58 | 0.78 | 208.65 | 289.33 | 10,285.19 | 100.0%
1,024 | 2,048 | 8x | 514.97 | 673.98 | 0.44 | 420.41 | 555.86 | 10,796.08 | 100.0%
1,024 | 2,048 | 16x | 930.33 | 1,265.49 | 0.30 | 498.72 | 710.24 | 10,334.31 | 100.0%
1,024 | 2,048 | 32x | 1,804.57 | 2,326.38 | 0.19 | 939.91 | 1,155.85 | 12,253.66 | 100.0%
1,024 | 2,048 | 64x | 2,448.66 | 3,310.56 | 0.11 | 1,791.90 | 2,502.57 | 13,745.00 | 100.0%
1,024 | 2,048 | 128x | 4,970.19 | 6,895.60 | 0.07 | 3,475.27 | 4,583.06 | 16,398.86 | 100.0%
1,024 | 2,048 | 256x | 4,457.88 | 6,055.76 | 0.06 | 16,869.11 | 20,239.37 | 36,117.94 | 100.0%
2,048 | 128 | 1x | 79.90 | 1,228.46 | 0.24 | 373.86 | 373.86 | 1,601.19 | 100.0%
2,048 | 128 | 2x | 156.00 | 2,388.18 | 0.14 | 408.99 | 413.69 | 1,639.00 | 100.0%
2,048 | 128 | 4x | 304.04 | 4,654.99 | 0.09 | 371.04 | 456.86 | 1,681.74 | 100.0%
2,048 | 128 | 8x | 480.53 | 7,344.91 | 0.06 | 658.49 | 870.45 | 2,126.55 | 100.0%
2,048 | 128 | 16x | 769.06 | 11,746.53 | 0.05 | 943.90 | 1,330.28 | 2,655.45 | 100.0%
2,048 | 128 | 32x | 1,015.87 | 15,518.85 | 0.04 | 1,731.65 | 2,326.94 | 4,017.83 | 100.0%
2,048 | 128 | 64x | 1,376.41 | 21,030.25 | 0.03 | 2,568.85 | 3,640.52 | 5,908.57 | 100.0%
2,048 | 128 | 128x | 1,630.92 | 24,969.58 | 0.02 | 5,377.28 | 7,559.11 | 9,858.42 | 100.0%
2,048 | 128 | 256x | 1,216.00 | 18,671.67 | 0.03 | 16,103.15 | 20,963.91 | 26,223.32 | 100.0%
2,048 | 512 | 1x | 102.20 | 392.81 | 0.69 | 51.49 | 51.49 | 5,008.50 | 100.0%
2,048 | 512 | 2x | 189.84 | 726.55 | 0.45 | 293.43 | 437.20 | 5,376.49 | 100.0%
2,048 | 512 | 4x | 312.50 | 1,349.69 | 0.29 | 386.28 | 749.09 | 5,702.72 | 100.0%
2,048 | 512 | 8x | 615.61 | 2,820.69 | 0.21 | 331.22 | 539.24 | 5,409.42 | 100.0%
2,048 | 512 | 16x | 1,108.34 | 5,035.58 | 0.12 | 615.79 | 791.13 | 6,194.76 | 100.0%
2,048 | 512 | 32x | 1,707.52 | 7,268.21 | 0.08 | 1,311.94 | 2,500.45 | 8,585.35 | 100.0%
2,048 | 512 | 64x | 2,680.65 | 11,095.93 | 0.05 | 2,154.18 | 3,731.08 | 11,229.20 | 100.0%
2,048 | 512 | 128x | 3,635.18 | 14,639.45 | 0.04 | 5,392.66 | 8,115.94 | 16,812.97 | 100.0%
2,048 | 512 | 256x | 3,991.27 | 16,309.52 | 0.03 | 15,529.59 | 19,407.48 | 30,204.02 | 100.0%
2,048 | 512 | 512x | 4,111.06 | 16,611.00 | 0.03 | 29,179.79 | 37,899.37 | 56,834.72 | 100.0%
2,048 | 512 | 1024x | 2,262.89 | 9,066.70 | 0.05 | 56,603.88 | 67,766.60 | 88,148.72 | 43.9%
2,048 | 1,024 | 1x | 96.70 | 338.03 | 0.80 | 376.84 | 376.84 | 5,809.59 | 100.0%
2,048 | 1,024 | 2x | 197.95 | 557.71 | 0.56 | 244.50 | 390.76 | 7,008.72 | 100.0%
2,048 | 1,024 | 4x | 262.10 | 918.88 | 0.37 | 287.55 | 453.58 | 8,200.36 | 100.0%
2,048 | 1,024 | 8x | 513.82 | 1,857.36 | 0.22 | 536.46 | 546.66 | 8,343.90 | 100.0%
2,048 | 1,024 | 16x | 858.67 | 2,885.70 | 0.17 | 1,013.28 | 1,668.83 | 10,590.21 | 100.0%
2,048 | 1,024 | 32x | 1,481.93 | 4,642.87 | 0.12 | 1,476.31 | 2,439.41 | 12,284.39 | 100.0%
2,048 | 1,024 | 64x | 2,572.91 | 7,687.41 | 0.07 | 2,191.99 | 3,230.74 | 15,540.07 | 100.0%
2,048 | 1,024 | 128x | 3,963.10 | 11,236.20 | 0.05 | 4,484.30 | 7,289.92 | 20,019.48 | 100.0%
2,048 | 1,024 | 256x | 4,423.92 | 12,709.32 | 0.04 | 15,293.51 | 20,321.31 | 36,552.62 | 100.0%
2,048 | 1,024 | 512x | 4,788.74 | 13,724.90 | 0.04 | 33,951.30 | 41,283.00 | 68,820.17 | 100.0%
2,048 | 1,024 | 1024x | 5,209.43 | 14,644.25 | 0.03 | 58,123.97 | 74,792.74 | 119,539.87 | 94.4%
2,048 | 2,048 | 1x | 95.13 | 392.50 | 0.72 | 375.56 | 375.56 | 5,002.71 | 100.0%
2,048 | 2,048 | 2x | 181.63 | 518.04 | 0.61 | 409.22 | 412.22 | 7,501.32 | 100.0%
2,048 | 2,048 | 4x | 291.45 | 914.60 | 0.40 | 374.17 | 463.96 | 8,416.07 | 100.0%
2,048 | 2,048 | 8x | 488.08 | 1,776.62 | 0.24 | 526.13 | 859.95 | 8,787.59 | 100.0%
2,048 | 2,048 | 16x | 767.46 | 2,613.72 | 0.19 | 912.31 | 1,616.70 | 11,577.80 | 100.0%
2,048 | 2,048 | 32x | 1,369.48 | 4,178.43 | 0.13 | 1,563.32 | 3,185.93 | 12,690.26 | 100.0%
2,048 | 2,048 | 64x | 2,596.52 | 7,826.34 | 0.07 | 2,515.31 | 3,741.22 | 14,364.60 | 100.0%
2,048 | 2,048 | 128x | 3,435.82 | 9,634.73 | 0.06 | 5,727.21 | 8,028.98 | 21,976.18 | 100.0%
2,048 | 2,048 | 256x | 3,588.10 | 9,930.58 | 0.05 | 18,228.97 | 22,170.63 | 40,607.47 | 100.0%
2,048 | 2,048 | 512x | 4,329.13 | 12,178.62 | 0.04 | 32,496.29 | 40,232.39 | 68,335.42 | 100.0%
2,048 | 2,048 | 1024x | 6,245.09 | 17,179.12 | 0.03 | 16,907.07 | 25,915.49 | 89,642.90 | 100.0%
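
One way to read the concurrency sweeps is to compare each run against ideal linear scaling from the single-stream run. The sketch below does this for the 128-input / 128-output rows, using the Output TPS values reported in the table; the function names and the choice of the 1x run as baseline are illustrative assumptions, not part of the benchmark tooling.

```python
# (concurrency, output TPS) for the 128-input / 128-output rows of the results table.
RUNS_128_128 = [
    (1, 76.05), (2, 145.04), (4, 280.39), (8, 495.88), (16, 883.90),
    (32, 1661.66), (64, 2248.69), (128, 2597.74), (256, 1486.92),
]


def peak_run(runs):
    """Return the (concurrency, output TPS) pair with the highest output TPS."""
    return max(runs, key=lambda run: run[1])


def scaling_efficiency(runs):
    """Output TPS at each concurrency relative to linear scaling from the first run."""
    base_conc, base_tps = runs[0]
    return {conc: tps / (base_tps * conc / base_conc) for conc, tps in runs}


if __name__ == "__main__":
    print("peak:", peak_run(RUNS_128_128))
    for conc, eff in scaling_efficiency(RUNS_128_128).items():
        print(f"{conc:>4}x  {eff:.2f}")
```

For this configuration the peak falls at 128x, and throughput drops at 256x while TTFT rises sharply, which suggests the server is saturated somewhere between those two concurrency levels.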

Hardware Configuration

GPU Manufacturer: NVIDIA
GPU Model: NVIDIA H20
GPU Count: 8
GPU Memory (Total): 760 GB
GPU Driver: 570.195.03
CUDA Version: Unknown
Compute Capability: 9.0
Power Limit (per GPU): 500 W
CPU Model: Intel(R) Xeon(R) Platinum 8469C
RAM: 1,007 GB

Software Configuration

Inference Framework: vLLM
Framework Version: v0.9.0
OS: Ubuntu
OS Version: 24.04.3 LTS (Noble Numbat)
Kernel Version: 6.8.0-79-generic
Python Version: 3.12.3

Model Configuration

Provider: mistralai
Model Name: mistral-nemo-instruct-2407
Quantization: BF16

Inference Configuration

Runtime parameters used across all benchmark runs

Max Model Length: 32768
Tensor Parallel Size: 1
Pipeline Parallel Size: 1
GPU Memory Utilization: 85.00
Temperature: 0.70
Top-P: 1.00
Top-K: -1
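
For readers who want to approximate this setup, the sketch below maps the listed parameters onto `vllm serve` command-line flags and per-request sampling fields. The report does not show how the benchmark actually launched vLLM, so this is only one plausible mapping. Two points are assumptions: the Hugging Face model ID `mistralai/Mistral-Nemo-Instruct-2407` is inferred from the provider and model name, and the reported GPU memory utilization of 85.00 is read as a percentage and passed as the fraction 0.85.

```python
# Values copied from the Model Configuration and Inference Configuration sections.
REPORTED = {
    "max_model_len": 32768,
    "tensor_parallel_size": 1,
    "pipeline_parallel_size": 1,
    "gpu_memory_utilization_pct": 85.00,  # assumption: the report shows a percentage
    "temperature": 0.70,
    "top_p": 1.00,
    "top_k": -1,
}

# Assumption: Hugging Face repository ID for the reported provider and model name.
MODEL_ID = "mistralai/Mistral-Nemo-Instruct-2407"


def build_serve_args(cfg: dict, model_id: str = MODEL_ID) -> list[str]:
    """Build a `vllm serve` argument list for the engine-level settings."""
    return [
        "vllm", "serve", model_id,
        "--dtype", "bfloat16",  # the report lists BF16
        "--max-model-len", str(cfg["max_model_len"]),
        "--tensor-parallel-size", str(cfg["tensor_parallel_size"]),
        "--pipeline-parallel-size", str(cfg["pipeline_parallel_size"]),
        # vLLM expects a fraction, so 85.00 is converted to 0.85 here.
        "--gpu-memory-utilization", str(cfg["gpu_memory_utilization_pct"] / 100),
    ]


def sampling_fields(cfg: dict) -> dict:
    """Per-request sampling settings; in vLLM, top_k = -1 disables top-k filtering."""
    return {key: cfg[key] for key in ("temperature", "top_p", "top_k")}


if __name__ == "__main__":
    print(" ".join(build_serve_args(REPORTED)))
    print(sampling_fields(REPORTED))
```

The sampling fields are kept separate because temperature, top-p, and top-k are set per request rather than when the server starts.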