NVIDIA H200 NVL (2x) - allam-7b-instruct-preview

November 6, 2025 at 07:07 AM

Dataset: reference (v1.0)

Best Performance

Best Output TPS: 11,481.64 (peak generation speed)
Best Input TPS: 45,184.12 (peak prefill speed)
Best Energy Efficiency: 0.01 kWh/MT (energy cost per 1M tokens)
Best TTFT (P95): 45.78 ms (lowest latency)
Best E2E (P95): 821.66 ms (lowest latency)
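
The report does not say how the kWh/MT figure is derived. As a rough sketch of the arithmetic, assuming it is average power draw divided by token throughput, the conversion could look like the following. The function name and the 1,200 W power value are hypothetical (the latter is simply two GPUs at the listed 600 W limit), and whether "MT" counts output tokens only or input plus output tokens is not stated in the report.

```python
def energy_kwh_per_million_tokens(avg_power_watts: float, tokens_per_second: float) -> float:
    """Energy in kWh needed to process one million tokens at a steady rate.

    Assumption: energy = average power x time, with time taken from throughput.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    seconds_per_million = 1_000_000 / tokens_per_second
    joules = avg_power_watts * seconds_per_million
    return joules / 3_600_000  # 1 kWh = 3.6e6 J


# Hypothetical input: 1,200 W average draw at the best reported output TPS.
example = energy_kwh_per_million_tokens(1200.0, 11_481.64)
print(f"{example:.3f} kWh/MT")  # about 0.029 under these assumptions
```

The result depends entirely on the assumed power draw, so it should be read as an order-of-magnitude check rather than a reproduction of the reported value.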

Test Matrix Results

Performance across different input/output token combinations and concurrency levels

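Best run for Output TPS (listed first): 128 input tokens, 512 output tokens, 1024x concurrency.

| Input Tokens | Output Tokens | Concurrency | Output TPS | Input TPS | Energy Cost (kWh/MT) | TTFT Mean (ms) | TTFT P95 (ms) | E2E P95 (ms) | Success Rate |
|---|---|---|---|---|---|---|---|---|---|
| 128 | 512 | 1024x | 11,481.64 | 3,305.38 | 0.02 | 1,004.27 | 2,648.07 | 11,446.45 | 100.0% |
| 128 | 128 | 1x | 79.60 | 75.87 | 0.40 | 846.69 | 846.69 | 1,607.79 | 100.0% |
| 128 | 128 | 2x | 163.06 | 159.24 | 0.22 | 774.05 | 775.74 | 1,565.28 | 100.0% |
| 128 | 128 | 4x | 582.48 | 577.93 | 0.06 | 86.34 | 112.44 | 871.50 | 100.0% |
| 128 | 128 | 8x | 1,015.87 | 1,018.85 | 0.06 | 132.10 | 173.28 | 990.56 | 100.0% |
| 128 | 128 | 16x | 1,935.73 | 1,949.91 | 0.03 | 151.22 | 203.53 | 1,029.41 | 100.0% |
| 128 | 128 | 32x | 3,429.65 | 3,435.51 | 0.02 | 163.17 | 205.18 | 1,141.07 | 100.0% |
| 128 | 128 | 64x | 4,186.57 | 4,487.62 | 0.02 | 156.54 | 237.84 | 1,246.44 | 100.0% |
| 128 | 128 | 128x | 5,083.06 | 6,046.14 | 0.02 | 240.47 | 451.22 | 1,464.58 | 100.0% |
| 128 | 128 | 256x | 6,132.06 | 7,106.95 | 0.01 | 253.04 | 464.98 | 1,598.83 | 100.0% |
| 128 | 128 | 512x | 5,560.85 | 7,362.31 | 0.02 | 441.00 | 1,591.19 | 3,286.10 | 100.0% |
| 128 | 512 | 1x | 170.78 | 40.69 | 0.55 | 46.75 | 46.75 | 2,997.98 | 100.0% |
| 128 | 512 | 2x | 324.97 | 79.42 | 0.36 | 53.48 | 59.09 | 3,142.63 | 100.0% |
| 128 | 512 | 4x | 626.87 | 155.49 | 0.21 | 73.73 | 85.14 | 3,256.42 | 100.0% |
| 128 | 512 | 8x | 838.20 | 225.47 | 0.13 | 90.45 | 102.63 | 4,548.67 | 100.0% |
| 128 | 512 | 16x | 1,481.67 | 400.19 | 0.07 | 128.75 | 160.56 | 5,106.86 | 100.0% |
| 128 | 512 | 32x | 3,926.31 | 1,000.98 | 0.03 | 150.17 | 178.76 | 4,032.41 | 100.0% |
| 128 | 512 | 64x | 6,246.53 | 1,642.30 | 0.02 | 142.02 | 196.06 | 4,519.35 | 100.0% |
| 128 | 512 | 128x | 7,847.36 | 2,062.85 | 0.02 | 202.18 | 411.53 | 5,401.76 | 100.0% |
| 128 | 512 | 256x | 9,466.91 | 2,555.82 | 0.02 | 288.83 | 620.58 | 6,642.71 | 100.0% |
| 128 | 512 | 512x | 10,702.51 | 2,900.87 | 0.02 | 955.30 | 2,788.34 | 12,366.08 | 100.0% |
| 128 | 1,024 | 1x | 157.04 | 19.63 | 0.62 | 56.24 | 56.24 | 6,205.59 | 100.0% |
| 128 | 1,024 | 2x | 254.28 | 39.61 | 0.48 | 51.83 | 55.96 | 6,172.50 | 100.0% |
| 128 | 1,024 | 4x | 471.42 | 74.65 | 0.21 | 93.87 | 98.56 | 6,659.77 | 100.0% |
| 128 | 1,024 | 8x | 711.82 | 151.54 | 0.18 | 114.24 | 120.20 | 6,652.21 | 100.0% |
| 128 | 1,024 | 16x | 1,596.77 | 284.79 | 0.09 | 161.91 | 193.95 | 7,184.86 | 100.0% |
| 128 | 1,024 | 32x | 3,066.49 | 520.49 | 0.05 | 165.15 | 191.62 | 7,813.03 | 100.0% |
| 128 | 1,024 | 64x | 5,070.11 | 862.22 | 0.04 | 152.46 | 208.36 | 9,324.32 | 100.0% |
| 128 | 1,024 | 128x | 6,990.34 | 1,172.11 | 0.03 | 189.01 | 414.60 | 11,146.71 | 100.0% |
| 128 | 1,024 | 256x | 8,977.95 | 1,530.61 | 0.02 | 304.19 | 722.33 | 14,216.69 | 100.0% |
| 128 | 1,024 | 512x | 10,492.76 | 1,800.59 | 0.02 | 627.96 | 1,444.98 | 18,565.60 | 100.0% |
| 128 | 1,024 | 1024x | 11,322.90 | 1,922.74 | 0.02 | 1,058.05 | 2,398.37 | 27,024.78 | 100.0% |
| 128 | 2,048 | 1x | 172.00 | 23.82 | 0.59 | 57.38 | 57.38 | 5,115.29 | 100.0% |
| 128 | 2,048 | 2x | 323.62 | 59.23 | 0.28 | 63.70 | 65.08 | 4,198.09 | 100.0% |
| 128 | 2,048 | 4x | 508.78 | 79.62 | 0.28 | 62.14 | 65.95 | 6,294.49 | 100.0% |
| 128 | 2,048 | 8x | 719.65 | 145.49 | 0.17 | 110.68 | 138.70 | 6,272.11 | 100.0% |
| 128 | 2,048 | 16x | 1,309.22 | 250.85 | 0.10 | 142.71 | 188.30 | 7,097.18 | 100.0% |
| 128 | 2,048 | 32x | 2,289.52 | 369.28 | 0.07 | 149.28 | 173.28 | 9,360.21 | 100.0% |
| 128 | 2,048 | 64x | 4,110.64 | 614.98 | 0.04 | 153.94 | 195.62 | 12,638.20 | 100.0% |
| 128 | 2,048 | 128x | 4,651.90 | 740.46 | 0.04 | 161.18 | 272.48 | 14,255.68 | 100.0% |
| 128 | 2,048 | 256x | 7,274.34 | 1,153.57 | 0.03 | 561.30 | 2,724.37 | 20,094.99 | 100.0% |
| 128 | 2,048 | 512x | 9,036.25 | 1,429.96 | 0.02 | 601.07 | 1,791.17 | 24,934.97 | 100.0% |
| 128 | 2,048 | 1024x | 10,987.18 | 1,722.33 | 0.02 | 817.88 | 2,594.79 | 32,590.06 | 100.0% |
| 512 | 128 | 1x | 155.72 | 603.41 | 0.08 | 52.41 | 52.41 | 821.66 | 100.0% |
| 512 | 128 | 2x | 298.72 | 1,161.03 | 0.05 | 69.26 | 89.05 | 851.64 | 100.0% |
| 512 | 128 | 4x | 568.89 | 2,197.78 | 0.05 | 77.52 | 87.29 | 892.11 | 100.0% |
| 512 | 128 | 8x | 1,006.88 | 3,883.97 | 0.03 | 110.38 | 138.80 | 1,000.47 | 100.0% |
| 512 | 128 | 16x | 1,727.67 | 6,949.21 | 0.02 | 154.12 | 181.20 | 1,121.85 | 100.0% |
| 512 | 128 | 32x | 2,960.29 | 11,736.03 | 0.01 | 224.54 | 287.16 | 1,318.03 | 100.0% |
| 512 | 128 | 64x | 4,544.28 | 17,864.35 | 0.01 | 213.80 | 366.25 | 1,678.86 | 100.0% |
| 512 | 128 | 128x | 5,363.54 | 21,607.60 | 0.01 | 184.10 | 292.07 | 2,112.03 | 100.0% |
| 512 | 128 | 256x | 6,250.30 | 25,110.06 | 0.01 | 223.64 | 364.76 | 2,590.08 | 100.0% |
| 512 | 128 | 512x | 6,620.15 | 27,143.50 | 0.01 | 355.70 | 907.23 | 4,345.16 | 100.0% |
| 512 | 128 | 1024x | 7,549.57 | 30,444.45 | 0.01 | 380.31 | 728.15 | 4,002.21 | 100.0% |
| 512 | 512 | 1x | 168.53 | 163.27 | 0.32 | 46.97 | 46.97 | 3,036.84 | 100.0% |
| 512 | 512 | 2x | 330.00 | 320.66 | 0.24 | 45.49 | 45.78 | 3,097.91 | 100.0% |
| 512 | 512 | 4x | 547.47 | 601.95 | 0.14 | 85.55 | 105.74 | 3,271.22 | 100.0% |
| 512 | 512 | 8x | 1,032.63 | 1,111.11 | 0.08 | 106.11 | 129.13 | 3,544.04 | 100.0% |
| 512 | 512 | 16x | 1,569.47 | 1,937.97 | 0.05 | 171.14 | 219.95 | 4,070.94 | 100.0% |
| 512 | 512 | 32x | 3,103.38 | 3,306.61 | 0.03 | 279.22 | 382.96 | 4,783.49 | 100.0% |
| 512 | 512 | 64x | 5,451.40 | 5,736.14 | 0.02 | 216.57 | 257.44 | 5,435.62 | 100.0% |
| 512 | 512 | 128x | 7,884.26 | 8,034.33 | 0.02 | 212.82 | 315.42 | 7,677.98 | 100.0% |
| 512 | 512 | 256x | 7,171.22 | 7,376.75 | 0.02 | 603.01 | 2,233.35 | 11,673.92 | 100.0% |
| 512 | 1,024 | 1x | 164.05 | 235.85 | 0.28 | 49.87 | 49.87 | 2,096.81 | 100.0% |
| 512 | 1,024 | 2x | 200.94 | 161.37 | 0.36 | 50.22 | 53.76 | 5,925.69 | 100.0% |
| 512 | 1,024 | 4x | 438.73 | 313.17 | 0.16 | 97.99 | 102.34 | 6,043.87 | 100.0% |
| 512 | 1,024 | 8x | 932.04 | 766.99 | 0.10 | 102.81 | 125.03 | 4,819.54 | 100.0% |
| 512 | 1,024 | 16x | 1,141.95 | 933.32 | 0.08 | 125.09 | 146.26 | 7,673.13 | 100.0% |
| 512 | 1,024 | 32x | 2,441.57 | 1,911.04 | 0.04 | 153.50 | 179.55 | 7,647.95 | 100.0% |
| 512 | 1,024 | 64x | 3,992.89 | 3,105.33 | 0.03 | 145.64 | 222.94 | 9,657.40 | 100.0% |
| 512 | 1,024 | 128x | 6,277.75 | 4,495.48 | 0.02 | 171.41 | 323.49 | 12,502.55 | 100.0% |
| 512 | 1,024 | 256x | 7,622.53 | 5,710.67 | 0.02 | 261.27 | 624.85 | 17,369.64 | 100.0% |
| 512 | 1,024 | 512x | 8,428.06 | 6,048.08 | 0.02 | 498.52 | 1,507.71 | 25,677.13 | 100.0% |
| 512 | 1,024 | 1024x | 9,095.47 | 6,563.47 | 0.02 | 2,342.22 | 11,292.16 | 45,640.48 | 100.0% |
| 512 | 2,048 | 1x | 164.69 | 210.53 | 0.30 | 62.09 | 62.09 | 2,348.53 | 100.0% |
| 512 | 2,048 | 2x | 264.59 | 238.90 | 0.21 | 77.54 | 82.40 | 4,100.13 | 100.0% |
| 512 | 2,048 | 4x | 508.93 | 371.87 | 0.17 | 118.17 | 150.28 | 5,253.40 | 100.0% |
| 512 | 2,048 | 8x | 811.87 | 565.09 | 0.11 | 149.87 | 166.25 | 6,699.02 | 100.0% |
| 512 | 2,048 | 16x | 1,334.48 | 1,135.82 | 0.07 | 181.74 | 232.67 | 6,752.33 | 100.0% |
| 512 | 2,048 | 32x | 1,509.43 | 1,041.23 | 0.06 | 224.30 | 293.21 | 10,139.93 | 100.0% |
| 512 | 2,048 | 64x | 2,993.93 | 2,172.61 | 0.04 | 243.75 | 334.38 | 11,410.53 | 100.0% |
| 512 | 2,048 | 128x | 4,225.96 | 2,895.82 | 0.03 | 186.96 | 281.88 | 15,446.53 | 100.0% |
| 512 | 2,048 | 256x | 6,084.62 | 4,188.59 | 0.02 | 259.27 | 574.54 | 19,721.13 | 100.0% |
| 512 | 2,048 | 512x | 7,602.99 | 5,094.06 | 0.02 | 440.74 | 1,049.70 | 30,827.68 | 100.0% |
| 512 | 2,048 | 1024x | 8,551.50 | 5,744.01 | 0.02 | 3,175.27 | 14,443.09 | 53,117.24 | 100.0% |
| 1,024 | 128 | 1x | 145.12 | 1,105.44 | 0.05 | 91.99 | 91.99 | 881.63 | 100.0% |
| 1,024 | 128 | 2x | 286.35 | 2,185.68 | 0.03 | 65.58 | 80.90 | 888.68 | 100.0% |
| 1,024 | 128 | 4x | 527.84 | 4,029.90 | 0.03 | 109.51 | 136.10 | 968.02 | 100.0% |
| 1,024 | 128 | 8x | 948.15 | 7,252.78 | 0.02 | 123.77 | 142.19 | 1,068.95 | 100.0% |
| 1,024 | 128 | 16x | 1,586.37 | 12,169.64 | 0.01 | 218.64 | 256.46 | 1,275.52 | 100.0% |
| 1,024 | 128 | 32x | 2,419.87 | 18,584.86 | 0.01 | 274.05 | 400.64 | 1,636.36 | 100.0% |
| 1,024 | 128 | 64x | 3,429.05 | 26,408.06 | 0.01 | 349.74 | 549.53 | 2,285.79 | 100.0% |
| 1,024 | 128 | 128x | 4,592.20 | 35,806.49 | 0.01 | 390.02 | 691.84 | 3,308.66 | 100.0% |
| 1,024 | 128 | 256x | 4,934.48 | 39,160.75 | 0.01 | 341.97 | 800.28 | 4,482.06 | 100.0% |
| 1,024 | 128 | 512x | 5,030.18 | 39,368.60 | 0.01 | 672.07 | 1,936.57 | 9,145.26 | 100.0% |
| 1,024 | 128 | 1024x | 4,802.19 | 37,235.42 | 0.01 | 7,364.70 | 15,463.55 | 20,710.74 | 100.0% |
| 1,024 | 512 | 1x | 157.30 | 346.98 | 0.20 | 79.70 | 79.70 | 2,803.20 | 100.0% |
| 1,024 | 512 | 2x | 308.62 | 591.23 | 0.17 | 73.77 | 80.03 | 3,290.69 | 100.0% |
| 1,024 | 512 | 4x | 590.03 | 1,126.19 | 0.10 | 86.89 | 105.90 | 3,457.94 | 100.0% |
| 1,024 | 512 | 8x | 1,021.14 | 1,971.56 | 0.05 | 150.01 | 186.47 | 3,961.01 | 100.0% |
| 1,024 | 512 | 16x | 1,831.02 | 3,511.62 | 0.03 | 201.14 | 274.45 | 4,456.23 | 100.0% |
| 1,024 | 512 | 32x | 2,962.47 | 5,867.63 | 0.02 | 292.71 | 384.63 | 5,308.54 | 100.0% |
| 1,024 | 512 | 64x | 4,339.53 | 8,796.53 | 0.02 | 344.01 | 521.88 | 7,056.72 | 100.0% |
| 1,024 | 512 | 128x | 5,659.83 | 11,580.67 | 0.02 | 375.92 | 771.46 | 10,641.76 | 100.0% |
| 1,024 | 512 | 256x | 6,978.59 | 14,405.35 | 0.01 | 305.21 | 738.13 | 15,509.89 | 100.0% |
| 1,024 | 512 | 512x | 6,878.85 | 13,983.36 | 0.01 | 1,020.90 | 4,475.27 | 22,512.72 | 100.0% |
| 1,024 | 1,024 | 1x | 150.35 | 234.54 | 0.32 | 73.04 | 73.04 | 4,148.88 | 100.0% |
| 1,024 | 1,024 | 2x | 285.77 | 379.86 | 0.17 | 78.04 | 92.46 | 5,128.08 | 100.0% |
| 1,024 | 1,024 | 4x | 473.62 | 597.80 | 0.16 | 114.54 | 151.06 | 6,507.83 | 100.0% |
| 1,024 | 1,024 | 8x | 899.02 | 1,448.68 | 0.08 | 122.45 | 145.18 | 5,381.90 | 100.0% |
| 1,024 | 1,024 | 16x | 1,308.35 | 1,902.06 | 0.06 | 202.23 | 250.80 | 7,549.92 | 100.0% |
| 1,024 | 1,024 | 32x | 2,351.07 | 3,240.23 | 0.04 | 262.16 | 377.19 | 9,614.02 | 100.0% |
| 1,024 | 1,024 | 64x | 3,673.86 | 5,470.86 | 0.03 | 318.01 | 488.27 | 11,108.98 | 100.0% |
| 1,024 | 1,024 | 128x | 4,989.30 | 7,357.27 | 0.02 | 412.30 | 766.81 | 16,808.61 | 100.0% |
| 1,024 | 1,024 | 256x | 6,212.13 | 9,145.48 | 0.02 | 377.66 | 826.25 | 25,940.52 | 100.0% |
| 1,024 | 1,024 | 512x | 6,484.06 | 9,546.37 | 0.02 | 4,654.48 | 29,567.41 | 47,767.38 | 100.0% |
| 1,024 | 1,024 | 1024x | 6,294.79 | 9,255.38 | 0.02 | 28,559.17 | 68,209.11 | 93,701.66 | 100.0% |
| 1,024 | 2,048 | 1x | 167.69 | 259.93 | 0.27 | 83.33 | 83.33 | 3,743.82 | 100.0% |
| 1,024 | 2,048 | 2x | 294.70 | 386.47 | 0.23 | 86.08 | 94.13 | 5,008.42 | 100.0% |
| 1,024 | 2,048 | 4x | 545.17 | 874.11 | 0.11 | 94.09 | 107.51 | 4,397.39 | 100.0% |
| 1,024 | 2,048 | 8x | 907.13 | 1,332.37 | 0.08 | 141.04 | 161.99 | 5,653.59 | 100.0% |
| 1,024 | 2,048 | 16x | 1,063.12 | 1,437.29 | 0.07 | 212.43 | 287.74 | 9,531.88 | 100.0% |
| 1,024 | 2,048 | 32x | 2,178.97 | 2,906.68 | 0.04 | 275.74 | 434.44 | 9,793.31 | 100.0% |
| 1,024 | 2,048 | 64x | 2,752.69 | 3,598.63 | 0.04 | 366.61 | 565.01 | 12,822.68 | 100.0% |
| 1,024 | 2,048 | 128x | 3,642.88 | 5,340.35 | 0.03 | 363.41 | 790.26 | 17,237.66 | 100.0% |
| 1,024 | 2,048 | 256x | 5,107.52 | 7,182.94 | 0.02 | 526.40 | 1,249.08 | 27,332.66 | 100.0% |
| 1,024 | 2,048 | 512x | 5,634.46 | 7,851.74 | 0.02 | 4,843.06 | 31,115.90 | 53,216.70 | 100.0% |
| 1,024 | 2,048 | 1024x | 6,187.18 | 8,465.40 | 0.02 | 31,226.53 | 75,435.23 | 100,768.48 | 100.0% |
| 2,048 | 128 | 1x | 144.47 | 2,221.22 | 0.03 | 108.97 | 108.97 | 886.12 | 100.0% |
| 2,048 | 128 | 2x | 262.30 | 4,015.37 | 0.02 | 84.44 | 106.71 | 964.85 | 100.0% |
| 2,048 | 128 | 4x | 458.78 | 7,024.19 | 0.02 | 138.24 | 168.41 | 1,109.33 | 100.0% |
| 2,048 | 128 | 8x | 842.11 | 12,871.71 | 0.01 | 180.00 | 244.16 | 1,195.83 | 100.0% |
| 2,048 | 128 | 16x | 1,260.95 | 20,458.47 | 0.01 | 301.50 | 409.77 | 1,495.79 | 100.0% |
| 2,048 | 128 | 32x | 1,821.53 | 28,781.97 | 0.01 | 421.28 | 624.37 | 2,137.44 | 100.0% |
| 2,048 | 128 | 64x | 2,439.14 | 37,798.55 | 0.01 | 614.12 | 1,095.26 | 3,226.38 | 100.0% |
| 2,048 | 128 | 128x | 2,917.15 | 45,184.12 | 0.01 | 897.10 | 1,908.00 | 5,278.90 | 100.0% |
| 2,048 | 128 | 256x | 2,841.37 | 44,139.33 | 0.01 | 1,904.89 | 7,665.57 | 9,992.67 | 100.0% |
| 2,048 | 512 | 1x | 161.03 | 627.55 | 0.14 | 102.93 | 102.93 | 3,129.51 | 100.0% |
| 2,048 | 512 | 2x | 224.18 | 1,208.45 | 0.08 | 115.43 | 117.38 | 3,152.09 | 100.0% |
| 2,048 | 512 | 4x | 511.12 | 2,126.70 | 0.06 | 132.46 | 167.00 | 3,633.86 | 100.0% |
| 2,048 | 512 | 8x | 785.78 | 3,746.29 | 0.04 | 196.55 | 289.56 | 4,162.25 | 100.0% |
| 2,048 | 512 | 16x | 1,366.73 | 6,856.86 | 0.02 | 317.07 | 508.30 | 4,546.62 | 100.0% |
| 2,048 | 512 | 32x | 2,289.72 | 10,133.12 | 0.02 | 421.14 | 651.08 | 6,104.96 | 100.0% |
| 2,048 | 512 | 64x | 3,253.40 | 13,628.55 | 0.02 | 475.48 | 906.22 | 9,064.97 | 100.0% |
| 2,048 | 512 | 128x | 4,004.68 | 16,749.41 | 0.02 | 903.97 | 1,899.87 | 14,727.57 | 100.0% |
| 2,048 | 512 | 256x | 3,762.59 | 15,831.12 | 0.01 | 3,521.68 | 22,285.29 | 30,216.11 | 100.0% |
| 2,048 | 1,024 | 1x | 154.99 | 770.25 | 0.13 | 107.55 | 107.55 | 2,548.35 | 100.0% |
| 2,048 | 1,024 | 2x | 291.46 | 851.77 | 0.14 | 76.43 | 96.11 | 4,559.88 | 100.0% |
| 2,048 | 1,024 | 4x | 451.45 | 1,349.69 | 0.09 | 122.16 | 194.46 | 5,770.78 | 100.0% |
| 2,048 | 1,024 | 8x | 698.23 | 2,765.37 | 0.05 | 178.65 | 244.08 | 5,378.37 | 100.0% |
| 2,048 | 1,024 | 16x | 976.40 | 3,432.95 | 0.04 | 261.07 | 369.03 | 8,101.61 | 100.0% |
| 2,048 | 1,024 | 32x | 1,764.45 | 5,786.74 | 0.03 | 435.28 | 757.73 | 10,115.20 | 100.0% |
| 2,048 | 1,024 | 64x | 2,386.72 | 7,919.95 | 0.02 | 650.42 | 1,389.14 | 15,595.77 | 100.0% |
| 2,048 | 1,024 | 128x | 3,604.32 | 10,737.82 | 0.02 | 851.36 | 1,876.05 | 22,798.14 | 100.0% |
| 2,048 | 1,024 | 256x | 3,988.14 | 12,243.00 | 0.02 | 3,713.05 | 22,964.52 | 37,260.50 | 100.0% |
| 2,048 | 1,024 | 512x | 3,925.93 | 12,408.70 | 0.02 | 22,465.78 | 54,663.07 | 73,676.78 | 100.0% |
| 2,048 | 2,048 | 1x | 146.54 | 396.14 | 0.20 | 109.57 | 109.57 | 4,959.86 | 100.0% |
| 2,048 | 2,048 | 2x | 229.77 | 961.01 | 0.12 | 98.33 | 102.14 | 3,972.07 | 100.0% |
| 2,048 | 2,048 | 4x | 324.45 | 1,016.96 | 0.11 | 89.44 | 103.04 | 5,548.85 | 75.0% |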
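
Each figure in the Best Performance summary corresponds to one row of the test matrix. A minimal sketch of that selection follows, using five rows copied from the results above. The `Run` class, the `best_runs` helper, and the rule of excluding runs below a 100% success rate are illustrative assumptions, not the dashboard's documented logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    input_tokens: int
    output_tokens: int
    concurrency: int
    output_tps: float
    input_tps: float
    ttft_p95_ms: float
    e2e_p95_ms: float
    success_rate: float


# A small sample of rows taken from the test matrix.
RUNS = [
    Run(128, 512, 1024, 11481.64, 3305.38, 2648.07, 11446.45, 100.0),
    Run(512, 128, 1, 155.72, 603.41, 52.41, 821.66, 100.0),
    Run(512, 512, 2, 330.00, 320.66, 45.78, 3097.91, 100.0),
    Run(2048, 128, 128, 2917.15, 45184.12, 1908.00, 5278.90, 100.0),
    Run(2048, 2048, 4, 324.45, 1016.96, 103.04, 5548.85, 75.0),
]


def best_runs(runs, min_success=100.0):
    """Pick the best run per metric among runs meeting the success threshold."""
    ok = [r for r in runs if r.success_rate >= min_success]
    return {
        "output_tps": max(ok, key=lambda r: r.output_tps),    # higher is better
        "input_tps": max(ok, key=lambda r: r.input_tps),      # higher is better
        "ttft_p95_ms": min(ok, key=lambda r: r.ttft_p95_ms),  # lower is better
        "e2e_p95_ms": min(ok, key=lambda r: r.e2e_p95_ms),    # lower is better
    }


best = best_runs(RUNS)
for metric, run in best.items():
    print(metric, run.input_tokens, run.output_tokens, f"{run.concurrency}x")
```

On this sample the selected rows match the summary values: 11,481.64 output TPS, 45,184.12 input TPS, 45.78 ms TTFT (P95), and 821.66 ms E2E (P95).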

Hardware Configuration

GPU Manufacturer: NVIDIA
GPU Model: NVIDIA H200 NVL
GPU Count: 2
GPU Memory (Total): 280 GB
GPU Driver: 580.95.05
CUDA Version: Unknown
Compute Capability: 9.0
Power Limit (per GPU): 600 W
CPU Model: Intel(R) Xeon(R) 6960P
RAM: 2,267 GB

Software Configuration

Inference Framework: vLLM
Framework Version: 0.11.0
OS: Ubuntu
OS Version: 22.04.5 LTS (Jammy Jellyfish)
Kernel Version: 5.15.0-88-generic
Python Version: 3.10.12

Model Configuration

Provider: humain-ai
Model Name: allam-7b-instruct-preview
Quantization: FP16

Inference Configuration

Runtime parameters used across all benchmark runs

Max Model Length: 4096
Tensor Parallel Size: 1
Pipeline Parallel Size: 1
GPU Memory Utilization: 0.90
Temperature: 0.70
Top-P: 1.00
Top-K: -1
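
For readers who want to approximate this setup, the sketch below maps the listed parameters onto a `vllm serve` command line and a request body for vLLM's OpenAI-compatible completions endpoint. It only builds and prints the values and does not start a server. Several pieces are assumptions: the model identifier is formed by joining the provider and model name from this report and should be verified, `--dtype float16` is a guess at what "Quantization: FP16" corresponds to, and the helper names are hypothetical. The report does not show the actual launch command or benchmark client.

```python
import json
import shlex

# Assumed identifier (provider + model name from the report); verify before use.
MODEL_ID = "humain-ai/allam-7b-instruct-preview"


def build_serve_command(model_id: str) -> list[str]:
    """Server-side settings from the Inference Configuration section."""
    return [
        "vllm", "serve", model_id,
        "--max-model-len", "4096",
        "--tensor-parallel-size", "1",
        "--pipeline-parallel-size", "1",
        "--gpu-memory-utilization", "0.90",
        "--dtype", "float16",  # assumption: FP16 weights, no quantization flag
    ]


def build_request_body(model_id: str, prompt: str, max_tokens: int) -> dict:
    """Per-request sampling settings; top_k is a vLLM-specific extra field."""
    return {
        "model": model_id,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.70,
        "top_p": 1.00,
        "top_k": -1,  # -1 disables top-k filtering in vLLM
    }


print(shlex.join(build_serve_command(MODEL_ID)))
print(json.dumps(build_request_body(MODEL_ID, "Hello", 128), indent=2))
```

Keeping the server flags and the sampling fields in separate helpers mirrors how the two groups are applied: the first at engine start-up, the second on every request.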