NVIDIA A100-PCIE-40GB (1x) - Mistral-Nemo-Instruct

November 20, 2025 at 08:26 PM

Dataset: reference (v1.0)

Best Performance

Best Output TPS: 3,541.62 (peak generation speed)
Best Input TPS: 6,567.89 (peak prefill speed)
Best Energy Efficiency: 0.01 kWh/MT (energy cost per 1M tokens)
Best TTFT (P95): 42.95 ms (lowest latency)
Best E2E (P95): 2,738.58 ms (lowest latency)
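
The energy figure above is expressed in kWh per million tokens (kWh/MT). The report does not state how it is derived, so the following is only a sketch of one plausible calculation: average power draw divided by token throughput, converted from joules to kWh. The function name, the use of average power, and the choice of which tokens are counted are all assumptions, not the benchmark's documented method.

```python
def energy_kwh_per_million_tokens(avg_power_watts: float, tokens_per_second: float) -> float:
    """Energy in kWh needed to process one million tokens at a steady rate.

    Assumption: power draw and throughput are both constant over the run.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    seconds_per_million = 1_000_000 / tokens_per_second  # time to process 1M tokens
    joules = avg_power_watts * seconds_per_million       # W * s = J
    return joules / 3_600_000                            # 1 kWh = 3.6e6 J


if __name__ == "__main__":
    # Hypothetical inputs: 250 W (the GPU power limit, used here as an upper bound)
    # and 1,000 tokens/s. These are illustrative values, not measured ones.
    print(f"{energy_kwh_per_million_tokens(250, 1_000):.4f} kWh/MT")
```

Under this model, energy cost falls in proportion to throughput as long as power stays near its cap, which would be consistent with the lower kWh/MT values reported at higher concurrency.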

Test Matrix Results

Performance across different input/output token combinations and concurrency levels

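Best run for Output TPS: 128 input / 128 output tokens at 256x concurrency (first row).

| Input Tokens | Output Tokens | Concurrency | Output TPS | Input TPS | Energy Cost (kWh/MT) | TTFT Mean (ms) | TTFT P95 (ms) | E2E P95 (ms) | Success Rate |
|---|---|---|---|---|---|---|---|---|---|
| 128 | 128 | 256x | 3,541.62 | 3,548.86 | 0.01 | 605.77 | 1,810.47 | 8,568.62 | 100.0% |
| 128 | 128 | 1x | 46.73 | 44.54 | 0.56 | 75.00 | 75.00 | 2,738.58 | 100.0% |
| 128 | 128 | 2x | 92.19 | 90.03 | 0.28 | 51.29 | 51.94 | 2,775.82 | 100.0% |
| 128 | 128 | 4x | 178.34 | 176.94 | 0.14 | 81.76 | 94.46 | 2,865.33 | 100.0% |
| 128 | 128 | 8x | 350.68 | 351.71 | 0.07 | 101.09 | 116.74 | 2,912.95 | 100.0% |
| 128 | 128 | 16x | 663.86 | 668.72 | 0.04 | 135.49 | 183.77 | 3,063.16 | 100.0% |
| 128 | 128 | 32x | 1,176.00 | 1,177.72 | 0.02 | 160.06 | 301.48 | 3,428.62 | 100.0% |
| 128 | 128 | 64x | 2,005.39 | 1,997.55 | 0.02 | 224.94 | 490.62 | 3,974.29 | 100.0% |
| 128 | 128 | 128x | 2,863.37 | 2,865.47 | 0.01 | 346.19 | 951.61 | 5,445.18 | 100.0% |
| 128 | 128 | 512x | 3,450.70 | 3,457.02 | 0.01 | 4,327.00 | 9,960.65 | 17,161.11 | 100.0% |
| 128 | 512 | 1x | 47.13 | 11.23 | 1.17 | 52.27 | 52.27 | 10,862.78 | 100.0% |
| 128 | 512 | 2x | 89.69 | 23.68 | 0.57 | 50.13 | 50.96 | 10,502.51 | 100.0% |
| 128 | 512 | 4x | 171.93 | 45.16 | 0.30 | 80.79 | 94.73 | 11,241.86 | 100.0% |
| 128 | 512 | 8x | 308.39 | 89.37 | 0.16 | 106.22 | 118.98 | 11,484.52 | 100.0% |
| 128 | 512 | 16x | 644.96 | 170.30 | 0.08 | 114.85 | 174.17 | 12,090.50 | 100.0% |
| 128 | 512 | 32x | 1,177.29 | 309.07 | 0.04 | 150.19 | 280.09 | 13,218.55 | 100.0% |
| 128 | 512 | 64x | 2,036.17 | 535.32 | 0.03 | 219.02 | 477.08 | 15,117.42 | 100.0% |
| 128 | 512 | 128x | 3,038.08 | 791.45 | 0.02 | 351.55 | 960.75 | 20,413.65 | 100.0% |
| 128 | 512 | 256x | 2,733.09 | 707.86 | 0.02 | 622.71 | 1,935.19 | 45,043.69 | 100.0% |
| 128 | 1,024 | 1x | 46.89 | 10.59 | 1.20 | 42.95 | 42.95 | 11,493.46 | 100.0% |
| 128 | 1,024 | 2x | 77.72 | 19.87 | 0.66 | 59.60 | 70.53 | 12,359.66 | 100.0% |
| 128 | 1,024 | 4x | 155.28 | 28.77 | 0.36 | 69.35 | 86.53 | 17,486.28 | 100.0% |
| 128 | 1,024 | 8x | 244.99 | 57.36 | 0.22 | 95.15 | 114.93 | 17,492.53 | 100.0% |
| 128 | 1,024 | 16x | 466.01 | 87.06 | 0.12 | 125.71 | 161.25 | 22,701.34 | 100.0% |
| 128 | 1,024 | 32x | 908.55 | 159.97 | 0.06 | 167.27 | 292.72 | 25,569.64 | 100.0% |
| 128 | 1,024 | 64x | 1,583.62 | 281.57 | 0.04 | 220.18 | 488.66 | 28,831.92 | 100.0% |
| 128 | 1,024 | 128x | 2,459.47 | 439.48 | 0.02 | 360.01 | 987.73 | 36,949.98 | 100.0% |
| 128 | 1,024 | 256x | 2,231.10 | 397.61 | 0.03 | 1,065.07 | 2,420.96 | 76,877.80 | 100.0% |
| 128 | 2,048 | 1x | 46.88 | 14.05 | 1.16 | 46.64 | 46.64 | 8,658.62 | 100.0% |
| 128 | 2,048 | 2x | 87.87 | 21.45 | 0.59 | 53.92 | 54.65 | 11,576.83 | 100.0% |
| 128 | 2,048 | 4x | 147.90 | 25.75 | 0.38 | 86.32 | 98.62 | 19,204.90 | 100.0% |
| 128 | 2,048 | 8x | 235.80 | 51.76 | 0.23 | 117.40 | 133.73 | 19,379.29 | 100.0% |
| 128 | 2,048 | 16x | 434.47 | 86.49 | 0.13 | 160.48 | 185.38 | 22,383.38 | 100.0% |
| 128 | 2,048 | 32x | 781.03 | 141.63 | 0.07 | 599.81 | 643.18 | 25,260.16 | 100.0% |
| 128 | 2,048 | 64x | 1,243.74 | 212.63 | 0.05 | 226.30 | 490.98 | 30,838.98 | 100.0% |
| 128 | 2,048 | 128x | 1,991.15 | 333.15 | 0.03 | 355.55 | 953.04 | 41,929.82 | 100.0% |
| 128 | 2,048 | 256x | 2,037.39 | 360.65 | 0.03 | 1,145.55 | 2,399.97 | 76,531.50 | 100.0% |
| 128 | 2,048 | 512x | 1,976.42 | 335.77 | 0.03 | 43,305.37 | 117,534.39 | 164,155.22 | 97.3% |
| 512 | 128 | 1x | 46.01 | 178.29 | 0.24 | 80.71 | 80.71 | 2,782.02 | 100.0% |
| 512 | 128 | 2x | 90.30 | 350.97 | 0.12 | 82.67 | 83.43 | 2,834.17 | 100.0% |
| 512 | 128 | 4x | 172.68 | 667.12 | 0.06 | 123.39 | 160.48 | 2,960.13 | 100.0% |
| 512 | 128 | 8x | 320.50 | 1,236.31 | 0.04 | 265.89 | 303.32 | 3,191.14 | 100.0% |
| 512 | 128 | 16x | 570.95 | 2,212.43 | 0.02 | 333.34 | 520.46 | 3,563.45 | 100.0% |
| 512 | 128 | 32x | 916.54 | 3,571.49 | 0.01 | 380.25 | 1,038.12 | 4,388.89 | 100.0% |
| 512 | 128 | 64x | 1,305.71 | 5,079.69 | 0.01 | 791.01 | 2,185.66 | 6,082.95 | 100.0% |
| 512 | 128 | 128x | 1,686.44 | 6,567.89 | 0.01 | 1,214.28 | 4,012.67 | 9,275.80 | 100.0% |
| 512 | 128 | 256x | 1,498.10 | 5,835.27 | 0.01 | 5,319.56 | 15,460.51 | 20,961.25 | 100.0% |
| 512 | 512 | 1x | 46.65 | 45.19 | 0.73 | 80.99 | 80.99 | 10,975.00 | 100.0% |
| 512 | 512 | 2x | 91.26 | 88.67 | 0.35 | 81.76 | 82.92 | 11,218.38 | 100.0% |
| 512 | 512 | 4x | 177.76 | 171.69 | 0.19 | 122.18 | 158.50 | 11,515.16 | 100.0% |
| 512 | 512 | 8x | 334.67 | 330.82 | 0.10 | 264.01 | 300.13 | 11,934.49 | 100.0% |
| 512 | 512 | 16x | 585.85 | 616.29 | 0.05 | 319.32 | 525.48 | 12,852.25 | 100.0% |
| 512 | 512 | 32x | 1,039.88 | 1,076.92 | 0.03 | 398.02 | 1,035.25 | 14,734.45 | 100.0% |
| 512 | 512 | 64x | 1,709.61 | 1,734.70 | 0.02 | 648.43 | 1,992.59 | 18,188.10 | 100.0% |
| 512 | 512 | 128x | 1,587.85 | 1,615.22 | 0.02 | 1,226.56 | 4,002.59 | 38,133.13 | 100.0% |
| 512 | 1,024 | 1x | 46.63 | 41.90 | 0.78 | 81.84 | 81.84 | 11,816.75 | 100.0% |
| 512 | 1,024 | 2x | 73.71 | 48.70 | 0.55 | 84.43 | 85.23 | 20,005.93 | 100.0% |
| 512 | 1,024 | 4x | 144.80 | 90.06 | 0.29 | 114.94 | 175.18 | 21,145.87 | 100.0% |
| 512 | 1,024 | 8x | 250.58 | 182.25 | 0.16 | 251.72 | 288.70 | 21,056.84 | 100.0% |
| 512 | 1,024 | 16x | 467.46 | 370.72 | 0.08 | 342.37 | 535.17 | 20,593.62 | 100.0% |
| 512 | 1,024 | 32x | 804.22 | 585.40 | 0.05 | 393.34 | 1,026.92 | 25,274.94 | 100.0% |
| 512 | 1,024 | 64x | 1,356.65 | 993.48 | 0.03 | 642.38 | 2,018.03 | 31,684.68 | 100.0% |
| 512 | 1,024 | 128x | 1,490.85 | 1,017.93 | 0.03 | 1,235.85 | 4,012.13 | 59,921.59 | 100.0% |
| 512 | 1,024 | 256x | 1,508.52 | 1,063.22 | 0.03 | 29,329.40 | 84,725.74 | 114,269.19 | 100.0% |
| 512 | 1,024 | 512x | 625.20 | 455.26 | 0.04 | 23,134.90 | 83,717.80 | 115,130.89 | 43.2% |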
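
The "Best Performance" figures can be recovered from the test matrix by taking the maximum or minimum of one column. The sketch below illustrates this on five rows copied from the results above. The `Run` class, the `best_run` helper, and the rule of only considering runs that meet a minimum success rate are assumptions made for illustration; the report does not say how the dashboard picks its best runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    input_tokens: int
    output_tokens: int
    concurrency: int
    output_tps: float
    input_tps: float
    ttft_p95_ms: float
    e2e_p95_ms: float
    success_rate: float  # percent


# A small subset of the test matrix, copied from the results above.
RUNS = [
    Run(128, 128, 256, 3541.62, 3548.86, 1810.47, 8568.62, 100.0),
    Run(128, 128, 1, 46.73, 44.54, 75.00, 2738.58, 100.0),
    Run(128, 1024, 1, 46.89, 10.59, 42.95, 11493.46, 100.0),
    Run(512, 128, 128, 1686.44, 6567.89, 4012.67, 9275.80, 100.0),
    Run(512, 1024, 512, 625.20, 455.26, 83717.80, 115130.89, 43.2),
]


def best_run(runs, metric, higher_is_better, min_success_rate=100.0):
    """Return the run with the best value of `metric`.

    Runs below `min_success_rate` are skipped (an assumed policy).
    """
    eligible = [r for r in runs if r.success_rate >= min_success_rate]
    if not eligible:
        raise ValueError("no run meets the success-rate threshold")
    pick = max if higher_is_better else min
    return pick(eligible, key=lambda r: getattr(r, metric))


if __name__ == "__main__":
    for metric, higher in [("output_tps", True), ("input_tps", True),
                           ("ttft_p95_ms", False), ("e2e_p95_ms", False)]:
        r = best_run(RUNS, metric, higher)
        print(metric, getattr(r, metric), f"({r.input_tokens}/{r.output_tokens} @ {r.concurrency}x)")
```

Filtering on success rate keeps a run such as 512/1,024 at 512x (43.2% success) from being reported as a best result on any metric.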

Hardware Configuration

GPU Manufacturer: NVIDIA
GPU Model: NVIDIA A100-PCIE-40GB
GPU Count: 1
GPU Memory (Total): 40 GB
GPU Driver: 570.148.08
CUDA Version: Unknown
Compute Capability: 8.0
Power Limit (per GPU): 250 W
CPU Model: AMD EPYC 7H12 64-Core Processor
RAM: 221 GB

Software Configuration

Inference Framework: vLLM
Framework Version: 0.10.0
OS: Ubuntu
OS Version: 22.04.5 LTS (Jammy Jellyfish)
Kernel Version: 6.8.0-60-generic
Python Version: 3.10.12

Model Configuration

Provider: mistral
Model Name: Mistral-Nemo-Instruct
Quantization: FP16

Inference Configuration

Runtime parameters used across all benchmark runs

Max Model Length: 8192
Tensor Parallel Size: 1
Pipeline Parallel Size: 1
GPU Memory Utilization: 0.90
Temperature: 0.70
Top-P: 1.00
Top-K: -1
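
For readers who want to approximate this setup, the runtime parameters above map onto vLLM's server flags and sampling parameters roughly as sketched below. This is not the benchmark's actual launch command, which the report does not show. The helper name `build_vllm_serve_args` and the `MODEL_ID_OR_PATH` placeholder are hypothetical; substitute whatever identifier or local path points at your copy of the model.

```python
import shlex


def build_vllm_serve_args(model: str) -> list[str]:
    """Build a `vllm serve` command line from the reported engine settings."""
    return [
        "vllm", "serve", model,
        "--max-model-len", "8192",
        "--tensor-parallel-size", "1",
        "--pipeline-parallel-size", "1",
        "--gpu-memory-utilization", "0.90",
    ]


# Sampling settings from the report. In vLLM, top_k = -1 disables top-k filtering.
SAMPLING_PARAMS = {"temperature": 0.70, "top_p": 1.00, "top_k": -1}


if __name__ == "__main__":
    # MODEL_ID_OR_PATH is a placeholder, not a real model identifier.
    print(shlex.join(build_vllm_serve_args("MODEL_ID_OR_PATH")))
    print(SAMPLING_PARAMS)
```

The sampling values would be sent with each request rather than set at server start, which is why they are kept separate from the command-line arguments.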