Peter Zhang
                                     Jun 04, 2025 18:17
                                
NVIDIA outlines the process to replicate MLPerf v5.0 training scores for LLM benchmarks, emphasizing hardware prerequisites and step-by-step execution.
                                
                                    
                                
                            
NVIDIA has detailed the process for reproducing its training scores from the MLPerf v5.0 benchmarks, focusing on Llama 2 70B LoRA fine-tuning and Llama 3.1 405B pretraining. The guide follows NVIDIA’s earlier announcement of up to 2.6x higher performance in MLPerf Training v5.0, as reported by Sukru Burc Eryilmaz on the NVIDIA blog. Both benchmarks belong to the MLPerf Training suite, which measures how quickly a system can train a machine learning model to a target quality.
Prerequisites for Benchmarking
To run these benchmarks, specific hardware and software requirements must be met. Llama 2 70B LoRA fine-tuning needs an NVIDIA DGX B200 or GB200 NVL72 system, while Llama 3.1 405B pretraining requires at least four GB200 NVL72 systems connected via InfiniBand. Substantial disk space is also required: 2.5 TB for Llama 3.1 405B pretraining and 300 GB for LoRA fine-tuning.
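The disk requirements above can be checked before downloading anything. The following is a minimal Python sketch, not part of NVIDIA's instructions: the benchmark labels and function names are hypothetical, and it assumes the quoted sizes are decimal units (1 TB = 10^12 bytes).

```python
import shutil

# Disk requirements quoted in the article, assumed to be decimal units.
# The dictionary keys are hypothetical labels chosen for this sketch.
REQUIRED_BYTES = {
    "llama31_405b_pretraining": 2_500_000_000_000,  # 2.5 TB
    "llama2_70b_lora": 300_000_000_000,             # 300 GB
}


def enough_space(free_bytes, benchmark):
    """Return True if free_bytes covers the stated requirement."""
    return free_bytes >= REQUIRED_BYTES[benchmark]


def check_path(path, benchmark):
    """Check the free space of the filesystem that holds `path`."""
    return enough_space(shutil.disk_usage(path).free, benchmark)
```

Calling `check_path("/path/to/data", "llama2_70b_lora")` on the intended data directory returns `False` when the filesystem is too small, which is cheaper to discover before a multi-hundred-gigabyte download than after it.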
Cluster and Environment Setup
NVIDIA uses a cluster managed by NVIDIA Base Command Manager (BCM) and requires an environment based on Slurm, Pyxis, and Enroot. Fast local storage configured in RAID0 is recommended to minimize data bottlenecks. Networking should use NVIDIA NVLink and InfiniBand for optimal performance.
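A quick sanity check of the software environment might look like the sketch below. It only verifies that the Slurm client commands `sbatch` and `srun` and the `enroot` command are on the PATH of the current node. Pyxis is a Slurm plugin rather than a standalone command, so it is not covered here. The function name is hypothetical, and this check is an illustration rather than a step from NVIDIA's guide.

```python
import shutil

# Commands expected on a node with Slurm and Enroot installed.
# Pyxis is a Slurm plugin, not a separate executable, so it is not listed.
REQUIRED_COMMANDS = ("sbatch", "srun", "enroot")


def missing_commands(which=shutil.which):
    """Return the required commands that cannot be found on PATH.

    `which` is injectable so the lookup can be replaced in tests.
    """
    return [name for name in REQUIRED_COMMANDS if which(name) is None]
```

An empty list from `missing_commands()` means all three commands were found. It does not confirm that the cluster is configured correctly, only that the tools are installed where the shell can see them.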
Executing the Benchmarks
Execution involves several steps, starting with building a Docker container and downloading the necessary datasets and checkpoints. The benchmarks are then run using Slurm, with a configuration file that specifies hyperparameters and system settings. The process is flexible and can be adjusted for different system sizes and requirements.
Analyzing Benchmark Logs
During a benchmark run, logs are generated that include key MLPerf markers. These logs show initialization, training progress, and final accuracy. A run completes successfully once it reaches the target evaluation loss.
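The log analysis described above can be sketched in Python. MLPerf logging emits marker lines that carry a `:::MLLOG` prefix followed by a JSON object with fields such as `key` and `value`. The sketch assumes that format and assumes the evaluation loss is reported under the `eval_accuracy` key. The function names and the `target_loss` parameter are hypothetical, and the real target for each benchmark should be taken from the benchmark rules rather than from this example.

```python
import json

MLLOG_PREFIX = ":::MLLOG"


def parse_mllog_lines(lines):
    """Extract MLPerf marker events from raw log lines.

    Lines without the prefix, or with a malformed JSON payload, are skipped.
    The prefix is searched anywhere in the line because job launchers may
    prepend their own labels to each line.
    """
    events = []
    for line in lines:
        idx = line.find(MLLOG_PREFIX)
        if idx == -1:
            continue
        payload = line[idx + len(MLLOG_PREFIX):].strip()
        try:
            events.append(json.loads(payload))
        except json.JSONDecodeError:
            continue
    return events


def reached_target(events, target_loss, key="eval_accuracy"):
    """Return True if any logged evaluation value is at or below target_loss."""
    return any(
        event.get("key") == key
        and isinstance(event.get("value"), (int, float))
        and event["value"] <= target_loss
        for event in events
    )


# Synthetic lines for illustration only; these are not real benchmark output.
SAMPLE_LINES = [
    "unrelated stdout line",
    ':::MLLOG {"key": "run_start", "value": null, "time_ms": 1000}',
    ':::MLLOG {"key": "eval_accuracy", "value": 1.10, "time_ms": 2000}',
    '0: :::MLLOG {"key": "eval_accuracy", "value": 0.93, "time_ms": 3000}',
    ':::MLLOG {"key": "run_stop", "value": null, "time_ms": 3100}',
]
```

Reading a real log file line by line and passing the result to `parse_mllog_lines` yields a list of dictionaries that can be filtered by `key` to follow initialization, training progress, and evaluation results in order.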
For more detailed instructions, including specific scripts and configuration examples, refer to the NVIDIA blog.