Alvin Lang
Jun 04, 2025 15:44
NVIDIA’s Multi-Process Service optimizes GPU usage in molecular dynamics simulations, boosting throughput by running concurrent processes on a single GPU.
Molecular dynamics (MD) simulations, essential for modeling atomic interactions over time, demand substantial computational resources. Yet many simulations involve relatively small systems that cannot keep a modern GPU fully busy. NVIDIA’s Multi-Process Service (MPS) addresses this by allowing multiple simulations to run concurrently on the same GPU, improving GPU utilization and overall throughput, according to NVIDIA.
Understanding MPS
MPS is a binary-compatible implementation of the CUDA API that facilitates efficient GPU sharing by multiple processes. It reduces context-switching overhead and improves overall GPU utilization by allowing all processes to share scheduling resources. Since the NVIDIA Volta GPU generation, MPS also supports concurrent kernel execution from different processes, enhancing performance when individual processes can’t fully saturate the GPU. Notably, MPS can be initiated with regular user privileges, simplifying its deployment.
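Because MPS runs without root privileges, it can be started and stopped from ordinary user code. The sketch below is a minimal illustration, assuming the standard `nvidia-cuda-mps-control` binary is on the `PATH`. It also assumes that user-writable pipe and log directories are supplied through `CUDA_MPS_PIPE_DIRECTORY` and `CUDA_MPS_LOG_DIRECTORY`. The helper names (`mps_env`, `start_mps`, `stop_mps`) and the `dry_run` switch are hypothetical conveniences, not part of any NVIDIA API.

```python
import os
import subprocess
from typing import Dict, List


def mps_env(pipe_dir: str, log_dir: str, gpu: str = "0") -> Dict[str, str]:
    """Environment shared by the MPS control daemon and its client processes.

    CUDA_VISIBLE_DEVICES selects the GPU. The pipe/log directories are assumed
    to be user-writable, so no root privileges are required.
    """
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = gpu
    env["CUDA_MPS_PIPE_DIRECTORY"] = pipe_dir
    env["CUDA_MPS_LOG_DIRECTORY"] = log_dir
    return env


def start_mps(env: Dict[str, str], dry_run: bool = True) -> List[str]:
    """Start the MPS control daemon in background (daemon) mode with -d."""
    cmd = ["nvidia-cuda-mps-control", "-d"]
    if not dry_run:
        os.makedirs(env["CUDA_MPS_PIPE_DIRECTORY"], exist_ok=True)
        os.makedirs(env["CUDA_MPS_LOG_DIRECTORY"], exist_ok=True)
        subprocess.run(cmd, env=env, check=True)
    return cmd


def stop_mps(env: Dict[str, str], dry_run: bool = True) -> List[str]:
    """Shut the daemon down by writing 'quit' to the control program's stdin."""
    cmd = ["nvidia-cuda-mps-control"]
    if not dry_run:
        subprocess.run(cmd, input="quit\n", text=True, env=env, check=True)
    return cmd
```

With `dry_run=False`, `start_mps(env)` corresponds to the shell command `nvidia-cuda-mps-control -d`, and `stop_mps(env)` corresponds to `echo quit | nvidia-cuda-mps-control`. The default dry run only returns the command so it can be inspected first.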
Implementing MPS with OpenMM
To use MPS with OpenMM, a popular MD engine, users run multiple simulations simultaneously by launching several instances of a simulation script as separate processes on the same GPU. Each individual simulation may run more slowly, but the combined throughput increases because the processes execute in parallel. Only a few commands are needed: the target GPU can be selected with the standard CUDA_VISIBLE_DEVICES environment variable, and each simulation is started as its own process.
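Launching several copies of one simulation script on a single GPU can be sketched in Python as follows. This is a minimal sketch, assuming the MPS control daemon is already running. The script name `simulate.py` and its `--replica` flag are hypothetical placeholders for whatever OpenMM script is being run. Every process is pointed at the same GPU through `CUDA_VISIBLE_DEVICES`, started at once with `subprocess.Popen`, and then waited on.

```python
import os
import subprocess
import sys
from typing import Dict, List, Tuple

Launch = Tuple[List[str], Dict[str, str]]


def build_launches(script: str, n_sims: int, gpu: str = "0") -> List[Launch]:
    """Return one (command, environment) pair per simulation process.

    All processes see the same GPU; with MPS active, their kernels can
    share it concurrently instead of time-slicing.
    """
    launches: List[Launch] = []
    for i in range(n_sims):
        env = dict(os.environ)
        env["CUDA_VISIBLE_DEVICES"] = gpu
        # Hypothetical CLI: the script is assumed to accept a --replica index.
        cmd = [sys.executable, script, "--replica", str(i)]
        launches.append((cmd, env))
    return launches


def run_all(launches: List[Launch]) -> List[int]:
    """Start every process first, then wait for all of them; returns exit codes."""
    procs = [subprocess.Popen(cmd, env=env) for cmd, env in launches]
    return [p.wait() for p in procs]
```

For example, `run_all(build_launches("simulate.py", 4))` would run four concurrent simulations on GPU 0. Starting all processes before waiting on any of them is what lets them overlap on the GPU.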
Benchmarking Performance
Benchmarks show significant throughput improvements when MPS is applied to systems of varying sizes. For instance, the 23,000-atom DHFR system gains substantially, particularly on high-end GPUs such as the NVIDIA H100 Tensor Core GPU. Even much larger systems, such as the 409,000-atom Cellulose benchmark, see a throughput increase of about 20%.
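The throughput gains quoted here compare the combined output of all concurrent processes with that of a single process running alone. The helper below is a small sketch of that arithmetic, assuming throughput is measured in ns/day per process. The function names are illustrative only.

```python
from typing import Sequence


def aggregate_throughput(per_process_ns_per_day: Sequence[float]) -> float:
    """Total simulated time per day summed over all concurrent processes."""
    return sum(per_process_ns_per_day)


def mps_gain(single_ns_per_day: float, per_process_ns_per_day: Sequence[float]) -> float:
    """Fractional gain of N concurrent MPS processes over one process alone.

    A return value of 0.20 means a 20% throughput increase.
    """
    if single_ns_per_day <= 0:
        raise ValueError("single-process throughput must be positive")
    return aggregate_throughput(per_process_ns_per_day) / single_ns_per_day - 1.0
```

As a purely illustrative case with placeholder values (not benchmark results), take one process alone at 100 ns/day and two MPS processes at 60 ns/day each. Each run is slower than the lone run, yet the pair delivers about 20% more total throughput.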
Optimizing Throughput with CUDA_MPS_ACTIVE_THREAD_PERCENTAGE
By default, MPS lets every client process use all of the GPU's execution resources. Setting the CUDA_MPS_ACTIVE_THREAD_PERCENTAGE environment variable limits the fraction of GPU threads available to each process, which can raise throughput further. This adjustment has been shown to boost collective throughput significantly, especially when several processes run concurrently.
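`CUDA_MPS_ACTIVE_THREAD_PERCENTAGE` is read by each MPS client when it creates its CUDA context, so it is set in the environment of every simulation process at launch. The sketch below assumes a simple rule: an even split of `100 / n_procs` scaled by a hypothetical `oversubscription` factor that lets the per-process limits overlap. This rule is an assumption for illustration, not a documented MPS setting or the tuned value from the original benchmarks, and the best value should be found empirically.

```python
import os
from typing import Dict


def active_thread_percentage(n_procs: int, oversubscription: float = 1.0) -> int:
    """Per-process CUDA_MPS_ACTIVE_THREAD_PERCENTAGE under an assumed rule.

    oversubscription=1.0 splits the GPU evenly; larger values let the
    per-process limits overlap. The result is clamped to the range 1..100.
    """
    if n_procs < 1:
        raise ValueError("n_procs must be at least 1")
    pct = int(100 * oversubscription / n_procs)
    return max(1, min(100, pct))


def client_env(n_procs: int, oversubscription: float = 1.0, gpu: str = "0") -> Dict[str, str]:
    """Environment for one MPS client; must be applied before CUDA is initialized."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = gpu
    env["CUDA_MPS_ACTIVE_THREAD_PERCENTAGE"] = str(
        active_thread_percentage(n_procs, oversubscription)
    )
    return env
```

Passing `client_env(4, 2.0)` as the `env` of each launched process would cap every one of four processes at 50% of the GPU's threads. Sweeping `oversubscription` is a cheap way to look for the throughput peak on a given GPU.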
Application in Free Energy Calculations
MPS also proves advantageous in free energy perturbation (FEP) simulations, which rely on replica-exchange molecular dynamics. Running the simulations for different λ windows concurrently under MPS mitigates GPU underutilization, yielding a 36% throughput increase with three MPS processes on NVIDIA L40S or H100 GPUs.
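One way to spread λ windows over a fixed number of MPS processes is a round-robin split. The sketch assumes evenly spaced λ values between 0 and 1. The helpers `lambda_schedule` and `assign_windows` are hypothetical and not the workflow used in the original benchmarks. If replica exchange between windows held by different processes is required, those processes would also have to communicate, which this sketch leaves out.

```python
from typing import Dict, List, Sequence


def lambda_schedule(n_windows: int) -> List[float]:
    """Evenly spaced alchemical coupling values from 0.0 to 1.0 inclusive."""
    if n_windows < 2:
        raise ValueError("need at least two lambda windows")
    return [i / (n_windows - 1) for i in range(n_windows)]


def assign_windows(lambdas: Sequence[float], n_procs: int) -> Dict[int, List[float]]:
    """Round-robin assignment of lambda windows to MPS process indices."""
    if n_procs < 1:
        raise ValueError("n_procs must be at least 1")
    groups: Dict[int, List[float]] = {p: [] for p in range(n_procs)}
    for i, lam in enumerate(lambdas):
        groups[i % n_procs].append(lam)
    return groups
```

With five windows and three processes, process 0 gets λ = 0.0 and 0.75, process 1 gets 0.25 and 1.0, and process 2 gets 0.5. The round-robin split keeps the per-process load within one window of even.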
Conclusion
NVIDIA’s MPS is a valuable tool for increasing MD simulation throughput with minimal coding effort. By improving GPU utilization, MPS significantly raises aggregate throughput across various simulation scenarios, even though each individual simulation may run more slowly. For those interested in exploring these capabilities further, NVIDIA provides additional resources and tutorials to support implementation and experimentation.