Joerg Hiller
Aug 29, 2024 07:18

NVIDIA’s Blackwell architecture sets new benchmarks in MLPerf Inference v4.1, showcasing significant performance improvements in LLM inference.





NVIDIA’s new Blackwell architecture has set unprecedented benchmarks in the latest MLPerf Inference v4.1 round, according to the NVIDIA Technical Blog. The platform, introduced at NVIDIA GTC 2024, is built around a GPU with 208 billion transistors, fabricated on a TSMC 4NP process customized for NVIDIA, making it the largest GPU ever built.

NVIDIA Blackwell Shines in MLPerf Inference Debut

In its inaugural round of MLPerf Inference submissions, NVIDIA’s Blackwell architecture delivered remarkable results on the Llama 2 70B LLM benchmark, achieving up to 4x more tokens per second per GPU than the previous-generation H100. This leap was enabled by the new second-generation Transformer Engine, which combines Blackwell Tensor Core technology with innovations in TensorRT-LLM.
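To make the FP4 idea concrete, here is a minimal NumPy sketch of block-scaled quantization onto the FP4 (E2M1) value grid. The block size and scaling rule are assumptions for illustration; the actual Transformer Engine performs this in hardware with its own scale management.

```python
# Minimal sketch of block-scaled FP4 (E2M1) weight quantization in NumPy.
# Illustrative only -- not NVIDIA's Transformer Engine implementation.
import numpy as np

# Non-negative values representable in FP4 E2M1; a sign bit adds negatives.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0], dtype=np.float32)

def quantize_fp4(weights, block_size=32):
    """Quantize a 1-D array to FP4 with one higher-precision scale per block."""
    w = weights.reshape(-1, block_size)
    # Pick each block's scale so its largest magnitude maps to FP4's max, 6.0.
    scales = np.abs(w).max(axis=1, keepdims=True) / FP4_GRID[-1]
    scales = np.where(scales == 0, 1.0, scales)          # avoid divide-by-zero
    scaled = w / scales
    # Snap each value to the nearest representable FP4 magnitude, keep the sign.
    idx = np.abs(np.abs(scaled)[..., None] - FP4_GRID).argmin(axis=-1)
    return np.sign(scaled) * FP4_GRID[idx], scales

def dequantize_fp4(q, scales):
    return (q * scales).ravel()

w = np.random.default_rng(0).standard_normal(4096).astype(np.float32)
q, s = quantize_fp4(w)
print("max abs reconstruction error:", np.abs(w - dequantize_fp4(q, s)).max())
```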

According to the MLPerf results, Blackwell’s FP4 Transformer Engine executed roughly half of the Llama 2 70B workload in FP4 precision, delivering a math throughput of 5.2 petaflops. The Blackwell submissions ran in the closed division, meaning the model was unmodified yet still met the benchmark’s strict accuracy targets.
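A quick back-of-the-envelope shows why moving half of the math into FP4 shifts delivered throughput so much. The peak rates below are placeholder assumptions rather than published Blackwell specifications:

```python
# Effective math rate when ~50% of the work runs in FP4 and the rest in a
# slower precision. Peak rates are assumed placeholders for illustration.
PEAK_FP8_PFLOPS = 4.5            # assumed FP8 peak
PEAK_FP4_PFLOPS = 9.0            # assumed FP4 peak (2x FP8)
frac_fp4 = 0.5                   # share of work in FP4, per the results above

# Time per unit of work is a weighted harmonic mean of the two rates.
effective = 1.0 / (frac_fp4 / PEAK_FP4_PFLOPS + (1 - frac_fp4) / PEAK_FP8_PFLOPS)
print(f"effective throughput ~ {effective:.1f} PFLOPS")   # 6.0 with these peaks
```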

NVIDIA H200 Tensor Core GPU’s Outstanding Performance

The NVIDIA H200 GPU, which upgrades the Hopper architecture with HBM3e memory, also delivered exceptional results across all benchmarks. Its greater memory capacity and bandwidth particularly benefit memory-sensitive workloads such as LLM inference.
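To see why bandwidth is decisive for LLM inference, consider that each token generated in the decode phase streams essentially all of the model’s weights from memory. A simple roofline estimate, using the H200’s roughly 4.8 TB/s of HBM3e bandwidth and an assumed FP8 footprint for Llama 2 70B, bounds single-stream decode speed:

```python
# Roofline intuition: single-stream decode reads all model weights once per
# generated token, so tokens/s is bounded by bandwidth / model size.
HBM_BANDWIDTH_GB_S = 4800    # H200 HBM3e, ~4.8 TB/s
MODEL_WEIGHTS_GB = 70        # assumed: Llama 2 70B at FP8 (~1 byte/parameter)

bound = HBM_BANDWIDTH_GB_S / MODEL_WEIGHTS_GB
print(f"single-stream decode bound ~ {bound:.0f} tokens/s")   # ~69 tokens/s
# Batching amortizes the weight reads across many requests, which is how
# throughput submissions reach far higher aggregate token rates.
```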

For example, the H200 achieved a 14% gain on the Llama 2 70B benchmark over the previous round purely through software enhancements in TensorRT-LLM. Its performance improved by another 12% when its thermal design power (TDP) was raised to 1,000 watts.
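Treating the two gains as independent multipliers, an assumption since MLPerf reports them separately, gives a rough sense of the combined effect:

```python
# If the software gain and the higher-TDP gain are independent multipliers
# (an assumption -- not an MLPerf-reported figure), they compound:
software_gain = 1.14   # TensorRT-LLM improvements
tdp_gain = 1.12        # 1,000 W thermal design power
print(f"combined speedup ~ {software_gain * tdp_gain:.2f}x")   # ~1.28x
```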

Jetson AGX Orin’s Giant Leap in Edge AI

NVIDIA’s Jetson AGX Orin demonstrated impressive gains in generative AI at the edge, achieving up to 6.2x higher throughput and 2.4x lower latency on the GPT-J 6B-parameter LLM benchmark. These results came from numerous software optimizations, including INT4 Activation-aware Weight Quantization (AWQ) and in-flight batching.
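The key idea behind AWQ is that a small number of weight channels see disproportionately large activations; rescaling those channels before 4-bit rounding preserves their precision. The sketch below illustrates the principle in NumPy; the per-row scaling, the 0.5 exponent, and the synthetic “salient channel” statistics are simplifying assumptions, not TensorRT-LLM’s actual kernels.

```python
# Minimal sketch of the AWQ idea: rescale weight channels by activation
# magnitude before INT4 rounding so salient channels keep more precision.
import numpy as np

def int4_round(w):
    """Symmetric round-to-nearest INT4 with one scale per output row."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0   # INT4 grid: -8..7
    return np.clip(np.round(w / scale), -8, 7) * scale   # dequantized view

def awq_quantize(W, act_mag, alpha=0.5):
    """W: (out, in) weights; act_mag: per-input-channel activation magnitude."""
    s = act_mag ** alpha            # moderate per-channel scaling (AWQ's key idea)
    return int4_round(W * s) / s    # quantize rescaled weights, fold scale back out

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64)).astype(np.float32)
act_mag = np.ones(64, dtype=np.float32)
act_mag[:4] = 20.0                  # a few "salient" channels, as AWQ observes

# Compare layer-output error on activations matching those statistics.
X = rng.standard_normal((256, 64)).astype(np.float32) * act_mag
Y = X @ W.T
print("plain INT4 output error:", np.abs(Y - X @ int4_round(W).T).mean())
print("AWQ-style output error :", np.abs(Y - X @ awq_quantize(W, act_mag).T).mean())
```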

The Jetson AGX Orin platform is uniquely positioned to run complex models like GPT-J, vision transformers, and Stable Diffusion at the edge, providing real-time, actionable insights from sensor data such as images and videos.

Conclusion

NVIDIA’s Blackwell architecture has set new standards in MLPerf Inference v4.1, achieving up to 4x the performance of its predecessor, the H100. The H200 GPU continues to deliver top-tier performance across multiple benchmarks, while Jetson AGX Orin shows significant advances in edge AI.

NVIDIA’s continuous innovation across the technology stack ensures it remains at the forefront of AI inference performance, from large-scale data centers to low-power edge devices.


