AMD Introduces AMD-135M: A Breakthrough in Small Language Models

Luisa Crawford
Sep 28, 2024 07:13

AMD has unveiled its first small language model, AMD-135M, and shown how it can serve as a draft model for speculative decoding to speed up inference.

AMD has announced the release of its first small language model (SLM), AMD-135M. According to AMD.com, the new model aims to offer specialized capabilities while addressing some of the limitations of large language models (LLMs) such as GPT-4 and Llama.

AMD-135M: First AMD Small Language Model

AMD-135M, a member of the Llama family, is AMD’s first effort in the SLM arena. The model was trained from scratch on AMD Instinct™ MI250 accelerators using 670 billion tokens. The training process produced two distinct models: AMD-Llama-135M and AMD-Llama-135M-code. The former was pretrained on general data, while the latter was fine-tuned on an additional 20 billion tokens of code data.

Pretraining: AMD-Llama-135M was trained over six days using four MI250 nodes. The code-focused variant, AMD-Llama-135M-code, required an additional four days for fine-tuning.

All associated training code, datasets, and model weights are open-sourced, enabling developers to reproduce the model and contribute to the training of other SLMs and LLMs.

Optimization with Speculative Decoding

One of the notable features of AMD-135M is its use in speculative decoding. Traditional autoregressive decoding in large language models often suffers from low memory access efficiency, because each forward pass generates only a single token. Speculative decoding addresses this by using a small draft model to generate a set of candidate tokens, which a larger target model then verifies. Because the target model can check several candidates at once, each of its forward passes can yield multiple tokens, which significantly improves memory access efficiency and inference speed.
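The draft-and-verify loop described above can be illustrated with a minimal sketch. It is not AMD's implementation: the two "models" are hypothetical toy functions (target_next and draft_next) that pick the next token deterministically, greedy decoding is assumed, and one verification round is counted as a single target forward pass, as it would be when the target scores all candidates in one batch.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]


def target_next(ctx: List[Token]) -> Token:
    # Hypothetical "large" target model: sum of the last two tokens, mod 10.
    return (ctx[-1] + ctx[-2]) % 10


def draft_next(ctx: List[Token]) -> Token:
    # Hypothetical "small" draft model: usually agrees with the target,
    # but always guesses 0 after a 7, so some guesses get rejected.
    if ctx[-1] == 7:
        return 0
    return (ctx[-1] + ctx[-2]) % 10


def autoregressive(prompt: List[Token], n_new: int,
                   model: NextToken) -> Tuple[List[Token], int]:
    # Baseline: one forward pass of the model per generated token.
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out, n_new


def speculative(prompt: List[Token], n_new: int, target: NextToken,
                draft: NextToken, k: int = 4) -> Tuple[List[Token], int]:
    out = list(prompt)
    goal = len(prompt) + n_new
    target_passes = 0
    while len(out) < goal:
        # 1. The draft model proposes k candidate tokens, one after another.
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))

        # 2. The target model verifies the candidates. A real system scores
        #    all k positions in one batched pass; here that is simulated
        #    position by position and counted as a single pass.
        target_passes += 1
        accepted: List[Token] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok != expected:
                # First mismatch: keep the target's token, drop the rest.
                accepted.append(expected)
                break
            accepted.append(tok)
        else:
            # Every candidate matched, so the same pass yields a bonus token.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    return out[:goal], target_passes


if __name__ == "__main__":
    baseline, baseline_passes = autoregressive([1, 1], 20, target_next)
    fast, fast_passes = speculative([1, 1], 20, target_next, draft_next)
    print("same output:", fast == baseline)
    print("target passes:", fast_passes, "vs", baseline_passes)
```

With greedy decoding the speculative output matches what the target model would have produced on its own; the gain comes from needing fewer target passes whenever the draft model guesses correctly.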

Inference Performance Acceleration

AMD tested AMD-Llama-135M-code as a draft model for CodeLlama-7b on several hardware configurations, including the MI250 accelerator and the Ryzen™ AI processor. According to AMD, the results showed a considerable inference speedup when speculative decoding was used. Combined with the training setup, this establishes an end-to-end workflow for training and inference on selected AMD platforms.

Next Steps

By providing an open-source reference implementation, AMD aims to foster innovation within the AI community. The company encourages developers to explore and contribute to this new frontier in AI technology.

For more details on AMD-135M, visit the full technical blog on AMD.com.

