


Terrill Dicki
Jul 20, 2024 11:23

Learn to use Claude 3 models with audio data in Python, leveraging AssemblyAI’s LeMUR framework for seamless integration.





Claude 3.5 Sonnet, recently announced by Anthropic, sets new industry benchmarks across a range of LLM tasks. The model excels at complex coding and nuanced literary analysis, and shows exceptional context awareness and creativity.

According to AssemblyAI, users can now apply Claude 3.5 Sonnet, Claude 3 Opus, and Claude 3 Haiku to audio or video files in Python.

Pipeline for applying Claude 3 models to audio data

Here are a few example use cases for this pipeline:

  • Creating summaries of long podcasts or YouTube videos
  • Asking questions about the audio content
  • Generating action items from meetings
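In practice, each of these use cases reduces to running a different prompt over the same transcript. A small illustrative sketch — the prompt wording and helper below are assumptions for demonstration, not taken from AssemblyAI:

```python
# Illustrative only: prompt templates for the three use cases above.
# The wording and the build_prompt helper are hypothetical, not part of
# any SDK; each use case just maps to a different prompt string.

USE_CASE_PROMPTS = {
    "summary": "Provide a brief summary of this podcast episode.",
    "qa": "Answer the following question about the audio: {question}",
    "action_items": "Generate a list of action items from this meeting.",
}

def build_prompt(use_case, **kwargs):
    """Look up a prompt template and fill in any placeholders."""
    return USE_CASE_PROMPTS[use_case].format(**kwargs)
```

The resulting string is what you would later pass to LeMUR alongside the transcript.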

How Does It Work?

Language models primarily work with text, so the audio data must be transcribed first. Multimodal models that accept audio directly could eventually remove this step, but they are still in early development.

To achieve this, AssemblyAI’s LeMUR framework is employed. LeMUR simplifies the process by allowing the combination of industry-leading Speech AI models and LLMs in just a few lines of code.

Set Up the SDK

To get started, install the AssemblyAI Python SDK, which includes all LeMUR functionality.

pip install assemblyai

Then, import the package and set your API key. You can obtain an API key for free from AssemblyAI.

import assemblyai as aai
aai.settings.api_key = "YOUR_API_KEY"

Transcribe an Audio or Video File

Next, transcribe an audio or video file by setting up a Transcriber and calling its transcribe() function. You can pass in any local file or publicly accessible URL. For instance, an episode of Lenny's Podcast featuring Dalton Caldwell from Y Combinator can be used.

audio_url = "https://storage.googleapis.com/aai-web-samples/lennyspodcast-daltoncaldwell-ycstartups.m4a"

transcriber = aai.Transcriber()
transcript = transcriber.transcribe(audio_url)

print(transcript.text)
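Transcription can fail, for example on a bad URL or an unsupported format, so it is worth checking the transcript's status before prompting an LLM. A minimal sketch — the helper name is made up, but it relies only on the `status`, `error`, and `text` attributes that AssemblyAI's Transcript object exposes:

```python
# Hypothetical helper (not part of the SDK): guard against a failed
# transcription before sending the text to an LLM. Any object with
# `status`, `error`, and `text` attributes works, which mirrors the
# AssemblyAI Transcript interface.

def transcript_text_or_raise(transcript):
    """Return the transcript text, raising if transcription failed."""
    if transcript.status == "error":
        raise RuntimeError(f"Transcription failed: {transcript.error}")
    return transcript.text
```

With the real SDK you would pass the `transcript` returned by `transcriber.transcribe(...)`; its status enum compares equal to the string `"error"` on failure, and can also be compared against `aai.TranscriptStatus.error`.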

Use Claude 3.5 Sonnet with Audio Data

Claude 3.5 Sonnet is Anthropic’s most advanced model to date, outperforming Claude 3 Opus on a wide range of evaluations while remaining cost-effective.

To use Claude 3.5 Sonnet, call transcript.lemur.task(), a flexible endpoint that allows you to specify any prompt. It automatically adds the transcript as additional context for the model.

Specify aai.LemurModel.claude3_5_sonnet for the model when calling the LLM. Here’s an example of a simple summarization prompt:

prompt = "Provide a brief summary of the transcript."

result = transcript.lemur.task(
    prompt, final_model=aai.LemurModel.claude3_5_sonnet
)

print(result.response)

Use Claude 3 Opus with Audio Data

Claude 3 Opus is adept at handling complex analysis, longer tasks with many steps, and higher-order math and coding tasks.

To use Opus, specify aai.LemurModel.claude3_opus for the model when calling the LLM. Here’s an example of a prompt to extract specific information from the transcript:

prompt = "Extract all advice Dalton gives in this podcast episode. Use bullet points."

result = transcript.lemur.task(
    prompt, final_model=aai.LemurModel.claude3_opus
)

print(result.response)

Use Claude 3 Haiku with Audio Data

Claude 3 Haiku is the fastest and most cost-effective model, ideal for executing lightweight actions.

To use Haiku, specify aai.LemurModel.claude3_haiku for the model when calling the LLM. Here’s an example of a simple question-answering prompt:

prompt = "What are tar pit ideas?"

result = transcript.lemur.task(
    prompt, final_model=aai.LemurModel.claude3_haiku
)

print(result.response)
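Notice that the three calls above differ only in the `final_model` argument, so comparing the models on a single prompt is just a loop. A sketch, with the LeMUR call injected as a parameter so it stays runnable without an API key (`compare_models` is a hypothetical helper, not an SDK function):

```python
# Hypothetical comparison harness: run one prompt through several models
# and collect the responses. `run_task` stands in for transcript.lemur.task,
# injected so the sketch does not require API access.

def compare_models(prompt, models, run_task):
    """Map each model name to the response run_task produced for it."""
    return {model: run_task(prompt, final_model=model) for model in models}
```

With the SDK, `run_task` could be `lambda p, final_model: transcript.lemur.task(p, final_model=final_model).response`, and `models` the `aai.LemurModel` members shown above.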

Learn More About Prompt Engineering

Applying Claude 3 models to audio data with AssemblyAI and the LeMUR framework is straightforward. To maximize the benefits of LeMUR and the Claude 3 models, refer to additional resources provided by AssemblyAI.

Image source: Shutterstock

