
What is the optimal encoding and sample rate to send to Deepgram?

Deepgram transcodes audio from one format to another during pre-processing and post-processing.

If you have control over your audio format, we recommend sending audio encoded as linear16 with a sample rate of 8000 Hz. To specify these parameters on the /speak endpoint or the streaming /listen endpoint, add encoding=linear16&sample_rate=8000 to the request's query string.
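
As a minimal sketch of how those query parameters fit into a request URL, the snippet below builds the query string once and reuses it for both endpoints. The helper name build_query is hypothetical, and the wss:// scheme for the streaming /listen endpoint and the aura-asteria-en model are assumptions for illustration rather than something this page specifies.

```shell
#!/usr/bin/env bash

# build_query ENCODING SAMPLE_RATE
# Prints the encoding/sample_rate query string described above.
# (Hypothetical helper, not part of any Deepgram tooling.)
build_query() {
  printf 'encoding=%s&sample_rate=%s' "$1" "$2"
}

# Text-to-speech request URL (the model name is an example value).
speak_url="https://api.deepgram.com/v1/speak?model=aura-asteria-en&$(build_query linear16 8000)"

# Streaming speech-to-text URL (the wss:// scheme is assumed here).
listen_url="wss://api.deepgram.com/v1/listen?$(build_query linear16 8000)"

echo "$speak_url"
echo "$listen_url"
```

Keeping the parameters in one helper makes it easy to switch both endpoints to another encoding or sample rate when comparing results.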

Below is an example bash script that times how long the /speak endpoint takes to produce audio for two encodings (mulaw and linear16) at two sample rates (8000 and 16000). It reads your API key from the DEEPGRAM_API_KEY environment variable. You can extend or modify the script for your own testing.

declare -a voices=(
"aura-asteria-en"
)

declare -a encodings=(
"mulaw"
"linear16"
)

declare -a sample_rates=(
"8000"
"16000"
)

declare -a sentences=(
"Hello welcome to deepgram, how can I help you today? This is a much longer piece of text that should take longer to process. There are many extra words that will increase the processing time. Here is another sentence that will add to the duration of audio"
)

for voice in "${voices[@]}"; do
  for encoding in "${encodings[@]}"; do
    for sample_rate in "${sample_rates[@]}"; do
      for sentence in "${sentences[@]}"; do
        url="https://api.deepgram.com/v1/speak?model=$voice&encoding=$encoding&sample_rate=$sample_rate"
        echo "$url"
        # Time one request. These encodings do not produce MP3 audio, so the
        # output uses .wav (assumes the default WAV container is returned).
        time curl -X POST \
          -H "Authorization: Token $DEEPGRAM_API_KEY" \
          -H "Content-Type: application/json" \
          -d "{\"text\":\"$sentence\"}" \
          "$url" > "$voice-$encoding-$sample_rate.wav"
      done
    done
  done
done
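
One possible extension, sketched here under stated assumptions: instead of reading the output of time by eye, let curl report its own total request time through its -w '%{time_total}' write-out variable, then sort the results. The function names measure and fastest and the file name timings.txt are hypothetical. The requests only run when DEEPGRAM_API_KEY is set, and the audio itself is discarded.

```shell
#!/usr/bin/env bash

# measure MODEL ENCODING SAMPLE_RATE TEXT
# Prints "<encoding>-<sample_rate> <seconds>" using curl's time_total.
measure() {
  local model="$1" encoding="$2" sample_rate="$3" text="$4"
  local seconds
  seconds=$(curl -s -o /dev/null -w '%{time_total}' \
    -X POST \
    -H "Authorization: Token $DEEPGRAM_API_KEY" \
    -H "Content-Type: application/json" \
    -d "{\"text\":\"$text\"}" \
    "https://api.deepgram.com/v1/speak?model=$model&encoding=$encoding&sample_rate=$sample_rate")
  echo "$encoding-$sample_rate $seconds"
}

# fastest
# Reads "<label> <seconds>" lines on stdin and prints the quickest one.
fastest() {
  LC_ALL=C sort -k2,2n | head -n 1
}

# Only call the API when a key is available.
if [ -n "${DEEPGRAM_API_KEY:-}" ]; then
  for encoding in mulaw linear16; do
    for sample_rate in 8000 16000; do
      measure "aura-asteria-en" "$encoding" "$sample_rate" "Hello, welcome to Deepgram."
    done
  done | tee timings.txt | fastest
fi
```

A single request per combination is noisy, so repeating each measurement several times and comparing the recorded values in timings.txt gives a more reliable picture.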
